Although search engines have deployed various techniques to detect and filter out Web spam, Web spammers continue to develop new tactics to influence the results of search engines' ranking algorithms in order to obtain undeservedly high ranks. In this paper, we study the effect of the page language on spam detection features. We examine how the distribution of a set of selected detection features changes with the page language. We also study the effect of the page language on the detection rate of a given classifier using a selected set of detection features. The analysis shows that selecting suitable features for a classifier that segregates spam pages depends heavily on the language of the examined Web page, due in part to the different sets of Web spam mechanisms used by each type of spammer.
{"title":"Web Spam: A Study of the Page Language Effect on the Spam Detection Features","authors":"A. Alarifi, Mansour Alsaleh","doi":"10.1109/ICMLA.2012.229","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.229","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"130878514"}
Voice and multimedia communications are rapidly migrating from traditional networks to TCP/IP networks (the Internet), where services are provisioned by SIP (Session Initiation Protocol). This paper proposes an on-line filter that examines the stream of incoming SIP messages and classifies them as good or bad. The classification is carried out in two stages. First, a lexical analysis weeds out those messages that do not belong to the language generated by the grammar defined by the SIP standard. A second filtering stage then identifies messages that differ, in structure or content, from messages previously classified as good. While the first stage is straightforward, as the classification is crisp (either a message belongs to the language or it does not), the second stage requires more delicate handling, because whether a message is semantically meaningful is not a sharp decision. Our approach to this step uses past experience with previously classified messages, i.e. a "learn-by-example" approach, which led to a classifier based on Support Vector Machines (SVM) that analyzes each incoming SIP message. The paper describes the overall architecture of the two-stage filter and then explores several points of the SVM configuration space to find a setting that performs well when classifying a large sample of SIP messages obtained from real traffic collected on a VoIP installation at our institution. Finally, the performance of the classification on additional messages collected from the same source is presented.
{"title":"On the Use of SVMs to Detect Anomalies in a Stream of SIP Messages","authors":"Raihana Ferdous, R. Cigno, A. Zorat","doi":"10.1109/ICMLA.2012.109","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.109","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"130966840"}
This paper presents a model-based criterion for assessing the clustering results of spatial data, where both geometrical constraints and observation attributes are taken into account. An extra parameter is often used to control the relative importance of each characteristic. Since the values of both terms vary across different realizations of the data, it becomes essential to determine the value of this parameter, which has a large influence on the clustering criterion value. Thus, an 'upper-lower bound' technique is proposed to solve the problem caused by the stochastic properties of both terms. In addition, we apply a normalization method to regularize the parameter value. The effectiveness of this approach is validated experimentally using simulated reliability data.
{"title":"A Normalized Criterion of Spatial Clustering in Model-Based Framework","authors":"X. Wang, E. Grall-Maës, P. Beauseroy","doi":"10.1109/ICMLA.2012.99","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.99","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"130826387"}
T. Khoshgoftaar, D. Dittman, Randall Wald, Alireza Fazelpour
Dimensionality reduction techniques have become a required step when working with bioinformatics datasets. Techniques such as feature selection are known not only to improve computation time but also to improve the results of experiments by removing redundant and irrelevant features or genes from consideration in subsequent analysis. Univariate feature selection techniques in particular are well suited to the high dimensionality inherent in bioinformatics datasets (for example, DNA microarray datasets) because of their intuitive output (a ranked list of features or genes) and their relatively small computational time compared to other techniques. This paper presents seven univariate feature selection techniques and collects them into a single family entitled First Order Statistics (FOS) based feature selection. All seven share the trait of using first order statistical measures such as mean and standard deviation, although this is the first work to relate them to one another and compare their performance. To examine the properties of these seven techniques, we performed a series of similarity and classification experiments on eleven DNA microarray datasets. Our results show that, in general, each feature selection technique creates diverse feature subsets when compared to the other members of the family. However, when we look at classification we find that, with one exception, the techniques produce good classification results and perform similarly to each other. Our recommendation is to use the rankers Signal-to-Noise and SAM for the best classification results and to avoid Fold Change Ratio, as it is consistently the worst performer of the seven rankers.
{"title":"First Order Statistics Based Feature Selection: A Diverse and Powerful Family of Feature Selection Techniques","authors":"T. Khoshgoftaar, D. Dittman, Randall Wald, Alireza Fazelpour","doi":"10.1109/ICMLA.2012.192","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.192","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"130530193"}
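As an illustration of what a first order statistics based ranker looks like, the following is a minimal sketch of a Signal-to-Noise ranker. It assumes the commonly used definition, the difference of the class means divided by the sum of the class standard deviations; the abstract does not give the formula, so the exact form used in the paper may differ. The function names and the toy data are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(pos, neg):
    """S2N score of one feature: (mean_pos - mean_neg) / (std_pos + std_neg).

    Assumed definition; raises ZeroDivisionError if both classes are constant.
    """
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))


def rank_features(samples, labels):
    """Return feature indices sorted by |S2N|, most discriminative first."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, y in zip(samples, labels) if y == 1]
        neg = [row[j] for row, y in zip(samples, labels) if y == 0]
        scores.append(abs(signal_to_noise(pos, neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data (made up): 4 samples, 3 "genes", binary class labels.
samples = [
    (5.0, 1.0, 2.0),
    (6.0, 1.2, 9.0),
    (1.0, 1.1, 3.0),
    (2.0, 0.9, 8.0),
]
labels = [1, 1, 0, 0]
ranking = rank_features(samples, labels)
```

Because each feature is scored independently of the others, the cost grows linearly with the number of features, which is why univariate rankers of this kind scale to microarray-sized data.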
While finding natural clusters in high dimensional data is in itself a challenge, the dynamic nature of data adds another, greater challenge. Many applications, such as data warehouses and the WWW, require efficient incremental clustering algorithms to handle their dynamic data. So far, numerous useful incremental clustering algorithms have been developed for large datasets, such as incremental K-means, incremental DBSCAN, similarity histogram-based clustering (SHC) and mean shift. However, targeting clusters of different shapes and densities has yet to be tackled efficiently. In this work, an efficient incremental clustering algorithm (Incremental Mitosis) is proposed. It is based on the Mitosis clustering algorithm, which maximizes the relatedness of distances between patterns of the same cluster. The proposed algorithm is able to discover clusters of arbitrary shapes and densities in dynamic high dimensional data. Experimental results show that the proposed algorithm clusters the data efficiently and maintains the accuracy of the Mitosis algorithm.
{"title":"Incremental Mitosis: Discovering Clusters of Arbitrary Shapes and Densities in Dynamic Data","authors":"Rania Ibrahim, N. Ahmed, N. A. Yousri, M. Ismail","doi":"10.1109/ICMLA.2012.26","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.26","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"130808893"}
The problem of robust sparse coding is considered. It is defined as finding linear reconstruction coefficients that minimize the sum of absolute values of the errors, instead of the more typically used sum of squares of the errors. This change lowers the influence of large errors and enhances the robustness of the solution to noise in the data. Sparsity is enforced by limiting the sum of absolute values of the coefficients. We present an algorithm that finds the path traced by the coefficients when the sparsity-inducing constraint is varied. The optimality conditions are derived and included in the algorithm to speed its execution. The proposed method is validated on the problem of robust face recognition.
{"title":"Obtaining Full Regularization Paths for Robust Sparse Coding with Applications to Face Recognition","authors":"J. Chorowski, J. Zurada","doi":"10.1109/ICMLA.2012.66","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.66","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"117088421"}
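The robust sparse coding problem described in this abstract can be written compactly as follows. The notation is an assumption made here for illustration and does not come from the abstract: $y$ is the signal to reconstruct, $D$ the matrix whose columns are the basis vectors, $x$ the reconstruction coefficients, and $t$ the sparsity budget.

```latex
\min_{x}\ \lVert y - D x \rVert_1
\quad \text{subject to} \quad
\lVert x \rVert_1 \le t
```

The more typical sum-of-squares formulation would replace the objective with $\lVert y - D x \rVert_2^2$. Under this reading, the path mentioned in the abstract is the set of solutions $x(t)$ obtained as the bound $t$ is varied.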
Chris Sumner, A. Byers, Rachel Boochever, Gregory J. Park
Social media sites are now the most popular destination for Internet users, providing social scientists with a great opportunity to understand online behaviour. A growing number of research papers relate to social media, a small number of which focus on personality prediction. To date, studies have typically focused on the Big Five traits of personality, but one relatively unexplored area is that of the anti-social traits of narcissism, Machiavellianism and psychopathy, commonly referred to as the Dark Triad. This study explored the extent to which it is possible to determine anti-social personality traits based on Twitter use. This was done by comparing the Dark Triad and Big Five personality traits of 2,927 Twitter users with their profile attributes and use of language. Analysis shows that there are some statistically significant relationships between these variables. Through the use of crowdsourced machine learning algorithms, we show that machine learning provides useful prediction rates but is imperfect in predicting an individual's Dark Triad traits from Twitter activity. While predictive models may be unsuitable for predicting an individual's personality, they may still be of practical importance when applied to large groups of people, for example to see whether anti-social traits are increasing or decreasing across a population. Our results raise important questions related to the unregulated use of social media analysis for screening purposes. It is important that the practical and ethical implications of drawing conclusions about personal information embedded in social media sites are better understood.
{"title":"Predicting Dark Triad Personality Traits from Twitter Usage and a Linguistic Analysis of Tweets","authors":"Chris Sumner, A. Byers, Rachel Boochever, Gregory J. Park","doi":"10.1109/ICMLA.2012.218","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.218","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"126209517"}
This study presents a novel adaptive control method based on a neural network for dc-dc converters. The control method must adapt to changing conditions in order to obtain high-performance dc-dc converters. In this study, neural network control is adopted to improve the transient response of dc-dc converters. It works in coordination with a conventional PID control to realize a highly adaptive method. The neural network is trained with data obtained on-line, so the neural network control can adapt dynamically to changes of input. The adaptation is realized by modifying the reference in the PID control. The effect of the presented method is confirmed in simulations. The results show that the presented method contributes to realizing such adaptive control.
{"title":"A Novel Neural Network Based Control Method with Adaptive On-Line Training for DC-DC Converters","authors":"H. Maruta, M. Motomura, F. Kurokawa","doi":"10.1109/ICMLA.2012.152","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.152","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"126710345"}
This work proposes a fast background learning algorithm for foreground detection under changing illumination. The Gaussian Mixture Model (GMM) is an effective statistical model for background learning. We first focus on Titterington's online EM algorithm, which can be used for real-time unsupervised GMM learning, and then advocate a deterministic data assignment strategy to avoid Bayesian computation. The color of the foreground is apt to be influenced by environmental illumination, which usually has an undesirable effect on GMM updating. However, a collinear feature of pixel intensity under changing light is discovered in RGB color space. This feature is then used as a reliable cue to decide which part of the mixture to update under changing light. A foreground detection step proposed in an earlier version of this work is employed to extract foreground objects by comparing the estimated background model with the current video frame. Experiments show that the proposed method achieves satisfactory static background images of scenes and is superior to some mainstream methods in detection performance in both indoor and outdoor scenes.
{"title":"Real-Time Statistical Background Learning for Foreground Detection under Unstable Illuminations","authors":"Dawei Li, Lihong Xu, E. Goodman","doi":"10.1109/ICMLA.2012.85","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.85","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"126715755"}
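The collinearity cue mentioned in this abstract can be illustrated with a small sketch. It assumes one plausible reading that the abstract itself does not spell out: when only the illumination changes, a background pixel's RGB vector is scaled, so it stays close to the line through the origin and the pixel's background mean colour. The function name, the angle threshold and the sample colours are hypothetical.

```python
import math


def is_collinear(pixel, background_mean, max_angle_deg=3.0):
    """True if the pixel's RGB vector points in nearly the same direction
    as background_mean (assumed test: angle between the two vectors)."""
    dot = sum(p * b for p, b in zip(pixel, background_mean))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_b = math.sqrt(sum(b * b for b in background_mean))
    if norm_p == 0 or norm_b == 0:
        return False  # direction is undefined for a black pixel
    # Clamp to 1.0 to guard against floating-point overshoot before acos.
    cos_angle = min(1.0, dot / (norm_p * norm_b))
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg


background = (120.0, 100.0, 80.0)
dimmed = (90.0, 75.0, 60.0)        # same colour at 75% brightness
red_object = (200.0, 40.0, 30.0)   # a differently coloured object

same_surface = is_collinear(dimmed, background)
different_surface = is_collinear(red_object, background)
```

In a sketch like this, a pixel that passes the test would be treated as the same background surface under different lighting, while a pixel that fails it would be a candidate foreground observation.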
R. Bayindir, M. Yesilbudak, I. Colak, Ş. Sağiroğlu
The excitation current of a synchronous motor plays a key role in reactive power compensation. For this purpose, the k-nearest neighbor (k-NN) classifier designed in this paper predicts the excitation current parameter using n-tupled inputs. Load current, power factor, power factor error and the change of excitation current were used as the n-tupled inputs. Euclidean, Manhattan and Minkowski distance metrics were employed to measure the closeness among observations, and the number of nearest neighbors k was set to 1, 2, 3, 4 and 5 in turn. The forecasting results show that the k-NN classifier using the power factor and the change of excitation current achieved the best forecasting accuracy for k=1 with the Minkowski distance metric, whereas the k-NN classifier using the load current, power factor and power factor error gave the worst forecasting accuracy for k=5 with the Minkowski distance metric.
{"title":"Excitation Current Forecasting for Reactive Power Compensation in Synchronous Motors: A Data Mining Approach","authors":"R. Bayindir, M. Yesilbudak, I. Colak, Ş. Sağiroğlu","doi":"10.1109/ICMLA.2012.185","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.185","journal":{"name":"2012 11th International Conference on Machine Learning and Applications"},"publicationDate":"2012-12-12","platform":"Semanticscholar","paperid":"114048572"}
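The k-NN setup described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the class labels and the training values are made up, and the Minkowski order p=3 is an arbitrary choice, since the abstract does not state which order was used. Setting p=1 gives the Manhattan distance and p=2 the Euclidean distance.

```python
from collections import Counter


def minkowski(a, b, p=2.0):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_x, train_y, query, k=1, p=2.0):
    """Return the majority label among the k training points nearest to query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: minkowski(train_x[i], query, p))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-tuples: (power factor, change of excitation current),
# each labelled with a hypothetical excitation-current class.
train_x = [(0.80, 0.10), (0.82, 0.12), (0.95, -0.05), (0.97, -0.08)]
train_y = ["high", "high", "low", "low"]

prediction = knn_predict(train_x, train_y, (0.94, -0.04), k=1, p=3.0)
```

Changing the tuple contents, `k` and `p` in a sketch like this corresponds to the input combinations, neighbor counts and distance metrics compared in the abstract.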