2012 11th International Conference on Machine Learning and Applications: Latest Papers

Web Spam: A Study of the Page Language Effect on the Spam Detection Features
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.229
A. Alarifi, Mansour Alsaleh
Although search engines have deployed various techniques to detect and filter out Web spam, Web spammers continue to develop new tactics to influence the results of search engines' ranking algorithms in order to obtain undeservedly high ranks. In this paper, we study the effect of the page language on spam detection features. We examine how the distribution of a set of selected detection features changes according to the page language. We also study the effect of the page language on the detection rate of a given classifier using a selected set of detection features. The analysis results show that selecting suitable features for a classifier that segregates spam pages depends heavily on the language of the examined Web page, due in part to the different sets of Web spam mechanisms used by each type of spammer.
Citations: 14
On the Use of SVMs to Detect Anomalies in a Stream of SIP Messages
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.109
Raihana Ferdous, R. Cigno, A. Zorat
Voice and multimedia communications are rapidly migrating from traditional networks to TCP/IP networks (Internet), where services are provisioned by SIP (Session Initiation Protocol). This paper proposes an on-line filter that examines the stream of incoming SIP messages and classifies them as good or bad. The classification is carried out in two stages. First, a lexical analysis is performed to weed out those messages that do not belong to the language generated by the grammar defined by the SIP standard. After this first stage, a second filtering step identifies messages that differ in some way, in structure or content, from messages that were previously classified as good. While the first filter stage is straightforward, as the classification is crisp (either a message belongs to the language or it does not), the second stage requires more delicate handling, as deciding whether a message is semantically meaningful is not a sharp decision. The approach we followed for this step is based on past experience with previously classified messages, i.e. a "learn-by-example" approach, which led to a classifier based on Support Vector Machines (SVMs) that performs the required analysis of each incoming SIP message. The paper describes the overall architecture of the two-stage filter and then explores several points of the configuration space of the SVM to determine a configuration setting that performs well when used to classify a large sample of SIP messages obtained from real traffic collected on a VoIP installation at our institution. Finally, the performance of the classification on additional messages collected from the same source is presented.
Citations: 16
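The two-stage structure described in the SIP abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the regular expression is a toy stand-in for the real SIP grammar, and `looks_normal` is a hypothetical callable standing in for the trained SVM, whose features and configuration the abstract does not give.

```python
import re
from typing import Callable

# Toy stand-in for stage 1 (assumption): accept only a request line of the
# form "METHOD sip:target SIP/2.0". The real SIP grammar is far richer.
REQUEST_LINE = re.compile(r"^[A-Z]+ sips?:\S+ SIP/2\.0$")


def is_well_formed(message: str) -> bool:
    """Stage 1 (crisp): does the first line look like a SIP request line?"""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return REQUEST_LINE.match(first_line) is not None


def classify(message: str, looks_normal: Callable[[str], bool]) -> str:
    """Run both stages; `looks_normal` is a hypothetical stand-in for the SVM."""
    if not is_well_formed(message):
        return "bad"  # rejected by the lexical stage, stage 2 never runs
    return "good" if looks_normal(message) else "bad"
```

Keeping the second stage behind a plain callable means the cheap syntax check always runs first and the learned model only sees messages that already parse.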
A Normalized Criterion of Spatial Clustering in Model-Based Framework
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.99
X. Wang, E. Grall-Maës, P. Beauseroy
This paper presents a model-based criterion for assessing the clustering results of spatial data, where both geometrical constraints and observation attributes are taken into account. An extra parameter is often used with the aim of controlling the importance of each characteristic. Since the values of both terms vary according to different realizations of the data, it becomes essential to determine the parameter value, which has a large influence on the clustering criterion value. Thus, an 'upper-lower bound' technique is proposed to solve this problem, which is caused by the stochastic properties of both terms. In addition, we apply a normalization method to regularize the parameter value. The effectiveness of this approach is validated through experimental results using simulated reliability data.
Citations: 1
First Order Statistics Based Feature Selection: A Diverse and Powerful Family of Feature Selection Techniques
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.192
T. Khoshgoftaar, D. Dittman, Randall Wald, Alireza Fazelpour
Dimensionality reduction techniques have become a required step when working with bioinformatics datasets. Techniques such as feature selection are known not only to improve computation time, but also to improve the results of experiments by removing redundant and irrelevant features or genes from consideration in subsequent analysis. Univariate feature selection techniques in particular are well suited to the high dimensionality inherent in bioinformatics datasets (for example, DNA microarray datasets) due to their intuitive output (a ranked list of features or genes) and their relatively small computational time compared to other techniques. This paper presents seven univariate feature selection techniques and collects them into a single family entitled First Order Statistics (FOS) based feature selection. These seven all share the trait of using first order statistical measures such as mean and standard deviation, although this is the first work to relate them to one another and compare their performance. In order to examine the properties of these seven techniques, we performed a series of similarity and classification experiments on eleven DNA microarray datasets. Our results show that, in general, each feature selection technique creates diverse feature subsets when compared to the other members of the family. However, when we look at classification we find that, with one exception, the techniques produce good classification results and perform similarly to each other. Our recommendation is to use the rankers Signal-to-Noise and SAM for the best classification results and to avoid Fold Change Ratio, as it is consistently the worst performer of the seven rankers.
Citations: 37
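To make the idea of a univariate, first order statistics ranker from the abstract above concrete, here is a minimal sketch of a Signal-to-Noise ranker for a two-class dataset. It assumes the definition commonly used in the microarray literature, (mean of class 1 minus mean of class 0) divided by the sum of the two class standard deviations; the paper's exact variant may differ, and the function names are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(pos, neg):
    """S2N = (mean_pos - mean_neg) / (std_pos + std_neg) for one feature.

    Assumes each class has at least two samples and a nonzero spread.
    """
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))


def rank_features(samples, labels):
    """Return feature indices sorted by |S2N|, most discriminative first."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        # Split the values of feature j by class label (1 vs. 0).
        pos = [row[j] for row, y in zip(samples, labels) if y == 1]
        neg = [row[j] for row, y in zip(samples, labels) if y == 0]
        scores.append(abs(signal_to_noise(pos, neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

Each feature is scored independently of the others, which is what keeps the cost low on datasets with thousands of genes.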
Incremental Mitosis: Discovering Clusters of Arbitrary Shapes and Densities in Dynamic Data
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.26
Rania Ibrahim, N. Ahmed, N. A. Yousri, M. Ismail
While finding natural clusters in high dimensional data is in itself a challenge, the dynamic nature of data adds another, greater challenge. Many applications such as Data Warehouses and the WWW require efficient incremental clustering algorithms to handle their dynamic data. So far, numerous useful incremental clustering algorithms have been developed for large datasets, such as incremental K-means, incremental DBSCAN, similarity histogram-based clustering (SHC) and mean shift. However, targeting clusters of different shapes and densities is yet to be efficiently tackled. In this work, an efficient incremental clustering algorithm (Incremental Mitosis) is proposed. It is based on the Mitosis clustering algorithm, which maximizes the relatedness of distances between patterns of the same cluster. The proposed algorithm is able to discover clusters of arbitrary shapes and densities in dynamic high dimensional data. Experimental results show that the proposed algorithm efficiently clusters the data and maintains the accuracy of the Mitosis algorithm.
Citations: 4
Obtaining Full Regularization Paths for Robust Sparse Coding with Applications to Face Recognition
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.66
J. Chorowski, J. Zurada
The problem of robust sparse coding is considered. It is defined as finding linear reconstruction coefficients that minimize the sum of absolute values of the errors, instead of the more typically used sum of squares of the errors. This change lowers the influence of large errors and enhances the robustness of the solution to noise in the data. Sparsity is enforced by limiting the sum of absolute values of the coefficients. We present an algorithm that finds the path traced by the coefficients when the sparsity-inducing constraint is varied. The optimality conditions are derived and included in the algorithm to speed its execution. The proposed method is validated on the problem of robust face recognition.
Citations: 2
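The formulation described in the robust sparse coding abstract above can be written out as follows. The symbols are assumptions introduced here for illustration, since the abstract names none: $x$ is the signal to reconstruct, $D$ the dictionary whose columns are the basis vectors, $a$ the coefficient vector, and $t$ the sparsity budget.

```latex
\begin{align}
\text{robust sparse coding:}\quad
  & \min_{a}\ \lVert x - D a \rVert_1
    \quad \text{subject to}\quad \lVert a \rVert_1 \le t, \\
\text{usual least-squares variant:}\quad
  & \min_{a}\ \lVert x - D a \rVert_2^2
    \quad \text{subject to}\quad \lVert a \rVert_1 \le t.
\end{align}
```

Under this reading, the regularization path is the map $t \mapsto a^{*}(t)$ from the sparsity budget to the optimal coefficients, and the only difference from the least-squares variant is the norm applied to the residual $x - D a$.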
Predicting Dark Triad Personality Traits from Twitter Usage and a Linguistic Analysis of Tweets
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.218
Chris Sumner, A. Byers, Rachel Boochever, Gregory J. Park
Social media sites are now the most popular destination for Internet users, providing social scientists with a great opportunity to understand online behaviour. There is a growing number of research papers related to social media, a small number of which focus on personality prediction. To date, studies have typically focused on the Big Five traits of personality, but one area which is relatively unexplored is that of the anti-social traits of narcissism, Machiavellianism and psychopathy, commonly referred to as the Dark Triad. This study explored the extent to which it is possible to determine anti-social personality traits based on Twitter use. This was performed by comparing the Dark Triad and Big Five personality traits of 2,927 Twitter users with their profile attributes and use of language. Analysis shows that there are some statistically significant relationships between these variables. Through the use of crowdsourced machine learning algorithms, we show that machine learning provides useful prediction rates, but is imperfect in predicting an individual's Dark Triad traits from Twitter activity. While predictive models may be unsuitable for predicting an individual's personality, they may still be of practical importance when applied to large groups of people, for example to see whether anti-social traits are increasing or decreasing over a population. Our results raise important questions related to the unregulated use of social media analysis for screening purposes. It is important that the practical and ethical implications of drawing conclusions about personal information embedded in social media sites are better understood.
Citations: 266
A Novel Neural Network Based Control Method with Adaptive On-Line Training for DC-DC Converters
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.152
H. Maruta, M. Motomura, F. Kurokawa
This study presents a novel adaptive control method based on a neural network for dc-dc converters. To obtain high performance dc-dc converters, the control method must adapt to changing conditions. In this study, neural network control is adopted to improve the transient response of dc-dc converters. It works in coordination with a conventional PID control to realize a highly adaptive method. The neural network is trained with data obtained on-line. Therefore, the neural network control can adapt dynamically to changes in the input. The adaptation is realized by modifying the reference in the PID control. The effect of the presented method is confirmed in simulations. Results show that the presented method contributes to realizing such adaptive control.
Citations: 5
Real-Time Statistical Background Learning for Foreground Detection under Unstable Illuminations
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.85
Dawei Li, Lihong Xu, E. Goodman
This work proposes a fast background learning algorithm for foreground detection under changing illumination. The Gaussian Mixture Model (GMM) is an effective statistical model for background learning. We first focus on Titterington's online EM algorithm, which can be used for real-time unsupervised GMM learning, and then advocate a deterministic data assignment strategy to avoid Bayesian computation. The color of the foreground is apt to be influenced by the environmental illumination, which usually produces undesirable effects on GMM updating. However, a collinear feature of pixel intensity under changing light is discovered in RGB color space. This feature is then used as a reliable clue to decide which part of the mixture to update under changing light. A foreground detection step proposed in an earlier version of this work is employed to extract foreground objects by comparing the estimated background model with the current video frame. Experiments show that the proposed method achieves satisfactory static background images of scenes and is superior to some mainstream methods in detection performance in both indoor and outdoor scenes.
Citations: 3
Excitation Current Forecasting for Reactive Power Compensation in Synchronous Motors: A Data Mining Approach
Pub Date : 2012-12-12 DOI: 10.1109/ICMLA.2012.185
R. Bayindir, M. Yesilbudak, I. Colak, Ş. Sağiroğlu
The excitation current of a synchronous motor plays a key role in reactive power compensation. For this purpose, the k-nearest neighbor (k-NN) classifier designed in this paper predicts the excitation current parameter using n-tupled inputs. Load current, power factor, power factor error and the change of excitation current were used as parameters in the n-tupled inputs. Moreover, Euclidean, Manhattan and Minkowski distance metrics were employed to measure the closeness among the observations, and the number of nearest neighbors k was set to 1, 2, 3, 4 and 5 in turn. The forecasting results show that the k-NN classifier using the power factor and change of excitation current parameters achieved the best forecasting accuracy, for k=1 with the Minkowski distance metric. In contrast, the k-NN classifier using the load current, power factor and power factor error parameters gave the worst forecasting accuracy, for k=5 with the Minkowski distance metric.
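The k-NN setup described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the abstract calls the model a classifier, while this version assumes the prediction is the average of the k nearest neighbors' excitation current values. The function names, the toy data and the Minkowski order are all made up for illustration.

```python
from typing import List, Sequence, Tuple


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train: List[Tuple[Sequence[float], float]],
                query: Sequence[float], k: int = 1, p: float = 2.0) -> float:
    """Average the targets of the k training rows closest to `query`."""
    ranked = sorted(train, key=lambda row: minkowski(row[0], query, p))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Made-up rows: (power factor, change of excitation current) -> excitation current
TOY_TRAIN = [
    ((0.80, 0.10), 1.2),
    ((0.90, 0.05), 1.0),
    ((0.95, 0.02), 0.9),
    ((0.70, 0.20), 1.5),
]

if __name__ == "__main__":
    # Compare a few (k, p) settings on one query point.
    for k in (1, 2, 3):
        for p in (1.0, 2.0, 3.0):
            print(k, p, knn_predict(TOY_TRAIN, (0.88, 0.06), k=k, p=p))
```

Sweeping `k` over 1 to 5 and `p` over the chosen distance metrics, as in the loop above, mirrors the kind of comparison the abstract reports. In practice the input parameters would usually be scaled first so that no single one dominates the distance.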
Citations: 6
Journal
2012 11th International Conference on Machine Learning and Applications