
2012 11th International Conference on Machine Learning and Applications: Latest Articles

Obtaining Full Regularization Paths for Robust Sparse Coding with Applications to Face Recognition
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.66
J. Chorowski, J. Zurada
The problem of robust sparse coding is considered. It is defined as finding linear reconstruction coefficients that minimize the sum of the absolute values of the errors, rather than the more commonly used sum of squared errors. This change lowers the influence of large errors and makes the solution more robust to noise in the data. Sparsity is enforced by bounding the sum of the absolute values of the coefficients. We present an algorithm that finds the path traced by the coefficients as the sparsity-inducing constraint is varied. The optimality conditions are derived and incorporated into the algorithm to speed up its execution. The proposed method is validated on the problem of robust face recognition.
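The abstract states the problem only in words. Written out in one common notation (the symbols y, D, x and t are our own choice, not the paper's), the constrained problem and its standard linear-programming reformulation look as follows; the regularization path is the solution x*(t) traced as the bound t increases from 0. This LP view only illustrates the problem for a single value of t and is not the authors' path-following algorithm.

```latex
% Notation assumed for illustration: y = signal to reconstruct,
% D = dictionary (e.g. columns are training face images),
% x = reconstruction coefficients, t >= 0 = bound on the l1 norm of x.
\min_{\mathbf{x}} \; \|\mathbf{y} - D\mathbf{x}\|_1
\quad \text{s.t.} \quad \|\mathbf{x}\|_1 \le t

% Equivalent linear program: split x = x^+ - x^- and let e bound |y - Dx| elementwise.
\min_{\mathbf{x}^+,\,\mathbf{x}^-,\,\mathbf{e}} \; \mathbf{1}^\top \mathbf{e}
\quad \text{s.t.} \quad
-\mathbf{e} \le \mathbf{y} - D(\mathbf{x}^+ - \mathbf{x}^-) \le \mathbf{e}, \quad
\mathbf{1}^\top(\mathbf{x}^+ + \mathbf{x}^-) \le t, \quad
\mathbf{x}^+,\, \mathbf{x}^-,\, \mathbf{e} \ge \mathbf{0}
```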
Citations: 2
Taxonomic Dimensionality Reduction in Bayesian Text Classification
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.93
Richard A. McAllister, John W. Sheppard
Lexical abstraction hierarchies can be leveraged to provide semantic information that characterizes the features of text corpora as a whole. This information may be used to determine the classification utility of the dimensions that describe a dataset. This paper presents a new method for preparing a dataset for probabilistic classification by determining, a priori, the utility of a very small subset of taxonomically related dimensions via a Discriminative Multinomial Naive Bayes process. We show that this method yields significant improvements over both Discriminative Multinomial Naive Bayes and Bayesian network classifiers used alone.
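For readers unfamiliar with the baseline, the following is a minimal sketch of plain multinomial Naive Bayes with additive (Laplace) smoothing, the generative model that Discriminative Multinomial Naive Bayes builds on. It does not implement the discriminative parameter updates or the taxonomy-based dimension selection described in the abstract; the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenized documents with additive smoothing."""
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token frequencies
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    model = {}
    for y, n_y in class_counts.items():
        total = sum(word_counts[y].values())
        denom = total + alpha * len(vocab)
        # Smoothed log P(word | class) for every vocabulary word.
        log_lik = {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab}
        model[y] = (math.log(n_y / len(labels)), log_lik)
    return model

def predict_mnb(model, doc):
    """Return the class with the highest log posterior; out-of-vocabulary tokens are ignored."""
    def log_posterior(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[w] for w in doc if w in log_lik)
    return max(model, key=log_posterior)
```

Working in log space avoids underflow when a document contains many tokens.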
Citations: 0
Face Recognition in the Virtual World: Recognizing Avatar Faces
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.16
Roman V. Yampolskiy, Brendan Klare, Anil K. Jain
Criminal activity in virtual worlds is becoming a major problem for law enforcement agencies, and forensic investigators are increasingly interested in accurately and automatically tracking people in virtual communities. This paper describes a set of algorithms capable of verifying and recognizing avatar faces with a high degree of accuracy. It reports experiments on avatar authentication within a virtual world and on inter-reality scenarios in which a person is tracked between the real and virtual worlds. On the FERET-to-Avatar face dataset, in which an avatar face was generated from every photo in the FERET database, a COTS FR algorithm achieved a near-perfect accuracy of 99.58% on 725 subjects. On a dataset of avatars from Second Life, the proposed avatar-to-avatar matching algorithm (which fuses local structural and appearance descriptors) achieved average true accept rates of (i) 96.33% using manual eye detection and (ii) 86.5% in fully automated mode, at a false accept rate of 1.0%. Combining the proposed face matcher with a state-of-the-art commercial matcher (FaceVACS) further improved results in the inter-reality scenario.
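The Second Life results are reported as true accept rates (TAR) at a fixed false accept rate (FAR) of 1.0%. As a minimal sketch of how such an operating point is commonly computed from match scores, assuming higher scores mean greater similarity and a pair is accepted when its score exceeds a threshold, the hypothetical helper below picks the threshold from the impostor scores and then measures TAR on the genuine scores. It is not the authors' evaluation code.

```python
import math

def tar_at_far(genuine, impostor, far=0.01):
    """Return (threshold, tar) for a target false accept rate.

    genuine:  similarity scores of same-identity pairs.
    impostor: similarity scores of different-identity pairs.
    A pair is accepted when its score is strictly greater than the threshold,
    so at most floor(far * len(impostor)) impostor pairs are accepted.
    """
    imp = sorted(impostor, reverse=True)
    # Largest number of impostor acceptances allowed (epsilon guards float rounding).
    k = math.floor(far * len(imp) + 1e-9)
    # Threshold at the (k+1)-th highest impostor score; accept only scores above it.
    threshold = imp[k] if k < len(imp) else -math.inf
    tar = sum(s > threshold for s in genuine) / len(genuine)
    return threshold, tar
```

With ties among impostor scores the achieved FAR can fall below the target, which keeps the reported TAR conservative.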
Citations: 38
Semantic Data Types in Machine Learning from Healthcare Data
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.41
Janusz Wojtusiak
Healthcare is a domain particularly rich in semantic information and in background knowledge describing data. This paper discusses a number of semantic data types found in healthcare data, shows how the semantics can be extracted from existing sources, including the Unified Medical Language System (UMLS), discusses how the semantics can be used in both supervised and unsupervised learning, and presents an example rule learning system that implements several of these types. Results from three example applications in the healthcare domain further illustrate semantic data types.
Citations: 10
Feature Mapping and Fusion for Music Genre Classification
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.59
H. Balti, H. Frigui
We propose a feature-level fusion method based on mapping the original low-level audio features to histogram descriptors. Our mapping is based on possibilistic membership functions and has two main components. The first clusters each set of features and identifies a set of representative prototypes. The second uses the learned prototypes within membership functions to transform the original features into histograms. The mapping transforms features of different dimensions into histograms of fixed dimension, which makes the fusion of multiple features less biased by the dimensionality and distributions of the individual features. Using a standard collection of songs, we show that the transformed features provide higher classification accuracy than the original features. We also show that mapping simple low-level features and using a K-NN classifier yields results comparable to the state of the art.
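The mapping step can be sketched as follows under stated assumptions: the prototypes are taken as given (for example, cluster centres from the first component), and the membership of a feature vector in prototype k uses the common possibilistic c-means form 1 / (1 + d^2 / eta_k) with fuzzifier m = 2. The abstract does not give the exact membership function, scale parameters, or normalization, so these choices and the function name are hypothetical. Whatever the input dimension d, the output has one bin per prototype, which is what lets features of different dimensions be fused.

```python
def possibilistic_histogram(frames, prototypes, etas):
    """Map a variable number of d-dimensional feature vectors to a K-bin histogram.

    frames:     feature vectors (e.g. one per audio frame), each of length d.
    prototypes: K prototype vectors of length d (assumed learned by clustering).
    etas:       K positive scale parameters, one per prototype (assumed).
    Returns K values: the average possibilistic membership of the frames in each prototype.
    """
    hist = [0.0] * len(prototypes)
    for x in frames:
        for k, (c, eta) in enumerate(zip(prototypes, etas)):
            d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))  # squared Euclidean distance
            hist[k] += 1.0 / (1.0 + d2 / eta)  # PCM-style membership with m = 2
    return [h / len(frames) for h in hist]
```

Unlike fuzzy c-means memberships, these values are not forced to sum to 1 across prototypes, so a frame far from every prototype contributes little to any bin.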
Citations: 3
Predicting Dark Triad Personality Traits from Twitter Usage and a Linguistic Analysis of Tweets
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.218
Chris Sumner, A. Byers, Rachel Boochever, Gregory J. Park
Social media sites are now the most popular destination for Internet users, giving social scientists a great opportunity to understand online behaviour. A growing number of research papers relate to social media, and a small number of these focus on personality prediction. To date, studies have typically focused on the Big Five personality traits; one relatively unexplored area is that of the anti-social traits of narcissism, Machiavellianism, and psychopathy, commonly referred to as the Dark Triad. This study explored the extent to which anti-social personality traits can be determined from Twitter use, by comparing the Dark Triad and Big Five personality traits of 2,927 Twitter users with their profile attributes and use of language. The analysis shows some statistically significant relationships between these variables. Using crowdsourced machine learning algorithms, we show that machine learning provides useful prediction rates but is imperfect at predicting an individual's Dark Triad traits from Twitter activity. While predictive models may be unsuitable for predicting an individual's personality, they may still be of practical importance when applied to large groups of people, for example to see whether anti-social traits are increasing or decreasing across a population. Our results raise important questions about the unregulated use of social media analysis for screening purposes. It is important that the practical and ethical implications of drawing conclusions about personal information embedded in social media sites be better understood.
Citations: 266
Measuring the Spatial Error in Load Forecasting for Electrical Distribution Planning as a Problem of Transporting the Surplus to the In-Deficit Locations
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.203
D. Vieira, M. A. M. Cabral, T. V. Menezes, B. E. Silva, A. C. Lisboa
While the literature defines many functions for measuring the magnitude of an error (how much), how to define the spatial error (where) is much less well established. For instance, suppose a given region is expected to see a global growth of 10 MW in electrical demand. Electrical system planning must consider not only the amount but also the location of that growth: predicting 10 MW of growth (how much) in the south (where) of a city would lead to completely different resource-allocation policies (for instance, a new substation) than predicting the same 10 MW in the north. To address this difficulty, this paper proposes the concept of spatial error as the cost of transporting the surplus of one region to compensate for the deficit of another. This conceptual problem is formulated as a transportation optimization problem. The paper describes the conceptual difference between magnitude and spatial error measures and presents an algorithm that deals efficiently with the defined framework.
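To make the "how much" versus "where" distinction concrete, here is a minimal sketch for the simplest possible geography: regions lying on a line, a transport cost of one unit per MW per unit of distance, and equal forecast and actual totals. In that one-dimensional case the balanced transportation problem has a closed-form optimum (the net imbalance carried across each gap between neighbouring regions, times the gap length), so no LP solver is needed. The function name and these assumptions are ours; the paper formulates the general transportation problem and its own algorithm, which this does not reproduce.

```python
def spatial_error_1d(forecast, actual, positions):
    """Minimum cost (MW x distance) of moving forecast surplus to forecast deficits.

    Simplifying assumptions (not from the paper): regions lie on a line at
    increasing `positions`, moving 1 MW over one unit of distance costs 1, and
    forecast and actual totals are equal (a balanced transportation problem).
    """
    if abs(sum(forecast) - sum(actual)) > 1e-9:
        raise ValueError("totals must match for a balanced transportation problem")
    cost, carried = 0.0, 0.0
    for i in range(len(positions) - 1):
        # Net surplus (+) or deficit (-) of all regions left of gap i must cross that gap.
        carried += forecast[i] - actual[i]
        cost += abs(carried) * (positions[i + 1] - positions[i])
    return cost
```

For example, if positions are in km and 10 MW of growth is forecast at one end of a 10 km corridor while it actually occurs at the other end, `spatial_error_1d([10, 0, 0], [0, 0, 10], [0, 5, 10])` gives 100.0 (MW·km), even though the total (magnitude) error is zero.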
Citations: 8
A Treeboost Model for Software Effort Estimation Based on Use Case Points
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.155
A. B. Nassif, Luiz Fernando Capretz, D. Ho, Mohammad Azzeh
Software effort prediction is an important task in the software development life cycle. Many approaches, including regression models, machine learning models, algorithmic models, expert judgment, and estimation by analogy, have been widely used to estimate software effort and cost. In this work, a Treeboost (Stochastic Gradient Boosting) model is put forward to predict software effort based on the Use Case Point method. The inputs of the model are the software size in use case points, productivity, and complexity. A multiple linear regression model was also built, and the Treeboost model was evaluated against both it and the Use Case Point model using four performance criteria: MMRE, PRED, MdMRE, and MSE. Experiments show that the Treeboost model gives promising results for estimating software effort.
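The four criteria named in the abstract have widely used definitions in the effort-estimation literature. A minimal sketch, assuming those standard definitions and a PRED level of 0.25 (the abstract does not state which level was used), is shown below; the function name is hypothetical, and the sketch only scores predictions rather than implementing the Treeboost model.

```python
from statistics import mean, median

def effort_metrics(actual, predicted, pred_level=0.25):
    """Common effort-estimation accuracy criteria (assumed standard definitions).

    MRE_i   = |actual_i - predicted_i| / actual_i   (actual efforts must be > 0)
    MMRE    = mean of MRE, MdMRE = median of MRE
    PRED(l) = fraction of estimates with MRE <= l
    MSE     = mean squared error
    """
    mre = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return {
        "MMRE": mean(mre),
        "MdMRE": median(mre),
        f"PRED({pred_level})": sum(m <= pred_level for m in mre) / len(mre),
        "MSE": mean((a - p) ** 2 for a, p in zip(actual, predicted)),
    }
```

Lower MMRE, MdMRE, and MSE and higher PRED indicate better estimates; MdMRE is less sensitive than MMRE to a few very poor predictions.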
Citations: 55
Web Spam: A Study of the Page Language Effect on the Spam Detection Features
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.229
A. Alarifi, Mansour Alsaleh
Although search engines have deployed various techniques to detect and filter out Web spam, Web spammers continue to develop new tactics to influence the results of search engines' ranking algorithms in order to obtain undeservedly high rankings. In this paper, we study the effect of page language on spam detection features. We examine how the distribution of a set of selected detection features changes with the page language. We also study the effect of page language on the detection rate of a given classifier that uses a selected set of detection features. The results show that selecting suitable features for a classifier that separates spam pages from legitimate ones depends heavily on the language of the examined Web page, due in part to the different sets of Web spam mechanisms used by each type of spammer.
Citations: 14
Risk Estimation in Spatial Disease Clusters: An RBF Network Approach
Pub Date: 2012-12-12 DOI: 10.1109/ICMLA.2012.233
Fernanda C. Takahashi, Ricardo H. C. Takahashi
This paper proposes a method for estimating the probability of occurrence of a syndrome as a function of the geographical coordinates of the individuals at risk. The data describing the locations of syndrome cases across the population are smoothed with a moving-average filter, and the resulting values are fitted by an RBF network performing a regression. Selected contour curves of the RBF network are then used to establish the boundaries between four kinds of regions: high-incidence regions, medium-incidence regions, regions of slightly abnormal incidence, and regions of normal prevalence. In each region, the risk is estimated with three indicators: a nominal risk, an upper-bound risk, and a lower-bound risk. These indicators are obtained by adjusting the probability used in Monte Carlo simulations of syndrome scenarios over the population. The nominal risk is the probability for which the empirical number of syndrome cases corresponds to the median of the simulated counts. The upper-bound and lower-bound risks are the probabilities for which the empirical number of syndrome cases corresponds to the 25th and 75th percentiles of the simulated counts, respectively. The proposed method is an advance over currently known spatial cluster detection techniques, which are dedicated to finding clusters of abnormal occurrence of a syndrome without quantifying the probability associated with such an abnormality and without stratifying sub-regions by their associated risk. The proposed method was applied to data previously studied in a paper that sought to identify a dengue fever cluster, and the result obtained here is consistent with the cluster found in that reference.
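The calibration of the three risk indicators can be sketched for a single region under assumptions the abstract does not state: each of the region's at-risk individuals independently becomes a case with probability p (so each simulated count is binomial), and "corresponds to a percentile" is read as the smallest p at which that percentile of the simulated counts reaches the observed count. The moving-average filtering, RBF regression, and contour-based zoning are not shown, and all names are hypothetical.

```python
import bisect
import random

def _calibrate(observed, trials, q, iters=40):
    """Smallest case probability p whose simulated q-quantile of case counts reaches `observed`.

    `trials` holds sorted uniform draws, one list per simulated scenario. Reusing the
    same draws for every p (common random numbers) makes each simulated count, and
    hence every quantile, non-decreasing in p, so bisection on p is valid.
    """
    if observed <= 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Case count in each scenario = number of individuals whose draw is below p.
        counts = sorted(bisect.bisect_left(t, mid) for t in trials)
        if counts[min(int(q * len(counts)), len(counts) - 1)] >= observed:
            hi = mid
        else:
            lo = mid
    return hi

def risk_indicators(observed, population, n_sims=500, seed=0):
    """Lower, nominal and upper risk for one region (hypothetical helper)."""
    rng = random.Random(seed)
    trials = [sorted(rng.random() for _ in range(population)) for _ in range(n_sims)]
    return {
        "lower": _calibrate(observed, trials, 0.75),    # observed count at the 75th percentile
        "nominal": _calibrate(observed, trials, 0.50),  # observed count at the median
        "upper": _calibrate(observed, trials, 0.25),    # observed count at the 25th percentile
    }
```

Because a higher p shifts the whole simulated distribution upward, the observed count sits at a low percentile (25th) only for a relatively high p, which is why that percentile defines the upper bound.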
本文提出了一种适用于估计某一综合征发生概率的方法,该方法是危险个体地理坐标的函数。描述综合征病例在人群中的位置的数据经过移动平均滤波,结果值由执行回归的RBF网络拟合。然后利用RBF网络的一些轮廓曲线来建立四种区域之间的边界:高发病率区域、中等发病率区域、轻微异常发病率区域和正常患病率区域。在每个地区,用三个指标来估计风险:名义风险、上限风险和下限风险。这些指标是通过调整总体上综合症情景的蒙特卡罗模拟所采用的概率而获得的。名义风险是产生蒙特卡罗模拟的概率,其中综合症病例的经验数对应于中位数。上界和下界风险是产生蒙特卡罗模拟的概率,其中综合症病例的经验值分别对应于25%百分位和75%百分位。与目前已知的空间聚类检测技术相比,所提出的方法是一种进步,这些技术致力于发现综合征异常发生的聚类,而没有量化与这种异常相关的概率,也没有对具有不同相关风险的不同子区域进行分层。所提出的方法应用于以前在一篇旨在找到登革热群集的论文中研究的数据。这里确定的结果与在该引用中找到的集群兼容。
{"title":"Risk Estimation in Spatial Disease Clusters: An RBF Network Approach","authors":"Fernanda C. Takahashi, Ricardo H. C. Takahashi","doi":"10.1109/ICMLA.2012.233","DOIUrl":"https://doi.org/10.1109/ICMLA.2012.233","url":null,"abstract":"This paper proposes a method which is suitable for the estimation of the probability of occurrence of a syndrome, as a function of the geographical coordinates of the individuals under risk. The data describing the location of syndrome cases over the population suffers a moving-average filtering, and the resulting values are fitted by an RBF network performing a regression. Some contour curves of the RBF network are then employed in order to establish the boundaries between four kinds of regions: regions of high-incidence, regions of medium incidence, regions of slightly-abnormal incidence, and regions of normal prevalence. In each region, the risk is estimated with three indicators: a nominal risk, an upper bound risk and a lower bound risk. Those indicators are obtained by adjusting the probability employed for the Monte Carlo simulation of syndrome scenarios over the population. The nominal risk is the probability which produces Monte Carlo simulations for which the empirical number of syndrome cases corresponds to the median. The upper bound and the lower bound risks are the probabilities which produce Monte Carlo simulations for which the empirical values of syndrome cases correspond respectively to the 25% percentile and the 75% percentile. The proposed method constitutes an advance in relation to the currently known techniques of spatial cluster detection, which are dedicated to finding clusters of abnormal occurrence of a syndrome, without quantifying the probability associated to such an abnormality, and without performing a stratification of different sub-regions with different associated risks. The proposed method was applied on data which were studied formerly in a paper that was intended to find a cluster of dengue fever. The result determined here is compatible with the cluster that was found in that reference.","PeriodicalId":157399,"journal":{"name":"2012 11th International Conference on Machine Learning and Applications","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122027312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
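The Monte Carlo step that produces the three risk indicators (nominal, upper bound, lower bound) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each of a region's `population` individuals independently becomes a case with probability p, so every simulated scenario is a binomial draw. The abstract does not say how p is "adjusted". Here we assume bisection with common random numbers, which keeps every simulated count non-decreasing in p. The function names (`make_scenarios`, `simulated_counts`, `calibrate`, `region_risks`) are hypothetical.

```python
import bisect
import math
import random


def make_scenarios(population, n_sims, seed=0):
    """Common random numbers: one sorted row of uniforms per simulated scenario.

    Assumption: individual i in a scenario is a case iff its uniform draw is < p,
    i.e. independent Bernoulli(p) cases. Reusing the same uniforms for every
    candidate p makes each scenario's case count non-decreasing in p.
    """
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(population)) for _ in range(n_sims)]


def simulated_counts(scenarios, p):
    """Number of cases per scenario = number of uniforms strictly below p."""
    return [bisect.bisect_left(row, p) for row in scenarios]


def quantile(values, level):
    """Linear-interpolation quantile (the same convention as numpy's default)."""
    xs = sorted(values)
    pos = level * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def calibrate(scenarios, observed, level, iters=40):
    """Bisection for the smallest p whose simulated `level`-quantile reaches `observed`."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if quantile(simulated_counts(scenarios, mid), level) >= observed:
            hi = mid
        else:
            lo = mid
    return hi


def region_risks(population, observed, n_sims=1000, seed=0):
    """Hypothetical helper returning the three indicators for one region."""
    scenarios = make_scenarios(population, n_sims, seed)
    return {
        "lower": calibrate(scenarios, observed, 0.75),    # observed count at the 75th percentile
        "nominal": calibrate(scenarios, observed, 0.50),  # observed count at the median
        "upper": calibrate(scenarios, observed, 0.25),    # observed count at the 25th percentile
    }
```

Consider a hypothetical region with 500 individuals and 25 observed cases. There, `region_risks(population=500, observed=25)` gives a nominal risk close to 25/500 = 0.05. The lower risk puts the observed count at the 75th percentile of the simulated counts, and the upper risk puts it at the 25th percentile, so the two indicators bracket the nominal value. Reusing one set of uniforms for every candidate p is what makes bisection valid here. With fresh random numbers for each p, the simulated quantile would jitter and could break the monotonicity the search relies on.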
Citations: 0
Journal: 2012 11th International Conference on Machine Learning and Applications