
Proceedings of the ... International Conference on Machine Learning and Applications. International Conference on Machine Learning and Applications: Latest Articles

An Approximative Bayes-Optimal Kernel Classifier Based on Version Space Reduction
Karen Braga Enes, Saulo Moraes Villela, G. Pappa, R. F. Neto
The Bayes-optimal classifier is defined as a classifier that induces a hypothesis able to minimize the prediction error for any given sample in binary classification problems. Finding the Bayes-optimal classifier is an intractable problem. It is known to be approximately equivalent to the center of mass of the version space, which is the set of all classifiers consistent with the training set. Previous solutions for finding the center of mass are not feasible, as they incur a high computational cost and do not work properly on problems that are not linearly separable. Aiming to solve these problems, this paper presents the Dual Version Space Reduction Machine (Dual VSRM), an effective kernel method to approximate the center of mass of the version space. The Dual VSRM algorithm employs successive reductions of the version space based on an oracle's decision. As an oracle, we propose the Ensemble of Dissimilar Balanced Kernel Perceptrons (EBPK). EBPK enhances the accuracy of each individual classifier by balancing the final hyperplane solution while maximizing the diversity of its components by applying a dissimilarity measure. To evaluate the proposed methods, we conduct experiments on 7 datasets and compare their performance against several baselines. Our results for EBPK indicate that the strategies for improving the individual accuracy and the diversity of the ensemble components work properly. Also, the Dual VSRM consistently outperforms the baselines, indicating that the proposed method generates a better approximation to the center of mass.
DOI: https://doi.org/10.1109/ICMLA.2018.00071. Pages 436-442. Published 2018-12-01.
Citations: 0
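The abstract above builds its oracle from kernel perceptrons. As a rough illustration of that building block only, here is a minimal dual (kernel) perceptron sketch in plain Python. It is not the Dual VSRM or EBPK method: there is no balancing step, no dissimilarity measure, and no version space reduction. The RBF kernel, the function names, and the XOR-style toy data are all assumptions made for this example.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points (assumed kernel choice)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def train_kernel_perceptron(X, y, kernel=rbf_kernel, max_epochs=100):
    """Dual perceptron: alpha[i] counts the mistakes made on sample i."""
    alpha = [0] * len(X)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            score = sum(a * yj * kernel(xj, xi) for a, xj, yj in zip(alpha, X, y))
            if yi * score <= 0:  # misclassified or on the boundary
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # consistent with the training set: stop
            break
    return alpha

def predict(alpha, X, y, x, kernel=rbf_kernel):
    """Sign of the kernel expansion at a new point x."""
    score = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))
    return 1 if score > 0 else -1

# Hypothetical toy data: XOR labels, which no linear classifier can separate.
X_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y_train = [-1, 1, 1, -1]
alpha = train_kernel_perceptron(X_train, y_train)
predictions = [predict(alpha, X_train, y_train, x) for x in X_train]
```

Any classifier returned with zero training mistakes is one point of the version space; the paper's contribution concerns how to move from such points toward the center of mass, which this sketch does not attempt.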
A Memory-Enhanced Framework for Financial Fraud Detection
Kunlin Yang
DOI: https://doi.org/10.1109/ICMLA.2018.00140. Pages 871-874. Published 2018-01-01.
Citations: 0
Automatic Scoring of a Nonword Repetition Test
Meysam Asgari, Jan Van Santen, Katina Papadakis

In this study, we explore the feasibility of speech-based techniques to automatically evaluate a nonword repetition (NWR) test. NWR tests, a useful marker for detecting language impairment, require repetition of pronounceable nonwords, such as "D OY F", presented aurally by an examiner or via a recording. Our proposed method first leverages ASR techniques to transcribe verbal responses. Second, it applies machine learning techniques to the ASR output to predict gold-standard scores provided by speech and language pathologists. Our experimental results for a sample of 101 children (42 with autism spectrum disorders, or ASD; 18 with specific language impairment, or SLI; and 41 typically developed, or TD) show that the proposed approach is successful in predicting scores on this test, with an averaged product-moment correlation of 0.74 and a mean absolute error of 0.06 (on an observed score range from 0.34 to 0.97) between observed and predicted ratings.

DOI: https://doi.org/10.1109/icmla.2017.0-143. Pages 304-308. Published 2017-12-01.
Citations: 0
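The two figures of merit quoted in the abstract above, the product-moment (Pearson) correlation and the mean absolute error between observed and predicted scores, can be sketched as follows. The score lists are invented for illustration and are not data from the study; the averaging over evaluation folds implied by "averaged" is not modeled here.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

def mean_absolute_error(xs, ys):
    """Average absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical observed (expert) and predicted (automatic) scores.
observed = [0.40, 0.55, 0.70, 0.85, 0.95]
predicted = [0.45, 0.50, 0.75, 0.80, 0.90]

r = pearson_r(observed, predicted)
mae = mean_absolute_error(observed, predicted)
```

Reporting both is useful because the correlation is insensitive to a constant offset in the predictions, while the mean absolute error is not.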
Metabolic Profiling of 1H NMR Spectra in Chronic Kidney Disease with Local Predictive Modeling
M. Luck, A. Yartseva, G. Bertho, E. Thervet, P. Beaune, N. Pallet, C. Damon
Metabolic profiling, the study of changes in the concentration of metabolites in the organism induced by biological differences within subpopulations, has to deal with a very large amount of complex data. It therefore requires the use of powerful data processing and machine learning methods. To overcome over-fitting, a common concern in metabolic profiling where the number of features is often much larger than the number of observations, many predictive analyses have combined dimension reduction techniques with multivariate predictive linear modeling. Moreover, they built a global model that identifies biomarkers predictive of the output of interest given their overall trend variations. However, this fails to capture local biological phenomena underlying subgroups of subjects. More recently, local exploration methods based on decision-tree approaches have been applied in metabolomics, but they only explore random parts of the feature space. In this study, we used a supervised rule-mining algorithm that locally and exhaustively explores the feature space to predict chronic kidney disease (CKD) stages based on proton Nuclear Magnetic Resonance (1H NMR) data. From the discriminant subregions obtained with this exploration, we extracted local features and learned an L2-regularized logistic regression (L2LR) classifier. We compared the resulting local predictive model with a standard one, which combines classical univariate supervised feature selection techniques with an L2LR, and with a model mixing both global and local features. Results show that the local predictive model achieves better predictive performance than the mixed and global models. Additionally, it gives key insights into biological variations specific to subgroups of subjects.
DOI: https://doi.org/10.1109/ICMLA.2015.155. Pages 176-181. Published 2015-12-01.
Citations: 6
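The classifier named in the abstract above is an L2-regularized logistic regression. A minimal sketch of such a model, trained by plain batch gradient descent, might look like the following. The optimizer, the hyperparameter values (`lam`, `lr`, `n_iter`), and the one-dimensional toy data are assumptions for illustration; the rule-mining step that produces the local features is not shown.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_l2_logistic(X, y, lam=0.01, lr=0.5, n_iter=200):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the intercept is not penalized
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_label(w, b, x):
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical one-feature data set with two classes.
X_toy = [[-2.0], [-1.0], [1.0], [2.0]]
y_toy = [0, 0, 1, 1]
w, b = fit_l2_logistic(X_toy, y_toy)
labels = [predict_label(w, b, x) for x in X_toy]
```

The penalty term keeps the weights bounded even when the classes are perfectly separable, which matters in settings like the one described, where features far outnumber observations.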
ExpertBayes: Automatically Refining Manually Built Bayesian Networks
Ezilda Almeida, Pedro Ferreira, Tiago Vinhoza, Inês Dutra, Jingwei Li, Yirong Wu, Elizabeth Burnside

Bayesian network structures are usually built using only the data, starting from an empty network or from a naïve Bayes structure. Very often, in domains such as medicine, prior knowledge of the structure is already available. This structure can be automatically or manually refined in search of better-performing models. In this work, we take Bayesian networks built by specialists and show that minor perturbations to the original network can yield better classifiers at a very small computational cost, while maintaining most of the intended meaning of the original model.

DOI: https://doi.org/10.1109/ICMLA.2014.64. Pages 362-366. Published 2014-12-01.
Citations: 8
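The abstract above does not say which perturbations ExpertBayes applies. One common way to define "minor perturbations" of a network structure is the set of single-edge changes (remove, reverse, or add one edge) that keep the graph acyclic. The sketch below enumerates that neighborhood under this assumption; the function names and the three-node example are hypothetical, and scoring the candidates against data is left out.

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is a DAG if every node can be peeled off."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    frontier = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while frontier:
        node = frontier.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
    return visited == len(nodes)

def single_edge_perturbations(nodes, edges):
    """All acyclic structures one edge removal, reversal, or addition away."""
    edges = frozenset(edges)
    candidates = set()
    for edge in edges:
        parent, child = edge
        candidates.add(edges - {edge})                        # remove the edge
        candidates.add((edges - {edge}) | {(child, parent)})  # reverse the edge
    for parent, child in permutations(nodes, 2):
        if (parent, child) not in edges and (child, parent) not in edges:
            candidates.add(edges | {(parent, child)})         # add a new edge
    return [c for c in candidates if is_acyclic(nodes, c)]

# Hypothetical expert-built chain A -> B -> C.
nodes = ["A", "B", "C"]
expert_edges = [("A", "B"), ("B", "C")]
neighbors = single_edge_perturbations(nodes, expert_edges)
```

Because each candidate differs from the expert's network by a single edge, most of the original structure, and hence most of its intended meaning, is preserved by construction.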
Phrase Based Topic Modeling for Semantic Information Processing in Biomedicine
Zhiguo Yu, Todd R Johnson, Ramakanth Kavuluru

Given that unstructured data is increasing exponentially every day, extracting and understanding the information, themes, and relationships from large collections of documents is increasingly important to researchers in many disciplines, including biomedicine. Latent Dirichlet Allocation (LDA) is an unsupervised topic modeling technique based on the "bag-of-words" assumption that has been applied extensively to unveil hidden semantic themes within large sets of textual documents. Recently, it was extended using the "bag-of-n-grams" paradigm to account for word order. In this paper, we present an alternative phrase-based LDA model that moves from a bag-of-words or bag-of-n-grams paradigm to a "bag-of-key-phrases" setting by applying a key phrase extraction technique, the C-value method, to further explore latent themes. We evaluate our approach with a phrase intrusion user study and demonstrate that our model can help LDA generate better and more interpretable topics than those generated using the bag-of-n-grams approach. Given that topic models are essentially statistical tools, an important problem in topic modeling is that of visualizing and interacting with the models to understand and extract new information from a collection. To evaluate our phrase-based modeling approach in this context, we incorporate it into an open source interactive topic browser. Qualitative evaluations of this browser with biomedical experts demonstrate that our approach can help biomedical researchers gain a better and faster understanding of their document collections.

DOI: https://doi.org/10.1109/ICMLA.2013.89. Pages 440-445. Published 2013-12-01.
Citations: 12
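The C-value method mentioned in the abstract above ranks candidate multi-word terms by frequency while discounting candidates that mostly occur nested inside longer candidates. A simplified sketch of the usual scoring rule follows: a phrase scores log2(length) times its frequency, and if it is nested in longer candidates, the mean frequency of those longer candidates is subtracted first. The linguistic filtering that selects candidates is omitted, and the phrase counts are invented for illustration.

```python
import math

def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

def c_values(freq):
    """Map each candidate phrase (a tuple of words) to its C-value score."""
    scores = {}
    for phrase, f in freq.items():
        # Frequencies of the longer candidates that contain this phrase.
        nesting = [f_b for other, f_b in freq.items() if contains(other, phrase)]
        adjusted = f - sum(nesting) / len(nesting) if nesting else f
        scores[phrase] = math.log2(len(phrase)) * adjusted
    return scores

# Hypothetical candidate phrases with corpus frequencies.
freq = {
    ("basal", "cell", "carcinoma"): 5,
    ("cell", "carcinoma"): 7,
}
scores = c_values(freq)
```

In this toy case "cell carcinoma" appears 7 times, but 5 of those are inside "basal cell carcinoma", so its score drops well below that of the longer phrase. Note that single-word candidates always score zero under the log2(length) factor, which is why the method targets multi-word key phrases.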
Face Recognition Challenge: Object Recognition Approaches for Human/Avatar Classification
T. Yamasaki, Tsuhan Chen
Recently, a novel "completely automated public Turing test to tell computers and humans apart" (CAPTCHA) system has been proposed, in which users are asked to separate natural human faces from artificial faces of virtual-world avatars. The system is based on the assumption that computers cannot separate them, while it is an easy task for humans. Conventional digital forensics approaches to distinguishing natural images from computer graphics images are mostly based on statistical analysis of the images, such as noise in CMOS image sensors or Bayer matrix estimation. In contrast, this paper uses approaches based on face recognition and object classification. The experiments show that our approaches work surprisingly well and yield more than 99% accuracy. Our object-classification-based approach can also tell us how likely the input images are to be regarded as human or avatar faces.
DOI: https://doi.org/10.1109/ICMLA.2012.188. Pages 574-579. Published 2012-12-12.
Citations: 5
Feature Extraction and Selection in Ground Penetrating Radar with Experimental Data Set of Inclusions in Concrete Blocks
F. Queiroz, D. Vieira, X. L. Travassos, M. F. Pantoja
Ground Penetrating Radar (GPR) systems have been successfully used to assess the condition of concrete structures. Moreover, inclusions in concrete can be discriminated by simple models based on traces obtained by GPR. In this work, concrete blocks with different inclusions were probed under controlled conditions. Some features were extracted from the A-scans of this experimental data set. To obtain efficient models, the raw data were submitted to feature selection and space reduction methods. Without complex data pre-processing, accurate and more explainable models with a lower computational burden were obtained.
DOI: https://doi.org/10.1109/ICMLA.2012.139. Pages 48-53. Published 2012-12-12.
Citations: 3
Combining Gene Expression Profiles and Protein-Protein Interactions for Identifying Functional Modules
Dingding Wang, M. Ogihara, Erliang Zeng, Tao Li
Identifying functional modules from protein-protein interaction networks is an important and challenging task. This paper presents a new approach called PPIBM, which is designed to integrate gene expression data analysis and clustering of protein-protein interactions. The proposed approach relies on a Bayesian model that takes as its basis the protein-protein interactions given as part of the input. The proposed method is evaluated with standard measures, and its performance is compared with state-of-the-art network analysis methods. Experimental results on both real-world data and synthetic data demonstrate the effectiveness of the proposed approach.
DOI: https://doi.org/10.1109/ICMLA.2012.28. Pages 114-119. Published 2012-12-12.
Citations: 3
Clinical Report Classification Using Natural Language Processing and Topic Modeling
Efsun Sarioglu, Hyeong-Ah Choi, Kabir Yadav

A large amount of electronic clinical data encompasses important information in free-text format. To help guide medical decision-making, text needs to be efficiently processed and coded. In this research, we investigate techniques to improve the classification of Emergency Department computed tomography (CT) reports. The proposed system uses Natural Language Processing (NLP) to generate structured output from the reports and then machine learning techniques to code for the presence of clinically important injuries in traumatic orbital fracture victims. Topic modeling of the corpora is also utilized as an alternative representation of the patient reports. Our results show that both NLP and topic modeling improve on raw text classification results. Among NLP features, filtering the codes using modifiers produces the best performance. Topic modeling shows mixed results. Topic vectors provide good dimensionality reduction and yield classification results comparable to those obtained with NLP features. However, binary topic classification fails to improve upon raw text classification.

DOI: https://doi.org/10.1109/icmla.2012.173. Pages 204-209. Published 2012-12-01.
Citations: 20
Journal
Proceedings of the ... International Conference on Machine Learning and Applications. International Conference on Machine Learning and Applications