Proceedings of the ... International Conference on Machine Learning and Applications: Latest Articles
"A textual representation scheme for identifying clinical relationships in patient records."
Rezarta Islamaj Doğan, Aurélie Névéol, Zhiyong Lu. DOI: 10.1109/ICMLA.2010.164. ICMLA 2010, pp. 995–998; published 2011-02-04.

Identifying relationships between clinical concepts in patient records is a preliminary step for many important applications in medical informatics, ranging from quality of care to hypothesis generation. This work describes an approach that facilitates the automatic recognition of relationships defined between two different concepts in text. Unlike the traditional bag-of-words representation, a relationship is represented here with a scheme of five distinct context blocks based on the position of the concepts in the text. The scheme was applied to eight different relationships between medical problems, treatments, and tests on a set of 349 patient records from the 4th i2b2 challenge. The context-block representation clearly outperformed the bag-of-words model (F-measure 0.775 vs. 0.402). Its advantage lies in the correct handling of word-position information, which can be critical for identifying certain relationships.
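The five context blocks are easy to picture: the two concept spans partition the sentence. A minimal sketch (the block names are our own; the paper defines the blocks by concept position but does not publish code):

```python
def context_blocks(tokens, c1, c2):
    """Split a tokenized sentence into five context blocks around two
    concept spans c1=(start, end) and c2=(start, end), end-exclusive.
    Assumes c1 appears before c2 in the sentence."""
    (s1, e1), (s2, e2) = c1, c2
    return {
        "before":   tokens[:s1],    # words preceding the first concept
        "concept1": tokens[s1:e1],  # the first concept itself
        "between":  tokens[e1:s2],  # words between the two concepts
        "concept2": tokens[s2:e2],  # the second concept itself
        "after":    tokens[e2:],    # words following the second concept
    }

tokens = "the patient received aspirin to treat the headache".split()
blocks = context_blocks(tokens, (3, 4), (7, 8))
```

Features extracted per block (rather than from one undifferentiated bag) are what let a classifier distinguish, say, "treatment administered for problem" from "treatment causing problem".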
"Arabic Handwriting Recognition Using Concavity Features and Classifier Fusion."
S. Abdelazeem, Maha El Meseery. DOI: 10.1109/ICMLA.2011.36. ICMLA 2011, pp. 200–203.
"Nonlinear RANSAC Optimization for Parameter Estimation with Applications to Phagocyte Transmigration."
Mingon Kang, Jean Gao, Liping Tang. ICMLA 2011, pp. 501–504. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3516195/pdf/nihms336836.pdf

Developing rigorous mathematical equations and estimating accurate parameters within feasible computational time are two indispensable parts of building system models that reliably represent the biological properties of a system and produce trustworthy simulations. For a complex biological system with limited observations, one daunting task is the large number of unknown parameters in the mathematical model, whose values directly determine the performance of the computational modeling. To tackle this problem, we developed nonlinear RANSAC, a data-driven global optimization method based on RANdom SAmple Consensus (RANSAC), for parameter estimation in nonlinear system models. The conventional RANSAC method is sound and simple, but it is oriented toward linear system models. We not only adopt the strengths of RANSAC but also extend the method to nonlinear systems with excellent performance. As a specific application, we target phagocyte transmigration, which is involved in the fibrosis process around implanted biomedical devices. With well-defined nonlinear equations of the system, nonlinear RANSAC is performed for the parameter estimation. To evaluate the general performance of the method, we also applied it to signalling pathways expressed as ordinary differential equations.
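The core loop of a RANSAC-style estimator for a nonlinear model can be sketched in a few lines. This toy version fits y = a·exp(b·x) from minimal two-point samples and keeps the parameters with the most inliers; the model, tolerance, and data are illustrative, not the paper's phagocyte system:

```python
import math
import random

def fit_exp(p1, p2):
    """Closed-form fit of y = a*exp(b*x) through two sample points."""
    (x1, y1), (x2, y2) = p1, p2
    b = math.log(y2 / y1) / (x2 - x1)
    a = y1 / math.exp(b * x1)
    return a, b

def nonlinear_ransac(points, n_iter=200, tol=0.5, seed=0):
    """Toy RANSAC for a nonlinear model: repeatedly fit from a minimal
    random sample and keep the parameters with the largest inlier set."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(n_iter):
        p1, p2 = rng.sample(points, 2)
        if p1[0] == p2[0]:
            continue  # same x: no unique exponential through the pair
        try:
            a, b = fit_exp(p1, p2)
        except (ValueError, ZeroDivisionError, OverflowError):
            continue  # e.g. negative ratio from an outlier sample
        inliers = sum(1 for x, y in points
                      if abs(a * math.exp(b * x) - y) < tol)
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best, best_inliers

# synthetic data: y = 2*exp(0.5*x) plus two gross outliers
pts = [(x, 2.0 * math.exp(0.5 * x)) for x in range(8)] + [(1, 40.0), (2, -10.0)]
(a, b), n_in = nonlinear_ransac(pts)
```

The outliers never enter a winning consensus set, which is the property that makes RANSAC-style estimation attractive when observations are scarce and noisy.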
"A Classification Approach for Risk Prognosis of Patients on Mechanical Ventricular Assistance."
Yajuan Wang, Carolyn Penstein Rosé, Antonio Ferreira, Dennis M. McNamara, Robert L. Kormos, James F. Antaki. DOI: 10.1109/ICMLA.2010.50. ICMLA 2010, pp. 293–298; published 2010-12-12.

The identification of optimal candidates for ventricular assist device (VAD) therapy is of great importance for future widespread application of this life-saving technology. In recent years, numerous traditional statistical models have been developed for this task. In this study, we compared three supervised machine learning techniques for risk prognosis of patients on VAD support: Decision Tree, Support Vector Machine (SVM), and Bayesian Tree-Augmented Network. A predictive (C4.5) decision tree model was ultimately developed from six features identified by the SVM with recursive feature elimination. The model outperformed the popular risk score of Lietz et al. in identifying high-risk patients and in differentiating the survival of high- and low-risk candidates earlier.
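The split-selection criterion behind C4.5-style trees is information gain (C4.5 proper normalizes it by the split information to get the gain ratio). A small, self-contained version with a toy example; the feature values below are hypothetical, not the six clinical features the paper selected:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feat):
    """Information gain of splitting on categorical feature `feat`:
    entropy of the labels minus the size-weighted entropy remaining
    after the split. C4.5-style trees pick splits by this family of
    criteria."""
    n = len(rows)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[feat], []).append(lab)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# toy records: feature 0 separates the outcomes perfectly, feature 1 not at all
rows = [('low', 'x'), ('low', 'y'), ('high', 'x'), ('high', 'y')]
labels = ['survived', 'survived', 'died', 'died']
```

Here `info_gain(rows, labels, 0)` is 1.0 bit (a perfect split of a balanced binary outcome) while feature 1 gains nothing, which is exactly the ranking a tree learner needs.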
"The Upper and Lower Bounds of the Prediction Accuracies of Ensemble Methods for Binary Classification."
Xueyi Wang, Nicholas J. Davidson. DOI: 10.1109/ICMLA.2010.62. ICMLA 2010, pp. 373–378; published 2010-12-12.

Ensemble methods have been widely used to improve prediction accuracy over individual classifiers. In this paper, we establish several results about the prediction accuracies of ensemble methods for binary classification that previous literature has missed or misinterpreted. First, we derive the upper and lower bounds of the prediction accuracies (i.e., the best and worst possible prediction accuracies) of ensemble methods. Next, we show that an ensemble method can achieve a prediction accuracy above 0.5 even when every individual classifier has an accuracy below 0.5. Furthermore, for individual classifiers with different prediction accuracies, the average of the individual accuracies determines the upper and lower bounds. Two experiments verify these results and show that the bounds are hard to reach with random individual classifiers, so better algorithms need to be developed.
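The existence claim is easy to verify by construction. Below, three classifiers are each 40% accurate, yet their errors are arranged so that majority voting reaches 60%; the correctness pattern is ours, built only to illustrate the effect:

```python
def majority_vote(preds):
    """Majority vote across classifiers, instance by instance."""
    return [1 if sum(col) > len(col) / 2 else 0 for col in zip(*preds)]

truth = [1] * 10

# each classifier is correct on exactly 4 of 10 instances (accuracy 0.4),
# and the correct votes are packed in pairs onto instances 0-5
correct_sets = [{0, 1, 2, 3}, {0, 1, 4, 5}, {2, 3, 4, 5}]
preds = [[1 if i in s else 0 for i in range(10)] for s in correct_sets]

def accuracy(pred):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

indiv = [accuracy(p) for p in preds]          # three accuracies below 0.5
ens = accuracy(majority_vote(preds))          # ensemble accuracy above 0.5
```

With three voters of mean accuracy 0.4 there are only 12 correct votes to distribute, and a majority needs 2 per instance, so at most 6 of the 10 instances can be won; the construction hits that ceiling exactly, which is the flavor of upper bound the paper studies.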
"Non-Alignment Features Based Enzyme/Non-Enzyme Classification Using an Ensemble Method."
Nicholas J. Davidson, Xueyi Wang. DOI: 10.1109/ICMLA.2010.167. ICMLA 2010, pp. 546–551; published 2010-12-12.

As a growing number of protein structures are resolved without known functions, computational methods for predicting protein function from structure become increasingly important. Some methods predict function by aligning to homologous proteins with known functions, but they fail when no such homology can be identified. In this paper, we classify enzymes/non-enzymes using non-alignment features. We propose a new ensemble method that combines three support vector machines (SVMs) and two k-nearest-neighbor (k-NN) classifiers under a simple majority voting rule. On a data set of 697 enzymes and 480 non-enzymes adapted from Dobson and Doig, it achieves 85.59% accuracy in 10-fold cross-validation and 86.49% accuracy in leave-one-out validation. The prediction accuracy is much better than that of other methods based on non-alignment features, and even slightly better than that of alignment-based methods. To our knowledge, this is the first use of ensemble methods for enzyme/non-enzyme classification, and the ensemble is superior to any of its single classifiers.
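Two ingredients of the evaluation, k-NN voting and leave-one-out validation, fit in a short stdlib sketch; the two-dimensional points below are placeholders for the paper's structural features:

```python
def knn_predict(train, query, k=3):
    """k-NN with squared Euclidean distance and majority label vote.
    `train` is a list of (feature_tuple, label) pairs."""
    ranked = sorted(train,
                    key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)))
    labels = [lab for _, lab in ranked[:k]]
    return max(set(labels), key=labels.count)

def leave_one_out_accuracy(data, k=3):
    """Leave-one-out validation: hold out each example in turn and
    classify it using all the others."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)

# toy stand-in for structural feature vectors
data = [((0.0, 0.0), 'non-enzyme'), ((0.1, 0.2), 'non-enzyme'),
        ((0.2, 0.1), 'non-enzyme'), ((1.0, 1.0), 'enzyme'),
        ((0.9, 1.1), 'enzyme'), ((1.1, 0.9), 'enzyme')]
acc = leave_one_out_accuracy(data, k=3)
```

In the paper the k-NN classifiers are two of five voters; a heterogeneous majority vote over them and the SVMs follows the same counting logic as `knn_predict`'s label vote.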
"Empowering Simultaneous Feature and Instance Selection in Classification Problems through the Adaptation of Two Selection Algorithms."
R. D. Carmo, F. Freitas, J. Souza. DOI: 10.1109/ICMLA.2010.121. ICMLA 2010, pp. 793–796; published 2010-12-12.

This paper proposes a new approach to data selection, a key issue in classification problems. Built from one feature selection algorithm and one instance selection algorithm, the approach reduces the original dataset in two dimensions, selecting relevant features and retaining important instances simultaneously. The searches for the best feature subset and the best instance subset run separately, yet, because features influence the importance of instances and vice versa, they bias one another. Experiments validate the approach, showing that this relation between features and instances can be exploited when constructing data selection algorithms and that it yields a quality improvement over running the two algorithms sequentially.
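One plausible reading of the idea, sketched with deliberately simple criteria (class-mean gap for features, same-class nearest-neighbor distance for instance redundancy); the paper's actual two algorithms differ, and this only illustrates how the two searches can feed each other:

```python
def feature_scores(X, y, feats):
    """Score each feature by the gap between its class means,
    a simple filter criterion standing in for the paper's."""
    scores = {}
    for f in feats:
        pos = [x[f] for x, lab in zip(X, y) if lab == 1]
        neg = [x[f] for x, lab in zip(X, y) if lab == 0]
        scores[f] = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return scores

def co_select(X, y, n_feats, n_inst):
    """Alternately drop the weakest feature and the most redundant
    instance; each step is scored on the other's current selection,
    so the two searches bias one another."""
    feats = list(range(len(X[0])))
    idx = list(range(len(X)))
    while len(feats) > n_feats or len(idx) > n_inst:
        if len(feats) > n_feats:
            s = feature_scores([X[i] for i in idx], [y[i] for i in idx], feats)
            feats.remove(min(feats, key=lambda f: s[f]))
        if len(idx) > n_inst:
            def redundancy(i):
                # distance to the closest same-class instance,
                # measured only over the surviving features
                return min(sum((X[i][f] - X[j][f]) ** 2 for f in feats)
                           for j in idx if j != i and y[j] == y[i])
            idx.remove(min(idx, key=redundancy))
    return feats, idx

# features 0 and 1 carry the class signal; feature 2 is noise;
# instances 0 and 1 are duplicates under the informative features
X = [(1.0, 1.0, 0.5), (1.0, 1.0, 0.4), (1.1, 0.9, 0.6),
     (0.0, 0.0, 0.5), (0.1, -0.1, 0.6), (-0.1, 0.1, 0.4)]
y = [1, 1, 1, 0, 0, 0]
feats, idx = co_select(X, y, n_feats=2, n_inst=5)
```

Note the coupling: the noise feature is pruned first, and only then is instance redundancy judged, so the duplicate pair is detected in the cleaned feature space.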
"Boundary Constrained Manifold Unfolding."
Bo Liu, Hongbin Zhang, Wenan Chen. DOI: 10.1109/ICMLA.2008.65. ICMLA 2008, pp. 174–181.
"Supervised Reinforcement Learning Using Behavior Models."
Victor Uc-Cetina. DOI: 10.1109/ICMLA.2007.102. ICMLA 2007, pp. 336–341; published 2007-12-13.

We introduce a supervised reinforcement learning (SRL) architecture for robot control problems with high-dimensional state spaces, and propose two new SRL algorithms based on it. In our algorithms, a behavior model learned from examples dynamically reduces the set of actions available in each state during the early reinforcement learning (RL) process. These action subsets lead the agent to exploit relevant parts of the action space and avoid selecting irrelevant actions. Once the agent has exploited the information provided by the behavior model, it keeps improving its value function unaided, selecting subsequent actions from the complete action space. Our experiments show clearly how this approach can dramatically speed up learning.
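A toy version of the idea on a six-state chain: Q-learning is restricted to the behavior model's suggested action during early episodes, then released to the full action set. The environment, behavior model, and hyperparameters here are ours, for illustration only:

```python
import random

N_STATES, GOAL = 6, 5
ACTIONS = [0, 1, 2, 3]          # action 1 moves right; the rest move left

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)

def train(allowed_early, n_early=200, n_late=300,
          alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Q-learning where, for the first n_early episodes, action choice is
    restricted to the subset suggested by a behavior model; afterwards
    the full action set is explored."""
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for ep in range(n_early + n_late):
        acts = allowed_early if ep < n_early else ACTIONS
        s = 0
        for _ in range(20):
            if rng.random() < eps:
                a = rng.choice(acts)                       # explore
            else:
                a = max(acts, key=lambda a: Q[s][a])       # exploit
            s2, r = step(s, a)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if s == GOAL:
                break
    return Q

# behavior model learned from demonstrations suggests "move right"
Q = train(allowed_early=[1])
```

The restricted phase guarantees early episodes reach the goal and seed the value function; the later unrestricted phase can still revise the policy, matching the paper's two-stage scheme in miniature.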
"Improving Gene Expression Programming Performance by Using Differential Evolution."
Qiongyun Zhang, Chi Zhou, Weimin Xiao, Peter C. Nelson. DOI: 10.1109/ICMLA.2007.55. ICMLA 2007, pp. 31–37; published 2007-12-13.

Gene Expression Programming (GEP) is an evolutionary algorithm that combines the simple, linear, fixed-length chromosome of Genetic Algorithms (GAs) with the variably sized and shaped tree structures of Genetic Programming (GP). As with other GP algorithms, GEP has difficulty finding appropriate numeric constants for the terminal nodes of its expression trees. In this work, we describe a new approach to constant generation using Differential Evolution (DE), a real-valued genetic algorithm that is robust and efficient at parameter optimization. Experimental results on two symbolic regression problems show that the approach significantly improves the performance of GEP. The proposed approach extends easily to other Genetic Programming variants.
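The constant-tuning step can be sketched with a bare-bones DE/rand/1/bin loop: the expression structure is fixed (as if produced by GEP) and DE searches only the numeric constants. The expression, data, and control parameters are illustrative:

```python
import random

def de_optimize(loss, bounds, pop_size=20, n_gen=100, F=0.6, CR=0.9, seed=0):
    """Minimal differential evolution (DE/rand/1/bin) for tuning the
    numeric constants of a fixed expression tree."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [loss(ind) for ind in pop]
    for _ in range(n_gen):
        for i in range(pop_size):
            # mutate: base vector plus scaled difference of two others
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            jr = rng.randrange(dim)  # at least one dimension crosses over
            trial = [a[d] + F * (b[d] - c[d])
                     if (rng.random() < CR or d == jr) else pop[i][d]
                     for d in range(dim)]
            f = loss(trial)
            if f <= fit[i]:          # greedy one-to-one selection
                pop[i], fit[i] = trial, f
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

# suppose GEP produced the tree  c0 * x**2 + c1  and left c0, c1 to tune
xs = [i / 10 for i in range(-10, 11)]
target = [3.0 * x * x - 1.5 for x in xs]

def loss(c):
    return sum((c[0] * x * x + c[1] - t) ** 2 for x, t in zip(xs, target))

consts, err = de_optimize(loss, bounds=[(-10, 10), (-10, 10)])
```

Embedding such a loop as a local search inside GEP's generation cycle is the shape of the hybrid the paper evaluates: the tree search explores structures while DE polishes each structure's constants.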