Automatically extracting chemical names from text has significant value for biomedical and life science research. A major barrier to this task is the difficulty of obtaining a sizable, good-quality training set with which to train a reliable entity extraction model. Leveraging well-studied random text generation techniques based on formal grammars, we explore the idea of automatically creating training sets for chemical named entity extraction. Assuming the availability of an incomplete list of chemical names, we are able to generate well-controlled, random, yet realistic chemical-like training documents. Compared with state-of-the-art models learned from manually labeled data and with rule-based systems using real-world data, our solutions show comparable or better results with minimal human effort.
{"title":"Learning to extract chemical names based on random text generation and incomplete dictionary","authors":"Su Yan, W. Spangler, Ying Chen","doi":"10.1145/2350176.2350180","DOIUrl":"https://doi.org/10.1145/2350176.2350180","url":null,"abstract":"Automatically extracting chemical names from text has significant value to biomedical and life science research. A major barrier in this task is the difficulty of getting a sizable good quality training set to train a reliable entity extraction model. Leveraging the well-studied random text generation techniques based on formal grammars, we explore the idea of automatically creating training sets for the task of chemical named entity extraction. Assuming the availability of an incomplete list of chemical names, we are able to generate well-controlled, random, yet realistic chemical-like training documents. Compared to state-of-the-art models learned from manually labeled data and rule-based systems using real-world data, our solutions show comparable or better results, with least human effort.","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"126 1","pages":"21-25"},"PeriodicalIF":0.0,"publicationDate":"2012-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80602646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-throughput experimental techniques have made available large datasets of experimentally detected protein-protein interactions. However, experimentally determined protein complex datasets are neither exhaustive nor reliable. Protein complexes play a key role in disease development. Therefore, the identification and characterization of the protein complexes involved is crucial to understanding molecular events under normal and abnormal physiological conditions. In this paper, we propose a novel graph mining algorithm to identify protein complexes. The algorithm first checks the quality of the interaction data, then predicts protein complexes based on the concept of the weighted clustering coefficient. To demonstrate the effectiveness of the proposed method, we present experimental results on yeast protein interaction data. The level of accuracy achieved is a strong argument in favor of the proposed method. Novel protein complexes were also predicted to assist biologists in their search for protein complexes. The datasets and programs are freely available from http://faculty.uaeu.ac.ae/nzaki/PE-WCC.htm.
{"title":"Detecting protein complexes from noisy protein interaction data","authors":"Dmitry Efimov, Nazar Zaki, Jose Berengueres","doi":"10.1145/2350176.2350177","DOIUrl":"https://doi.org/10.1145/2350176.2350177","url":null,"abstract":"High-throughput experimental techniques have made available large datasets of experimentally detected protein-protein interactions. However, experimentally determined protein complexes datasets are not exhaustive nor reliable. A protein complex plays a key role in disease development. Therefore, the identification and characterization of protein complexes involved is crucial to the understanding of the molecular events under normal and abnormal physiological conditions. In this paper, we propose a novel graph mining algorithm to identify protein complexes. The algorithm first checks the quality of the interaction data, then predicts protein complexes based on the concept of weighted clustering coefficient. To demonstrate the effectiveness of our proposed method, we present experimental results on yeast protein interaction data. The level of accuracy achieved is a strong argument in favor of the proposed method. Novel protein complexes were also predicted to assist biologists in their search for protein complexes. The datasets and programs are freely available from http://faculty.uaeu.ac.ae/nzaki/PE-WCC.htm.","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"PP 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2012-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84534012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-04-11 DOI: 10.1007/978-3-642-29066-4_13
Sergio Santander-Jiménez, M. A. Vega-Rodríguez, J. Pulido, J. M. Sánchez-Pérez
{"title":"Inferring Phylogenetic Trees Using a Multiobjective Artificial Bee Colony Algorithm","authors":"Sergio Santander-Jiménez, M. A. Vega-Rodríguez, J. Pulido, J. M. Sánchez-Pérez","doi":"10.1007/978-3-642-29066-4_13","DOIUrl":"https://doi.org/10.1007/978-3-642-29066-4_13","url":null,"abstract":"","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"36 1","pages":"144-155"},"PeriodicalIF":0.0,"publicationDate":"2012-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73523451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-04-11 DOI: 10.1007/978-3-642-29066-4_15
K. Kancherla, Srinivas Mukkamala
{"title":"Feature Selection for Lung Cancer Detection Using SVM Based Recursive Feature Elimination Method","authors":"K. Kancherla, Srinivas Mukkamala","doi":"10.1007/978-3-642-29066-4_15","DOIUrl":"https://doi.org/10.1007/978-3-642-29066-4_15","url":null,"abstract":"","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"59 1","pages":"168-176"},"PeriodicalIF":0.0,"publicationDate":"2012-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76658714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-04-11 DOI: 10.1007/978-3-642-29066-4_7
Marco S. Nobile, D. Besozzi, P. Cazzaniga, G. Mauri, D. Pescini
{"title":"A GPU-Based Multi-swarm PSO Method for Parameter Estimation in Stochastic Biological Systems Exploiting Discrete-Time Target Series","authors":"Marco S. Nobile, D. Besozzi, P. Cazzaniga, G. Mauri, D. Pescini","doi":"10.1007/978-3-642-29066-4_7","DOIUrl":"https://doi.org/10.1007/978-3-642-29066-4_7","url":null,"abstract":"","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"10 1","pages":"74-85"},"PeriodicalIF":0.0,"publicationDate":"2012-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82148280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-04-11 DOI: 10.1007/978-3-642-29066-4_19
C. Pizzuti, Simona E. Rombo, E. Marchiori
{"title":"Complex Detection in Protein-Protein Interaction Networks: A Compact Overview for Researchers and Practitioners","authors":"C. Pizzuti, Simona E. Rombo, E. Marchiori","doi":"10.1007/978-3-642-29066-4_19","DOIUrl":"https://doi.org/10.1007/978-3-642-29066-4_19","url":null,"abstract":"","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"16 1","pages":"211-223"},"PeriodicalIF":0.0,"publicationDate":"2012-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91529321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-04-11 DOI: 10.1007/978-3-642-29066-4_5
S. Marini, A. Conversi
{"title":"Understanding Zooplankton Long Term Variability through Genetic Programming","authors":"S. Marini, A. Conversi","doi":"10.1007/978-3-642-29066-4_5","DOIUrl":"https://doi.org/10.1007/978-3-642-29066-4_5","url":null,"abstract":"","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"166 1","pages":"50-61"},"PeriodicalIF":0.0,"publicationDate":"2012-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73135658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-04-11 DOI: 10.1007/978-3-642-29066-4_12
E. Holzinger, S. Dudek, A. Frase, B. Fridley, P. Chalise, M. Ritchie
{"title":"Comparison of Methods for Meta-dimensional Data Analysis Using in Silico and Biological Data Sets","authors":"E. Holzinger, S. Dudek, A. Frase, B. Fridley, P. Chalise, M. Ritchie","doi":"10.1007/978-3-642-29066-4_12","DOIUrl":"https://doi.org/10.1007/978-3-642-29066-4_12","url":null,"abstract":"","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"19 1","pages":"134-143"},"PeriodicalIF":0.0,"publicationDate":"2012-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83807907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-04-11 DOI: 10.1007/978-3-642-29066-4_6
H. Franken, Alexander Seitz, R. Lehmann, H. Häring, N. Stefan, A. Zell
{"title":"Inferring Disease-Related Metabolite Dependencies with a Bayesian Optimization Algorithm","authors":"H. Franken, Alexander Seitz, R. Lehmann, H. Häring, N. Stefan, A. Zell","doi":"10.1007/978-3-642-29066-4_6","DOIUrl":"https://doi.org/10.1007/978-3-642-29066-4_6","url":null,"abstract":"","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"7 1","pages":"62-73"},"PeriodicalIF":0.0,"publicationDate":"2012-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84493075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-04-11 DOI: 10.1007/978-3-642-29066-4_11
Qinxin Pan, Christian Darabos, J. Moore
{"title":"The Role of Mutations in Whole Genome Duplication","authors":"Qinxin Pan, Christian Darabos, J. Moore","doi":"10.1007/978-3-642-29066-4_11","DOIUrl":"https://doi.org/10.1007/978-3-642-29066-4_11","url":null,"abstract":"","PeriodicalId":90497,"journal":{"name":"Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference)","volume":"43 1","pages":"122-133"},"PeriodicalIF":0.0,"publicationDate":"2012-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84134717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}