Juan Carlos Francisco, Frederick M Cohan, Danny Krizanc
Accuracy and efficiency of algorithms for the demarcation of bacterial ecotypes from DNA sequence data
International Journal of Bioinformatics Research and Applications (IJBRA), 2014. DOI: 10.1504/IJBRA.2014.062992
Citation count: 7
Abstract
Identification of closely related, ecologically distinct populations of bacteria would benefit microbiologists working in many fields, including systematics, epidemiology and biotechnology. Several laboratories have recently developed algorithms aimed at demarcating such 'ecotypes'. We examine the ability of four of these algorithms to correctly identify ecotypes from sequence data. We tested the algorithms on two kinds of data: synthetic sequences with known history and habitat associations, generated under the stable ecotype model, and sequences from Bacillus strains isolated from Death Valley, where previous work has confirmed the existence of multiple ecotypes. On both data sets, one of the algorithms (ecotype simulation) performed significantly better than the others (AdaptML, GMYC and BAPS). However, ecotype simulation was also the least efficient of the four: it is by a large margin the slowest of the algorithms tested. Attempts at improving its efficiency are under way.
About the journal:
Bioinformatics is an interdisciplinary research field that combines biology, computer science, mathematics and statistics into a broad-based field that will have profound impacts on all fields of biology. The emphasis of IJBRA is on basic bioinformatics research methods, tool development, performance evaluation and their applications in biology. IJBRA addresses the most innovative developments, research issues and solutions in bioinformatics and computational biology and their applications. Topics covered include: databases, bio-grid, systems biology; biomedical image processing, modelling and simulation; bio-ontology and data mining; DNA assembly, clustering, mapping; computational genomics/proteomics; in silico technology: computational intelligence, high performance computing; e-health, telemedicine; gene expression, microarrays, identification, annotation; genetic algorithms, fuzzy logic, neural networks, data visualisation; hidden Markov models, machine learning, support vector machines; molecular evolution, phylogeny, modelling, simulation, sequence analysis; parallel algorithms/architectures, computational structural biology; phylogeny reconstruction algorithms, physiome, protein structure prediction; sequence assembly, search, alignment; signalling/computational biomedical data engineering; simulated annealing, statistical analysis, stochastic grammars.