Zuzanna Karwowska, Oliver Aasmets, Tomasz Kosciolek, Elin Org
{"title":"Effects of data transformation and model selection on feature importance in microbiome classification data.","authors":"Zuzanna Karwowska, Oliver Aasmets, Tomasz Kosciolek, Elin Org","doi":"10.1186/s40168-024-01996-6","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Accurate classification of host phenotypes from microbiome data is crucial for advancing microbiome-based therapies, with machine learning offering effective solutions. However, the complexity of the gut microbiome, data sparsity, compositionality, and population-specificity present significant challenges. Microbiome data transformations can alleviate some of the aforementioned challenges, but their usage in machine learning tasks has largely been unexplored.</p><p><strong>Results: </strong>Our analysis of over 8500 samples from 24 shotgun metagenomic datasets showed that it is possible to classify healthy and diseased individuals using microbiome data with minimal dependence on the choice of algorithm or transformation. Presence-absence transformations performed comparably to abundance-based transformations, and only a small subset of predictors is necessary for accurate classification. However, while different transformations resulted in comparable classification performance, the most important features varied significantly, which highlights the need to reevaluate machine learning-based biomarker detection.</p><p><strong>Conclusions: </strong>Microbiome data transformations can significantly influence feature selection but have a limited effect on classification accuracy. Our findings suggest that while classification is robust across different transformations, the variation in feature selection necessitates caution when using machine learning for biomarker identification. This research provides valuable insights for applying machine learning to microbiome data and identifies important directions for future work.</p>","PeriodicalId":18447,"journal":{"name":"Microbiome","volume":"13 1","pages":"2"},"PeriodicalIF":13.8000,"publicationDate":"2025-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11699698/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microbiome","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s40168-024-01996-6","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MICROBIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Accurate classification of host phenotypes from microbiome data is crucial for advancing microbiome-based therapies, with machine learning offering effective solutions. However, the complexity of the gut microbiome, data sparsity, compositionality, and population-specificity present significant challenges. Microbiome data transformations can alleviate some of the aforementioned challenges, but their usage in machine learning tasks has largely been unexplored.
Results: Our analysis of over 8500 samples from 24 shotgun metagenomic datasets showed that it is possible to classify healthy and diseased individuals using microbiome data with minimal dependence on the choice of algorithm or transformation. Presence-absence transformations performed comparably to abundance-based transformations, and only a small subset of predictors is necessary for accurate classification. However, while different transformations resulted in comparable classification performance, the most important features varied significantly, which highlights the need to reevaluate machine learning-based biomarker detection.
Conclusions: Microbiome data transformations can significantly influence feature selection but have a limited effect on classification accuracy. Our findings suggest that while classification is robust across different transformations, the variation in feature selection necessitates caution when using machine learning for biomarker identification. This research provides valuable insights for applying machine learning to microbiome data and identifies important directions for future work.
期刊介绍:
Microbiome is a journal that focuses on studies of microbiomes in humans, animals, plants, and the environment. It covers both natural and manipulated microbiomes, such as those in agriculture. The journal is interested in research that uses meta-omics approaches or novel bioinformatics tools and emphasizes the community/host interaction and structure-function relationship within the microbiome. Studies that go beyond descriptive omics surveys and include experimental or theoretical approaches will be considered for publication. The journal also encourages research that establishes cause and effect relationships and supports proposed microbiome functions. However, studies of individual microbial isolates/species without exploring their impact on the host or the complex microbiome structures and functions will not be considered for publication. Microbiome is indexed in BIOSIS, Current Contents, DOAJ, Embase, MEDLINE, PubMed, PubMed Central, and Science Citations Index Expanded.