Enhancing metagenomic classification with compression-based features

IF 6.1 2区 医学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Artificial Intelligence in Medicine Pub Date : 2024-08-14 DOI:10.1016/j.artmed.2024.102948
Jorge Miguel Silva, João Rafael Almeida
{"title":"Enhancing metagenomic classification with compression-based features","authors":"Jorge Miguel Silva,&nbsp;João Rafael Almeida","doi":"10.1016/j.artmed.2024.102948","DOIUrl":null,"url":null,"abstract":"<div><p>Metagenomics is a rapidly expanding field that uses next-generation sequencing technology to analyze the genetic makeup of environmental samples. However, accurately identifying the organisms in a metagenomic sample can be complex, and traditional reference-based methods may need to be more effective in some instances.</p><p>In this study, we present a novel approach for metagenomic identification, using data compressors as a feature for taxonomic classification. By evaluating a comprehensive set of compressors, including both general-purpose and genomic-specific, we demonstrate the effectiveness of this method in accurately identifying organisms in metagenomic samples. The results indicate that using features from multiple compressors can help identify taxonomy. An overall accuracy of 95% was achieved using this method using an imbalanced dataset with classes with limited samples. The study also showed that the correlation between compression and classification is insignificant, highlighting the need for a multi-faceted approach to metagenomic identification.</p><p>This approach offers a significant advancement in the field of metagenomics, providing a reference-less method for taxonomic identification that is both effective and efficient while revealing insights into the statistical and algorithmic nature of genomic data. The code to validate this study is publicly available at <span><span>https://github.com/ieeta-pt/xgTaxonomy</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"156 ","pages":"Article 102948"},"PeriodicalIF":6.1000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0933365724001908/pdfft?md5=303ea4cf8a281f1bfe4eda4a7757ef46&pid=1-s2.0-S0933365724001908-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence in Medicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0933365724001908","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Metagenomics is a rapidly expanding field that uses next-generation sequencing technology to analyze the genetic makeup of environmental samples. However, accurately identifying the organisms in a metagenomic sample can be complex, and traditional reference-based methods may need to be more effective in some instances.

In this study, we present a novel approach for metagenomic identification, using data compressors as a feature for taxonomic classification. By evaluating a comprehensive set of compressors, including both general-purpose and genomic-specific, we demonstrate the effectiveness of this method in accurately identifying organisms in metagenomic samples. The results indicate that using features from multiple compressors can help identify taxonomy. An overall accuracy of 95% was achieved using this method using an imbalanced dataset with classes with limited samples. The study also showed that the correlation between compression and classification is insignificant, highlighting the need for a multi-faceted approach to metagenomic identification.

This approach offers a significant advancement in the field of metagenomics, providing a reference-less method for taxonomic identification that is both effective and efficient while revealing insights into the statistical and algorithmic nature of genomic data. The code to validate this study is publicly available at https://github.com/ieeta-pt/xgTaxonomy.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用基于压缩的特征增强元基因组分类。
元基因组学是一个迅速发展的领域,它利用下一代测序技术分析环境样本的基因构成。然而,准确识别元基因组样本中的生物可能很复杂,在某些情况下,传统的基于参考的方法可能需要更加有效。在本研究中,我们提出了一种新的元基因组鉴定方法,利用数据压缩器作为分类学分类的特征。通过评估一整套压缩器(包括通用压缩器和基因组专用压缩器),我们证明了这种方法在准确鉴定元基因组样本中的生物体方面的有效性。结果表明,使用来自多个压缩器的特征有助于识别生物分类。使用这种方法,在样本有限的不平衡类数据集上,总体准确率达到 95%。研究还表明,压缩与分类之间的相关性并不显著,这突出表明需要一种多方面的元基因组识别方法。这种方法是元基因组学领域的一大进步,它提供了一种无需参考的分类鉴定方法,既有效又高效,同时还揭示了基因组数据的统计和算法本质。验证这项研究的代码可在 https://github.com/ieeta-pt/xgTaxonomy 上公开获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Artificial Intelligence in Medicine
Artificial Intelligence in Medicine 工程技术-工程:生物医学
CiteScore
15.00
自引率
2.70%
发文量
143
审稿时长
6.3 months
期刊介绍: Artificial Intelligence in Medicine publishes original articles from a wide variety of interdisciplinary perspectives concerning the theory and practice of artificial intelligence (AI) in medicine, medically-oriented human biology, and health care. Artificial intelligence in medicine may be characterized as the scientific discipline pertaining to research studies, projects, and applications that aim at supporting decision-based medical tasks through knowledge- and/or data-intensive computer-based solutions that ultimately support and improve the performance of a human care provider.
期刊最新文献
Hyperbolic multivariate feature learning in higher-order heterogeneous networks for drug–disease prediction Editorial Board BDFormer: Boundary-aware dual-decoder transformer for skin lesion segmentation Finger-aware Artificial Neural Network for predicting arthritis in Patients with hand pain Artificial intelligence-driven approaches in antibiotic stewardship programs and optimizing prescription practices: A systematic review
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1