TransGINmer: Identifying viral sequences from metagenomes with self-attention and Graph Isomorphism Network
Journal: Future Generation Computer Systems-The International Journal of Escience (Impact Factor 6.2, JCR Q1, Computer Science, Theory & Methods)
Published: 2024-07-16
DOI: 10.1016/j.future.2024.07.025
Source: https://www.sciencedirect.com/science/article/pii/S0167739X24003893
Citations: 0
Abstract
Viruses are abundant across diverse environments, play pivotal roles in microbial ecosystems, and affect human health. Traditional virus studies are limited by their reliance on laboratory cultivation, a limitation that metagenomics has mitigated: it obtains the nucleotide sequences of all microorganisms in an environmental sample through next-generation sequencing. This advance creates a need for efficient viral identification methods. To identify viruses accurately and quickly, we propose TransGINmer, a novel deep learning model that identifies viral sequences directly from metagenomes. It encodes sequences with a k-mer frequency embedding model, constructs graphs from significant codon token correlations, and classifies them using graph isomorphism neural networks. In comparative tests against the state-of-the-art methods DeepVirFinder, VirSorter2, and PhaMer on the testing dataset, the Amazon River dataset, the Sharon dataset, and the CAMI Strain dataset, TransGINmer demonstrates superior accuracy, sensitivity, specificity, and AUC values, showing its potential as a robust tool for viral identification from metagenomes. TransGINmer is freely available on GitHub (https://github.com/xizhilangcc/TransGINmer).
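To make two of the ideas named in the abstract concrete, here is a minimal sketch of k-mer frequency counting and of the sum-aggregation step used in a standard Graph Isomorphism Network layer. It is an illustration under stated assumptions and not TransGINmer's code: the function names `kmer_frequencies` and `gin_aggregate`, the choice of k = 3, the toy graph, and the omission of the learned MLP are all hypothetical choices made here for brevity.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3):
    """Normalized frequencies of overlapping k-mers over the alphabet A/C/G/T.

    Windows containing any other character (e.g. N) are skipped.
    """
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in vocab}


def gin_aggregate(features, adjacency, eps=0.0):
    """Pre-MLP part of a generic GIN update: (1 + eps) * h_v + sum of neighbor h_u.

    `features` is a list of per-node feature lists; `adjacency` maps a node
    index to the list of its neighbor indices. A real GIN layer would pass the
    result through a learned MLP, which is left out of this sketch.
    """
    out = []
    for v, h_v in enumerate(features):
        agg = [(1.0 + eps) * x for x in h_v]
        for u in adjacency[v]:
            agg = [a + b for a, b in zip(agg, features[u])]
        out.append(agg)
    return out


# Toy inputs, invented for illustration only.
freqs = kmer_frequencies("ATGATG", k=3)
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_adjacency = {0: [1, 2], 1: [0], 2: [0]}
updated = gin_aggregate(toy_features, toy_adjacency)
```

For the toy sequence, `freqs` has 64 entries (one per possible 3-mer) that sum to 1. How the paper selects "significant codon token correlations" to build its graphs is not described in the abstract, so the toy adjacency above is hand-written rather than derived from a sequence.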
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.