TransGINmer: Identifying viral sequences from metagenomes with self-attention and Graph Isomorphism Network

Journal: Future Generation Computer Systems-The International Journal of Escience. Impact factor: 6.2. Zone 2 (Computer Science). JCR Q1 (Computer Science, Theory & Methods). Publication date: 2024-07-16. DOI: 10.1016/j.future.2024.07.025
Article page: https://www.sciencedirect.com/science/article/pii/S0167739X24003893

Abstract

Viruses are abundant across diverse environments, play pivotal roles in microbial ecosystems, and affect human health. Traditional virus studies are limited by their reliance on laboratory cultivation, a limitation that metagenomics mitigates by obtaining the nucleotide sequences of all microorganisms in an environmental sample through next-generation sequencing. This advance creates a need for efficient viral identification methods. To identify viruses accurately and quickly, we propose TransGINmer, a novel deep learning model that identifies viral sequences directly from metagenomes. It encodes sequences with a k-mer frequency embedding model, constructs graphs from significant codon token correlations, and classifies them using graph isomorphism neural networks. In comparative tests against the state-of-the-art methods DeepVirFinder, VirSorter2, and PhaMer on the testing dataset, the Amazon River dataset, the Sharon dataset, and the CAMI Strain dataset, TransGINmer demonstrates superior accuracy, sensitivity, specificity, and AUC values, showing its potential as a robust tool for viral identification from metagenomes. TransGINmer is freely available on GitHub (https://github.com/xizhilangcc/TransGINmer).
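The abstract names three stages (k-mer frequency encoding, graph construction from token correlations, and graph isomorphism network classification) without giving details. The following is a minimal illustrative sketch of those ideas in plain Python, not the authors' implementation. All function names, the score matrix, and the threshold are hypothetical; the correlation scores are assumed to be supplied by some upstream model; and the aggregation step shows only the standard GIN sum update, with the learned MLP omitted.

```python
from collections import Counter


def codon_tokens(seq):
    """Split a nucleotide sequence into non-overlapping 3-mers (codon-like tokens).

    A trailing fragment shorter than 3 bases is dropped.
    """
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def kmer_frequencies(seq, k):
    """Return normalized counts of overlapping k-mers in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def edges_from_scores(scores, threshold):
    """Keep undirected edges (i, j), i < j, whose score meets the threshold.

    `scores` is an assumed symmetric token-by-token correlation matrix;
    how TransGINmer actually derives it is not stated in the abstract.
    """
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if scores[i][j] >= threshold]


def gin_aggregate(features, edges, eps=0.0):
    """One GIN-style aggregation: (1 + eps) * h_v + sum of neighbor features.

    A full GIN layer would pass this result through a learned MLP,
    which is left out here to keep the sketch dependency-free.
    """
    out = [[(1.0 + eps) * x for x in h] for h in features]
    for i, j in edges:
        for d in range(len(features[i])):
            out[i][d] += features[j][d]
            out[j][d] += features[i][d]
    return out


if __name__ == "__main__":
    tokens = codon_tokens("ATGGCCTAA")
    print(tokens)
    print(kmer_frequencies("ATGGCCTAA", 2))
    # Hypothetical correlation scores between the three tokens above.
    scores = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]]
    edges = edges_from_scores(scores, threshold=0.5)
    print(gin_aggregate([[1.0], [2.0], [3.0]], edges))
```

In practice the node features would be learned embeddings and the aggregated vectors would feed a classifier; the repository linked in the abstract is the authoritative reference for the actual architecture.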
