Current Bioinformatics: Latest Articles

DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers
IF 4; Zone 3 (Biology); Q3, Biochemical Research Methods; Pub Date: 2024-02-02; DOI: 10.2174/0115748936283134240109054157
Necla Nisa Soylu, Emre Sefer
Introduction: Recent self-supervised deep language models, such as Bidirectional Encoder Representations from Transformers (BERT), have achieved the best performance on some language tasks by contextualizing word embeddings into a better dynamic representation. Their protein-specific versions, such as ProtBERT, generate dynamic protein sequence embeddings that have improved performance on several bioinformatics tasks. In addition, a number of protein post-translational modifications are prominent in cellular processes such as development and differentiation. Current biological experiments can detect these modifications, but they are slow and costly. Methods: To understand the accompanying biological processes more rapidly, we propose DEEPPTM, which predicts protein post-translational modification (PTM) sites from protein sequences more efficiently. Unlike current methods, DEEPPTM improves modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and it reveals the associations between different modification types and protein sequence content. It can also infer several different modifications across different species. Results: With 10-fold cross-validation, the ROC AUCs for predicting succinylation were 0.988 for human and 0.965 for mouse. Similarly, we obtained ROC AUC scores of 0.982, 0.955, and 0.953 for inferring ubiquitination, crotonylation, and glycation sites, respectively. According to detailed computational experiments, DEEPPTM lessens the time spent in laboratory experiments while outperforming competing methods and baselines on all four modification types. In our case, attention-based deep learning methods such as vision transformers appear better suited to learning from ProtBERT features than more traditional deep learning and machine learning techniques. Conclusion: Additionally, the protein-specific ProtBERT model is more effective than the original BERT embeddings for PTM prediction tasks. Our code and datasets can be found at https://github.com/seferlab/deepptm.
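The scores quoted in this abstract are ROC AUCs obtained under 10-fold cross-validation. As a rough illustration of what those two terms mean, the sketch below computes ROC AUC as the fraction of positive/negative pairs ranked correctly and splits sample indices into ten disjoint folds. The function names `roc_auc` and `k_fold_indices` are hypothetical and are not taken from the DeepPTM repository; the authors' actual evaluation code may differ.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k disjoint, roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training indices are everything outside the held-out fold.
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test
```

In a cross-validated evaluation, a model would be trained on each `train` split, scored on the matching `test` split, and the per-fold AUCs averaged.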
Citations: 0
STNMDA: A Novel Model for Predicting Potential Microbe-Drug Associations with Structure-Aware Transformer
IF 4; Zone 3 (Biology); Q3, Biochemical Research Methods; Pub Date: 2024-02-02; DOI: 10.2174/0115748936272939231212102627
Liu Fan, Xiaoyu Yang, Lei Wang, Xianyou Zhu
Introduction: Microbes are intimately involved in the physiological and pathological processes of numerous diseases. There is a critical need for new drugs to combat microbe-induced diseases in clinical settings. Predicting potential microbe-drug associations is, therefore, essential for both disease treatment and novel drug discovery. However, verifying these relationships through traditional wet-lab approaches is costly and time-consuming. Methods: We proposed an efficient computational model, STNMDA, that integrates a Structure-Aware Transformer (SAT) with a Deep Neural Network (DNN) classifier to infer latent microbe-drug associations. STNMDA begins with a "random walk with restart" approach to construct a heterogeneous network using Gaussian kernel similarity and functional similarity measures for microorganisms and drugs. This heterogeneous network is then fed into the SAT to extract attribute features and graph structures for each drug and microbe node. Finally, the DNN classifier calculates the probability of an association between each microbe and drug. Results: Extensive experimental results showed that STNMDA outperformed existing state-of-the-art models on the MDAD and aBiofilm databases. In addition, case validations demonstrated the feasibility of using STNMDA to confirm associations between microbes and drugs. Conclusion: STNMDA shows promise as a valuable tool for future prediction of microbe-drug associations.
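The two network-building ingredients named in the Methods, Gaussian kernel similarity and random walk with restart, can be sketched in a few lines. This is a minimal illustration under assumptions the abstract does not state: the kernel bandwidth is normalized by the mean squared norm of the association profiles (a common convention for Gaussian interaction profile kernels), the restart probability is 0.5, and the function names are hypothetical. It is not the STNMDA implementation.

```python
import math


def gaussian_kernel_similarity(profiles):
    """Pairwise similarity exp(-gamma * ||a_i - a_j||^2) between association profiles."""
    n = len(profiles)
    # Assumed bandwidth: inverse of the mean squared norm of the profiles.
    mean_sq = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol."""
    n = len(adj)
    # Column-normalize: W[i][j] is the probability of stepping from node j to node i.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [
        [adj[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p
```

The returned vector gives, for one seed node, a proximity score to every other node; running it from each node yields a smoothed similarity matrix.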
Citations: 0
P4PC: A Portal for Bioinformatics Resources of piRNAs and circRNAs
IF 4; Zone 3 (Biology); Q3, Biochemical Research Methods; Pub Date: 2024-02-02; DOI: 10.2174/0115748936289420240117100823
Yajun Liu, Ru Li, Yulian Ding, Xin Hong Hei, Fang-Xiang Wu
Background: PIWI-interacting RNAs (piRNAs) and circular RNAs (circRNAs) are two kinds of non-coding RNAs (ncRNAs) that play important roles in the epigenetic, transcriptional, and post-transcriptional regulation of many biological processes. Although various resources exist, selecting among them for a specific ncRNA research project remains challenging. Method: To help researchers find appropriate bioinformatics resources for studying ncRNAs, we created a novel portal named P4PC that provides computational tools and data sources for piRNAs and circRNAs. Result: P4PC contains 249 computational tools, 126 databases, and 420 papers, all manually curated. All entries in P4PC are classified into 5 groups and 26 subgroups. The list of resources is summarized on the first page of each group. Conclusion: According to their research purposes, users can quickly select suitable resources for their projects by viewing the detailed information and comments in P4PC. The database URLs are http://www.ibiomedical.net/Portal4PC/ and http://43.138.46.5:8080/Portal4PC/.
Citations: 0
Improved Hybrid Approach for Enhancing Protein Coding Regions Identification in DNA Sequences
IF 4; Zone 3 (Biology); Q3, Biochemical Research Methods; Pub Date: 2024-02-01; DOI: 10.2174/0115748936287244240117065325
Emad S. Hassan, Ahmed M. Dessouky, Hesham Fathi, Gerges M. Salama, Ahmed S. Oshaba, Atef El-Emary, Fathi E. Abd El‑Samie
Introduction: Identifying and predicting protein-coding regions within DNA sequences plays a pivotal role in genomic research. This paper introduces an approach for identifying protein-coding regions in DNA sequences, employing a hybrid methodology that combines a digital band-pass filter with wavelet transforms and various spectral estimation techniques to enhance exon prediction. Specifically, the Haar and Daubechies wavelet transforms are applied to improve the accuracy of protein-coding region (exon) prediction, enabling the extraction of intricate details that may be obscured in the original DNA sequences. Objective: To enhance protein-coding region identification in DNA sequences using wavelet transforms. Methods: This research showcases the utility of Haar and Daubechies wavelet transforms, of both non-parametric and parametric spectral estimation methods, and of a digital band-pass filter for detecting peaks in exon regions. Additionally, the Electron-Ion Interaction Potential (EIIP) method is applied to convert symbolic DNA sequences into numerical values, and sum-of-sinusoids (SoS) mathematical models with optimized parameters further enrich the toolbox for DNA sequence analysis, supporting the proposed method in modeling DNA sequences and accurately identifying genes. Results: The outcomes of this approach show a substantial improvement in identification accuracy for protein-coding regions. For peak location detection, applying the Haar and Daubechies wavelet transforms improves the accuracy of peak localization by approximately (0.01, 3-5 dB). With non-parametric and parametric spectral estimation techniques, peak location improves by approximately (0.01, 4 dB) compared to the original signal. The proposed approach also achieves higher accuracy than existing methods. Conclusion: These findings not only bridge gaps in DNA sequence analysis but also offer a promising pathway for advancing exonic region prediction and gene identification in genomics research. The hybrid methodology presented stands as a robust contribution to the evolving landscape of genomic analysis techniques.
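To make the signal-processing steps in this abstract concrete, the sketch below maps a DNA string to numbers with the standard EIIP values, measures spectral power at a period of three bases (the periodicity that exon detectors of this kind typically look for), and applies one level of the Haar wavelet transform. The window length, step, and function names are illustrative assumptions; the paper's band-pass filter, Daubechies transform, and parametric estimators are not reproduced here.

```python
import cmath

# Standard EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Convert a symbolic DNA sequence into its numerical EIIP signal."""
    return [EIIP[base] for base in seq.upper()]


def period3_power(signal):
    """Squared DFT magnitude at a frequency of one cycle per three bases."""
    coeff = sum(x * cmath.exp(-2j * cmath.pi * t / 3) for t, x in enumerate(signal))
    return abs(coeff) ** 2


def sliding_period3(seq, window=351, step=3):
    """Period-3 power in each window; peaks suggest candidate coding regions."""
    x = eiip_encode(seq)
    return [period3_power(x[s:s + window]) for s in range(0, len(x) - window + 1, step)]


def haar_step(signal):
    """One Haar level: scaled pairwise sums (approximation) and differences (detail)."""
    root2 = 2 ** 0.5
    approx = [(a + b) / root2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / root2 for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail
```

A sequence with a strict three-base repeat produces a clear period-3 peak, while a homopolymer produces none, which is the contrast the spectral step relies on.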
Citations: 0
Advances in Deep Learning Assisted Drug Discovery Methods: A Self-review
IF 4; Zone 3 (Biology); Q3, Biochemical Research Methods; Pub Date: 2024-01-29; DOI: 10.2174/0115748936285690240101041704
Haiping Zhang, Konda Mani Saravanan
Artificial intelligence is a field within computer science that endeavors to replicate the intricate structures and operational mechanisms of the human brain. Machine learning is a subfield of artificial intelligence that focuses on developing models by analyzing training data. Deep learning is a distinct subfield within artificial intelligence, characterized by models that apply geometric transformations across multiple layers. Deep learning has shown significant promise in various domains, including health and life sciences, and it has recently been applied successfully to drug discovery. In this self-review, we present recent methods developed with the aid of deep learning. The objective is to give a brief overview of our group's current cutting-edge advances in drug discovery. We systematically discuss experimental evidence and proof-of-concept examples for the deep learning-based models we have developed, such as DeepBindBC, DeepPep, and DeepBindRG. These developments not only shed light on existing challenges but also highlight the achievements and prospects for future progress in drug discovery and development.
Citations: 0
FMDVSerPred: A Novel Computational Solution for Foot-and-mouth Disease Virus Classification and Serotype Prediction Prevalent in Asia using VP1 Nucleotide Sequence Data
IF 4; Zone 3 (Biology); Q3, Biochemical Research Methods; Pub Date: 2024-01-29; DOI: 10.2174/0115748936278851231213110653
Samarendra Das, Soumen Pal, Samyak Mahapatra, Jitendra K. Biswal, Sukanta K. Pradhan, Aditya P. Sahoo, Rabindra Prasad Singh
Background: Three serotypes of foot-and-mouth disease (FMD) virus have been circulating in Asia, and they are commonly identified by serological assays. Such tests are time-consuming and require a bio-containment facility. To the best of our knowledge, no computational solution for predicting FMD virus serotypes is available in the literature, so user-friendly tools for FMD virus serotyping are urgently needed. Methods: We present a computational solution based on a machine-learning model for FMD virus classification and serotype prediction. Various data pre-processing techniques are implemented in the approach to improve model prediction. We used sequence data from 2509 FMD virus isolates reported from India and seven other Asian FMD-endemic countries for model training, testing, and validation. We also studied the utility of the developed solution in a wet-lab setup by collecting and sequencing 12 virus isolates reported in India. The solution is implemented in two user-friendly tools: an online web-prediction server (https://nifmd-bbf.icar.gov.in/FMDVSerPred) and an R statistical software package (https://github.com/sam-dfmd/FMDVSerPred). Results: The random forest machine learning model is used in the computational solution because it outperformed seven other machine learning models when evaluated on ten test and independent datasets. Furthermore, the developed solution provided validation accuracies of up to 99.87% on test data, and up to 98.64% and 90.24% on independent data reported from India and its seven neighboring countries, respectively. In addition, our approach was successfully used to predict the serotypes of field FMD virus isolates reported from various parts of India. Conclusion: High-throughput sequencing combined with machine learning therefore offers a promising solution to FMD virus serotyping.
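The abstract does not say which features are extracted from the VP1 nucleotide sequences before they reach the random forest. Purely as an assumption for illustration, one widely used pre-processing step for sequence classifiers is a k-mer frequency vector, sketched below with the hypothetical function `kmer_frequencies`. It should not be read as the FMDVSerPred feature set.

```python
from itertools import product


def kmer_frequencies(seq, k=3):
    """Relative frequency of every A/C/G/T k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

Each sequence becomes a fixed-length vector (4^k entries), which is the kind of input a tree-based classifier can consume regardless of sequence length.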
Citations: 0
Prediction of Drug Pathway-based Disease Classes using Multiple Properties of Drugs
IF 4 Zone 3 Biology Q3 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-01-29 DOI: 10.2174/0115748936284973240105115444
Lei Chen, Linyang Li
Background: Drug repositioning is now an important research area in drug discovery because it can accelerate the discovery of novel effects of existing drugs. However, screening out possible effects for given drugs is challenging. Designing computational methods is a quick and cheap way to complete this task. Most existing computational methods infer the relationships between drugs and diseases. The pathway-based disease classification reported in KEGG provides a new way to investigate drug repositioning, as this classification can also be applied to drugs. The predicted class of a given drug suggests latent diseases it can treat. Objective: The purpose of this study is to set up efficient multi-label classifiers to predict the classes of drugs. Method: We adopt three types of drug information to generate drug features: drug pathway information, label information, and the drug network. For the first two types, drugs are first encoded into binary vectors, which are further processed by singular value decomposition. For the third type, the network embedding algorithm Mashup is employed to yield drug features. These features are combined and fed into RAndom k-labELsets (RAKEL) to construct multi-label classifiers, with support vector machine selected as the base classification algorithm. Results: The ten-fold cross-validation results show that the classifiers provide high performance, with accuracy higher than 0.95 and absolute true higher than 0.92. The case study indicates novel effects of three drugs, i.e., they may treat new diseases. Conclusion: The proposed classifiers have high performance and are superior to classifiers built with other classic algorithms and drug information. Furthermore, they have the ability to discover new effects of drugs.
Citations: 0
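The abstract above reports "accuracy" and "absolute true" for its multi-label classifiers without defining them. The sketch below assumes the definitions commonly used in multi-label classification: accuracy as the mean overlap (intersection over union) between the true and predicted label sets, and absolute true as the fraction of samples whose predicted set matches exactly. The function names and the class labels in the usage note are hypothetical, and the paper's exact formulas may differ.

```python
from typing import Sequence, Set


def _check(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> None:
    # Both metrics average over samples, so the inputs must align one-to-one.
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty sequences of equal length")


def multilabel_accuracy(true_sets: Sequence[Set[str]],
                        pred_sets: Sequence[Set[str]]) -> float:
    """Mean intersection-over-union of true and predicted label sets (assumed definition)."""
    _check(true_sets, pred_sets)
    total = 0.0
    for truth, pred in zip(true_sets, pred_sets):
        union = truth | pred
        # Two empty label sets agree completely, so score them as 1.
        total += len(truth & pred) / len(union) if union else 1.0
    return total / len(true_sets)


def absolute_true(true_sets: Sequence[Set[str]],
                  pred_sets: Sequence[Set[str]]) -> float:
    """Fraction of samples whose predicted label set equals the true set (assumed definition)."""
    _check(true_sets, pred_sets)
    exact = sum(1 for truth, pred in zip(true_sets, pred_sets) if truth == pred)
    return exact / len(true_sets)
```

As a usage illustration with made-up class labels: for true sets `[{"A", "B"}, {"C"}, {"A"}]` and predictions `[{"A", "B"}, {"C", "D"}, {"B"}]`, the per-sample overlaps are 1, 0.5, and 0, so accuracy is 0.5, while only the first sample matches exactly, so absolute true is 1/3. Absolute true is the stricter of the two, which is consistent with the abstract reporting a lower value for it.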
Prospects of Identifying Alternative Splicing Events from Single-Cell RNA Sequencing Data
IF 4 Zone 3 Biology Q3 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-01-26 DOI: 10.2174/0115748936279561231214072041
Jiacheng Wang, Lei Yuan
Background: The advent of single-cell RNA sequencing (scRNA-seq) technology has offered unprecedented opportunities to unravel cellular heterogeneity and functions. Yet, despite its success in unraveling gene expression heterogeneity, accurately identifying and interpreting alternative splicing events from scRNA-seq data remains a formidable challenge. With advancing technology and algorithmic innovations, the prospect of accurately identifying alternative splicing events from scRNA-seq data is becoming increasingly promising. Objective: This perspective aims to uncover the intricacies of splicing at the single-cell level and their potential implications for health and disease. It seeks to harness scRNA-seq's transformative power in revealing cell-specific alternative splicing dynamics and aims to propel our understanding of gene regulation within individual cells to new heights. Methods: The perspective is grounded in recent literature, the experimental protocols of single-cell RNA-seq, and methods to identify and quantify alternative splicing events from scRNA-seq data. Results: This perspective outlines the promising potential, challenges, and methodologies for leveraging different scRNA-seq technologies to identify and study alternative splicing events, with a focus on advancing our understanding of gene regulation at the single-cell level. Conclusion: This perspective explores the prospects of utilizing scRNA-seq data to identify and study alternative splicing events, highlighting their potential, challenges, methodologies, biological insights, and future directions.
Citations: 0
Application of Deep Learning Neural Networks in Computer-aided Drug Discovery: A Review
IF 4 Zone 3 Biology Q3 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-01-25 DOI: 10.2174/0115748936276510231123121404
Jay Shree Mathivanan, Victor Violet Dhayabaran, Mary Rajathei David, Muthugobal Bagayalakshmi Karuna Nidhi, Karuppasamy Muthuvel Prasath, Suvaiyarasan Suvaithenamudhan
Computer-aided drug design has an important role in drug development and design. It has become a thriving area of research in the pharmaceutical industry as a way to accelerate the drug discovery process. Deep learning, a subdivision of artificial intelligence, is widely applied to advance new drug development and design opportunities. This article reviews recent technology that uses deep learning techniques to improve the understanding of drug-target interactions in computer-aided drug discovery, based on prior knowledge acquired from the literature. In general, deep learning models can be trained to predict the binding affinity between the protein-ligand complexes and protein structures or to generate protein-ligand complexes in structure-based drug discovery. In other words, artificial neural networks and deep learning algorithms, especially graph convolutional neural networks and generative adversarial networks, can be applied to drug discovery. Graph convolutional neural networks effectively capture the interactions and structural information between atoms and molecules, which can be used to predict the binding affinity between protein and ligand. In addition, ligand molecules with the desired properties can be generated using generative adversarial networks.
Citations: 0
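The review abstract above says that graph convolutional networks capture interactions and structural information between atoms. As a rough illustration of what one such step does, the sketch below averages each atom's feature vector with those of its bonded neighbours, applies a linear map, and then a ReLU. This is a deliberately simplified variant with mean aggregation and self-loops, not the specific layer used in any of the reviewed models, and the function name, toy molecule, and weights are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def graph_conv_step(features: Dict[int, Vector],
                    edges: List[Tuple[int, int]],
                    weight: List[Vector]) -> Dict[int, Vector]:
    """One simplified graph-convolution step over an undirected graph.

    features maps node id -> input vector; weight has shape (out_dim, in_dim).
    """
    # Every node aggregates over itself (self-loop) plus its bonded neighbours.
    neighbours = {node: {node} for node in features}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    updated: Dict[int, Vector] = {}
    for node, group in neighbours.items():
        dim = len(features[node])
        # Mean of the feature vectors in the neighbourhood.
        mean = [sum(features[n][k] for n in group) / len(group) for k in range(dim)]
        # Linear map followed by ReLU.
        updated[node] = [max(0.0, sum(w * x for w, x in zip(row, mean)))
                         for row in weight]
    return updated
```

For a hypothetical three-atom chain 0-1-2 with features `{0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}` and an identity weight matrix, the end atoms become `[0.5, 0.5]` because each mixes with the middle atom, while the middle atom mixes all three. Stacking several such steps lets information travel further along the bond graph, which is the sense in which structural context reaches each atom's representation.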
Integration of Artificial Intelligence, Machine Learning and Deep Learning Techniques in Genomics: Review on Computational Perspectives for NGS Analysis of DNA and RNA Seq Data
IF 4 Zone 3 Biology Q3 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-01-24 DOI: 10.2174/0115748936284044240108074937
Chandrashekar K, Vidya Niranjan, Adarsh Vishal, Anagha S Setlur
In the current state of genomics and biomedical research, the utilization of Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) has emerged as a paradigm shifter. While traditional NGS DNA and RNA sequencing analysis pipelines have been sound in decoding genetic information, the volume and complexity of sequencing data have surged. There is a demand for more efficient and accurate methods of analysis, which has led to a dependency on AI/ML and DL approaches. This paper highlights how these approaches help overcome the limitations and generate better results. With pipeline automation and the integration of these tools into the NGS DNA and RNA-seq pipeline, the quality of research can improve, as large data sets can be processed using deep learning tools. Automation helps reduce labor-intensive tasks and lets researchers focus on other frontiers of research. In the traditional pipeline, all tasks from quality check to variant identification (in the case of SNP detection) take a huge amount of computational time, and the researcher must enter code by hand, taking care to avoid human error. With automation, the whole process can run in comparatively less time and more smoothly, as the automated pipeline can run on multiple files instead of the single file handled in the traditional pipeline. In conclusion, this review paper sheds light on the transformative impact of DL's integration into traditional pipelines and its role in optimizing computational time. Additionally, it highlights the growing importance of AI-driven solutions in advancing genomics research and enabling data-intensive biomedical applications.
Citations: 0
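The abstract above contrasts a traditional pipeline that handles one file at a time with an automated one that runs the same steps over many files. A minimal sketch of that idea follows: an ordered list of steps is applied to every sample, with several samples processed concurrently. The step functions here are placeholders that only return a label; in a real pipeline they would invoke quality-control, alignment, and variant-calling tools, none of which are named in the abstract. All function names and file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, List

Step = Callable[[Path], str]


def run_pipeline(sample: Path, steps: Iterable[Step]) -> List[str]:
    """Run each step in order on one sample and collect the step reports."""
    return [step(sample) for step in steps]


def run_batch(samples: Iterable[Path], steps: List[Step],
              workers: int = 4) -> Dict[str, List[str]]:
    """Apply the same ordered steps to every sample, several samples at a time."""
    sample_list = list(samples)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        reports = pool.map(lambda s: run_pipeline(s, steps), sample_list)
        return {s.name: report for s, report in zip(sample_list, reports)}


# Placeholder steps (assumptions): stand-ins for real QC and variant-calling tools.
def quality_check(sample: Path) -> str:
    return f"qc:{sample.name}"


def call_variants(sample: Path) -> str:
    return f"variants:{sample.name}"
```

Calling `run_batch([Path("a.fastq"), Path("b.fastq")], [quality_check, call_variants])` returns one report list per file, keyed by file name. Because the steps are defined once and reused for every sample, no per-file manual input is needed, which is the source of the reduced hands-on time the abstract describes.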