Pub Date: 2024-02-02 DOI: 10.2174/0115748936283134240109054157
Necla Nisa Soylu, Emre Sefer
Introduction: Recent self-supervised deep language models, such as Bidirectional Encoder Representations from Transformers (BERT), have achieved the best performance on some language tasks by contextualizing word embeddings into a better dynamic representation. Their protein-specific versions, such as ProtBERT, generate dynamic protein sequence embeddings, which have improved performance on several bioinformatics tasks. In addition, a number of different protein post-translational modifications are prominent in cellular processes such as development and differentiation. Current biological experiments can detect these modifications, but they take longer and incur significant cost. Methods: In this paper, to understand the accompanying biological processes more concisely and rapidly, we propose DEEPPTM to predict protein post-translational modification (PTM) sites from protein sequences more efficiently. Unlike current methods, DEEPPTM improves modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and it reveals the associations between different modification types and protein sequence content. Additionally, it can infer several different modifications across different species. Results: Under 10-fold cross-validation, the human and mouse ROC AUCs for predicting succinylation modifications were 0.988 and 0.965, respectively. Similarly, we obtained ROC AUC scores of 0.982, 0.955, and 0.953 for inferring ubiquitination, crotonylation, and glycation sites, respectively. According to detailed computational experiments, DEEPPTM lessens the time spent in laboratory experiments while outperforming the competing methods as well as baselines on inferring all 4 modification sites. In our case, attention-based deep learning methods such as vision transformers appear better suited to learning from ProtBERT features than more traditional deep learning and machine learning techniques. Conclusion: Additionally, the protein-specific ProtBERT model is more effective than the original BERT embeddings for PTM prediction tasks. Our code and datasets can be found at https://github.com/seferlab/deepptm.
{"title":"DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers","authors":"Necla Nisa Soylu, Emre Sefer","doi":"10.2174/0115748936283134240109054157","DOIUrl":"https://doi.org/10.2174/0115748936283134240109054157","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"3 1","pages":""},"PeriodicalIF":4.0,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139666027","RegionNum":3,"RegionCategory":"Biology"}
Pub Date: 2024-02-02 DOI: 10.2174/0115748936272939231212102627
Liu Fan, Xiaoyu Yang, Lei Wang, Xianyou Zhu
Introduction: Microbes are intimately involved in the physiological and pathological processes of numerous diseases. There is a critical need for new drugs to combat microbe-induced diseases in clinical settings. Predicting potential microbe-drug associations is, therefore, essential for both disease treatment and novel drug discovery. However, it is costly and time-consuming to verify these relationships through traditional wet lab approaches. Methods: We proposed an efficient computational model, STNMDA, that integrated a Structure-Aware Transformer (SAT) with a Deep Neural Network (DNN) classifier to infer latent microbe-drug associations. STNMDA began with a "random walk with restart" approach to construct a heterogeneous network using Gaussian kernel similarity and functional similarity measures for microorganisms and drugs. This heterogeneous network was then fed into the SAT to extract attribute features and graph structures for each drug and microbe node. Finally, the DNN classifier calculated the probability of associations between microbes and drugs. Results: Extensive experimental results showed that STNMDA surpassed existing state-of-the-art models in performance on the MDAD and aBiofilm databases. In addition, the feasibility of STNMDA in confirming associations between microbes and drugs was demonstrated through case validations. Conclusion: Hence, STNMDA showed promise as a valuable tool for future prediction of microbe-drug associations.
{"title":"STNMDA: A Novel Model for Predicting Potential Microbe-Drug Associations with Structure-Aware Transformer","authors":"Liu Fan, Xiaoyu Yang, Lei Wang, Xianyou Zhu","doi":"10.2174/0115748936272939231212102627","DOIUrl":"https://doi.org/10.2174/0115748936272939231212102627","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"9 1","pages":""},"PeriodicalIF":4.0,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139666558","RegionNum":3,"RegionCategory":"Biology"}
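The STNMDA abstract names a "random walk with restart" step but gives no details. As a rough illustration only, the generic technique can be sketched in plain Python as below. The toy adjacency matrix, the function name `random_walk_with_restart`, and the restart probability of 0.3 are assumptions made for this example and are not taken from the paper.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on a small graph (illustrative only).

    adj     : square adjacency matrix as a list of lists (hypothetical toy data)
    seed    : index of the node the walker restarts from
    restart : probability of jumping back to the seed at each step (assumed value)
    """
    n = len(adj)
    # Column-normalise the adjacency matrix so each column holds transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        # p_next = (1 - r) * W p + r * p0
        nxt = [
            (1 - restart) * sum(trans[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy undirected graph: node 3 is only reachable through node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seed=0)
```

The resulting `scores` form a probability distribution over nodes that reflects proximity to the seed; how STNMDA turns such scores into its heterogeneous network is not described in the abstract.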
Background: PIWI-interacting RNAs (piRNAs) and circular RNAs (circRNAs) are two kinds of non-coding RNAs (ncRNAs) that play important roles in the epigenetic, transcriptional, and post-transcriptional regulation of many biological processes. Although various resources exist, it is still challenging to select suitable ones for specific research projects on ncRNAs. Method: To help researchers find appropriate bioinformatics resources for studying ncRNAs, we created a novel portal named P4PC that provides computational tools and data sources for piRNAs and circRNAs. Result: 249 computational tools, 126 databases, and 420 papers are manually curated in P4PC. All entries in P4PC are classified into 5 groups and 26 subgroups. The list of resources is summarized on the first page of each group. Conclusion: According to their research purposes, users can quickly select proper resources for their research projects by viewing detailed information and comments in P4PC. The database URLs are http://www.ibiomedical.net/Portal4PC/ and http://43.138.46.5:8080/Portal4PC/.
{"title":"P4PC: A Portal for Bioinformatics Resources of piRNAs and circRNAs","authors":"Yajun Liu, Ru Li, Yulian Ding, Xin Hong Hei, Fang-Xiang Wu","doi":"10.2174/0115748936289420240117100823","DOIUrl":"https://doi.org/10.2174/0115748936289420240117100823","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"39 1","pages":""},"PeriodicalIF":4.0,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139662702","RegionNum":3,"RegionCategory":"Biology"}
Pub Date: 2024-02-01 DOI: 10.2174/0115748936287244240117065325
Emad S. Hassan, Ahmed M. Dessouky, Hesham Fathi, Gerges M. Salama, Ahmed S. Oshaba, Atef El-Emary, Fathi E. Abd El‑Samie
Introduction: Identifying and predicting protein-coding regions within DNA sequences plays a pivotal role in genomic research. This paper introduces an approach for identifying protein-coding regions in DNA sequences, employing a hybrid methodology that combines a digital band-pass filter with wavelet transforms and various spectral estimation techniques to enhance exon prediction. Specifically, the Haar and Daubechies wavelet transforms are applied to improve the accuracy of protein-coding region (exon) prediction, enabling the extraction of intricate details that may be obscured in the original DNA sequences. Objective: Enhanced protein-coding region identification in DNA sequences using wavelet transforms. Methods: This research showcases the utility of Haar and Daubechies wavelet transforms, both non-parametric and parametric spectral estimation methods, and the deployment of a digital band-pass filter for detecting peaks in exon regions. Additionally, the Electron-Ion Interaction Potential (EIIP) method is applied to convert symbolic DNA sequences into numerical values, and sum-of-sinusoids (SoS) mathematical models with optimized parameters are used to model DNA sequences and identify genes accurately. Results: The outcomes of this approach show a substantial enhancement in identification accuracy for protein-coding regions. In terms of peak location detection, the application of Haar and Daubechies wavelet transforms enhances the accuracy of peak localization by approximately (0.01, 3-5 dB). When employing non-parametric and parametric spectral estimation techniques, there is an improvement in peak location by approximately (0.01, 4 dB) compared to the original signal. The proposed approach also achieves higher accuracy when compared with existing methods. Conclusion: These findings not only bridge gaps in DNA sequence analysis but also offer a promising pathway for advancing exonic region prediction and gene identification in genomics research. The hybrid methodology presented stands as a robust contribution to the evolving landscape of genomic analysis techniques.
{"title":"Improved Hybrid Approach for Enhancing Protein Coding Regions Identification in DNA Sequences","authors":"Emad S. Hassan, Ahmed M. Dessouky, Hesham Fathi, Gerges M. Salama, Ahmed S. Oshaba, Atef El-Emary, Fathi E. Abd El‑Samie","doi":"10.2174/0115748936287244240117065325","DOIUrl":"https://doi.org/10.2174/0115748936287244240117065325","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"7 1","pages":""},"PeriodicalIF":4.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139662697","RegionNum":3,"RegionCategory":"Biology"}
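The abstract above mentions converting DNA symbols to numbers with the EIIP method and then looking for spectral peaks in exon regions. A minimal sketch of that general idea follows, using the commonly cited EIIP values for the four nucleotides and a plain discrete Fourier component at frequency 1/3 (the period-3 signal usually associated with coding regions). The choice of a plain DFT, the window length, the step size, and the function names are assumptions for illustration; the paper's band-pass filter, wavelet, and parametric estimators are not reproduced here.

```python
import cmath

# Commonly cited EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Map a DNA string to its numerical EIIP sequence."""
    return [EIIP[base] for base in seq.upper()]


def period3_power(signal):
    """Normalised power of the DFT component at frequency 1/3."""
    n = len(signal)
    total = sum(x * cmath.exp(-2j * cmath.pi * k / 3) for k, x in enumerate(signal))
    return abs(total) ** 2 / n


def sliding_period3(seq, window=30, step=3):
    """Period-3 power over sliding windows (window and step are assumed values)."""
    signal = eiip_encode(seq)
    return [
        period3_power(signal[i:i + window])
        for i in range(0, len(signal) - window + 1, step)
    ]


# Toy sequences: a strictly 3-periodic string versus a constant one.
periodic = "ATG" * 20
flat = "A" * 60
periodic_power = period3_power(eiip_encode(periodic))
flat_power = period3_power(eiip_encode(flat))
profile = sliding_period3(periodic)
```

On these toy inputs the 3-periodic string carries clearly more power at frequency 1/3 than the constant string, which is the kind of contrast a peak detector for exon regions relies on.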
Pub Date: 2024-01-29 DOI: 10.2174/0115748936285690240101041704
Haiping Zhang, Konda Mani Saravanan
Artificial Intelligence is a field within computer science that endeavors to replicate the intricate structures and operational mechanisms inherent in the human brain. Machine learning is a subfield of artificial intelligence that focuses on developing models by analyzing training data. Deep learning is a distinct subfield within artificial intelligence, characterized by models that represent geometric transformations across multiple layers. Deep learning has shown significant promise in various domains, including health and life sciences, and has recently been applied successfully to drug discovery. In this self-review, we present recent methods developed with the aid of deep learning. The objective is to give a brief overview of the cutting-edge advancements in drug discovery from our group. We systematically discuss experimental evidence and proof-of-concept examples for the deep learning-based models we developed, such as DeepBindBC, DeepPep, and DeepBindRG. These developments not only shed light on the existing challenges but also emphasize the achievements and prospects for future progress in drug discovery and development.
{"title":"Advances in Deep Learning Assisted Drug Discovery Methods: A Self-review","authors":"Haiping Zhang, Konda Mani Saravanan","doi":"10.2174/0115748936285690240101041704","DOIUrl":"https://doi.org/10.2174/0115748936285690240101041704","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"175 1","pages":""},"PeriodicalIF":4.0,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139587743","RegionNum":3,"RegionCategory":"Biology"}
Pub Date: 2024-01-29 DOI: 10.2174/0115748936278851231213110653
Samarendra Das, Soumen Pal, Samyak Mahapatra, Jitendra K. Biswal, Sukanta K. Pradhan, Aditya P. Sahoo, Rabindra Prasad Singh
Background: Three serotypes of Foot-and-mouth disease (FMD) virus have been circulating in Asia and are commonly identified by serological assays. Such tests are time-consuming and require a bio-containment facility. To the best of our knowledge, no computational solution is available in the literature to predict FMD virus serotypes, so there is an urgent need for user-friendly tools for FMD virus serotyping. Methods: We presented a computational solution based on a machine-learning model for FMD virus classification and serotype prediction. In addition, various data pre-processing techniques are implemented in the approach to improve model prediction. We used sequence data of 2509 FMD virus isolates reported from India and seven other Asian FMD-endemic countries for model training, testing, and validation. We also studied the utility of the developed computational solution in a wet lab setup by collecting and sequencing 12 virus isolates reported in India. The computational solution is implemented in two user-friendly tools: an online web-prediction server (https://nifmd-bbf.icar.gov.in/FMDVSerPred) and an R statistical software package (https://github.com/sam-dfmd/FMDVSerPred). Results: The random forest machine learning model is implemented in the computational solution, as it outperformed seven other machine learning models when evaluated on ten test and independent datasets. Furthermore, the developed computational solution provided validation accuracies of up to 99.87% on test data, and up to 98.64% and 90.24% on independent data reported from India and from its seven neighboring countries, respectively. In addition, our approach was successfully used for predicting serotypes of field FMD virus isolates reported from various parts of India. Conclusion: Therefore, high-throughput sequencing combined with machine learning offers a promising solution to FMD virus serotyping.
{"title":"FMDVSerPred: A Novel Computational Solution for Foot-and-mouth Disease Virus Classification and Serotype Prediction Prevalent in Asia using VP1 Nucleotide Sequence Data","authors":"Samarendra Das, Soumen Pal, Samyak Mahapatra, Jitendra K. Biswal, Sukanta K. Pradhan, Aditya P. Sahoo, Rabindra Prasad Singh","doi":"10.2174/0115748936278851231213110653","DOIUrl":"https://doi.org/10.2174/0115748936278851231213110653","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":"38 1","pages":""},"PeriodicalIF":4.0,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139587460","RegionNum":3,"RegionCategory":"Biology"}
Pub Date: 2024-01-29 DOI: 10.2174/0115748936284973240105115444
Lei Chen, Linyang Li
Background: Drug repositioning is now an important research area in drug discovery, as it can accelerate the discovery of novel effects of existing drugs. However, it is challenging to screen out possible effects for given drugs. Designing computational methods is a quick and cheap way to complete this task. Most existing computational methods infer the relationships between drugs and diseases. The pathway-based disease classification reported in KEGG provides a new way to investigate drug repositioning, as this classification can also be applied to drugs. A predicted class of a given drug suggests latent diseases it can treat. Objective: The purpose of this study is to set up efficient multi-label classifiers to predict the classes of drugs. Method: We adopt three types of drug information to generate drug features: drug pathway information, label information, and a drug network. For the first two types, drugs are first encoded into binary vectors, which are further processed by singular value decomposition. For the third type, the network embedding algorithm Mashup is employed to yield drug features. These features are combined and fed into RAndom k-labELsets (RAKEL) to construct multi-label classifiers, with support vector machine selected as the base classification algorithm. Results: The ten-fold cross-validation results show that the classifiers provide high performance, with accuracy higher than 0.95 and absolute true higher than 0.92. The case study indicates novel effects of three drugs, i.e., they may treat new diseases. Conclusion: The proposed classifiers have high performance and are superior to classifiers built with other classic algorithms and drug information. Furthermore, they have the ability to discover new effects of drugs.
{"title":"Prediction of Drug Pathway-based Disease Classes using Multiple Properties of Drugs","authors":"Lei Chen, Linyang Li","doi":"10.2174/0115748936284973240105115444","DOIUrl":"https://doi.org/10.2174/0115748936284973240105115444","journal":{"name":"Current Bioinformatics"},"PeriodicalIF":4.0,"publicationDate":"2024-01-29","publicationTypes":"Journal Article"}
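The Method and Results of the drug-class abstract above mention RAKEL and two multi-label measures without defining them. The following is a minimal sketch, not the authors' implementation. It assumes the disjoint variant of RAKEL (labels split into random groups of at most k, each group treated as a label-powerset problem) and the commonly used definitions of "accuracy" (mean intersection-over-union of true and predicted label sets) and "absolute true" (exact-match rate). The function names and the toy labels A to E are hypothetical, and the base SVM training step is left out.

```python
import random
from typing import FrozenSet, List, Sequence, Set


def disjoint_labelsets(labels: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the labels and cut them into disjoint groups of at most k labels."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]


def powerset_class(sample_labels: Set[str], labelset: Sequence[str]) -> FrozenSet[str]:
    """Label-powerset target for one labelset: the part of it the sample carries."""
    return frozenset(sample_labels) & frozenset(labelset)


def accuracy(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Mean of |intersection| / |union| over samples (assumed definition)."""
    total = 0.0
    for y, z in zip(true_sets, pred_sets):
        union = y | z
        # Two empty sets agree completely, so score them as 1.0.
        total += len(y & z) / len(union) if union else 1.0
    return total / len(true_sets)


def absolute_true(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of samples whose predicted label set matches exactly (assumed definition)."""
    hits = sum(1 for y, z in zip(true_sets, pred_sets) if y == z)
    return hits / len(true_sets)


# Hypothetical toy data: the class names are placeholders, not KEGG classes.
true_sets = [{"A", "B"}, {"C"}, {"A"}]
pred_sets = [{"A", "B"}, {"C", "D"}, {"B"}]

print(disjoint_labelsets(["A", "B", "C", "D", "E"], k=2))  # three groups, sizes 2, 2, 1
print(powerset_class({"A", "C"}, ["A", "B"]))              # frozenset({'A'})
print(accuracy(true_sets, pred_sets))                      # 0.5
print(absolute_true(true_sets, pred_sets))                 # 1 of 3 samples matches exactly
```

In a full classifier, each labelset would get its own base learner trained on the `powerset_class` targets, and the per-labelset predictions would be merged into one label set per drug before scoring.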
Pub Date: 2024-01-26 DOI: 10.2174/0115748936279561231214072041
Jiacheng Wang, Lei Yuan
Background: The advent of single-cell RNA sequencing (scRNA-seq) technology has offered unprecedented opportunities to unravel cellular heterogeneity and functions. Yet, despite its success in revealing gene expression heterogeneity, accurately identifying and interpreting alternative splicing events from scRNA-seq data remains a formidable challenge. With advancing technology and algorithmic innovations, the prospect of accurately identifying alternative splicing events from scRNA-seq data is becoming increasingly promising. Objective: This perspective aims to uncover the intricacies of splicing at the single-cell level and their potential implications for health and disease. It seeks to harness scRNA-seq to reveal cell-specific alternative splicing dynamics and to advance our understanding of gene regulation within individual cells. Methods: The perspective draws on recent literature, the experimental protocols of single-cell RNA-seq, and methods for identifying and quantifying alternative splicing events from scRNA-seq data. Results: This perspective outlines the potential, challenges, and methodologies for leveraging different scRNA-seq technologies to identify and study alternative splicing events, with a focus on advancing our understanding of gene regulation at the single-cell level. Conclusion: This perspective explores the prospects of utilizing scRNA-seq data to identify and study alternative splicing events, highlighting their potential, challenges, methodologies, biological insights, and future directions.
{"title":"Prospects of Identifying Alternative Splicing Events from Single-Cell RNA Sequencing Data","authors":"Jiacheng Wang, Lei Yuan","doi":"10.2174/0115748936279561231214072041","DOIUrl":"https://doi.org/10.2174/0115748936279561231214072041","journal":{"name":"Current Bioinformatics"},"PeriodicalIF":4.0,"publicationDate":"2024-01-26","publicationTypes":"Journal Article"}
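The splicing perspective above speaks of quantifying alternative splicing events per cell but names no measure. One widely used quantity is percent spliced in (PSI), the share of reads that support inclusion of an exon. The sketch below is an illustration under stated assumptions, not a method taken from the perspective: the read counts are invented, the counts are assumed to be already normalized for the number of junctions, and the `min_reads` cutoff of 10 is an arbitrary placeholder for the coverage filtering that sparse single-cell data usually needs.

```python
from typing import Dict, Optional, Tuple


def psi(inclusion: int, exclusion: int, min_reads: int = 10) -> Optional[float]:
    """Percent spliced in for one event in one cell, or None if coverage is too low."""
    total = inclusion + exclusion
    if total < min_reads:
        # Too few reads to estimate a ratio; report "no call" instead of a noisy value.
        return None
    return inclusion / total


def psi_per_cell(counts: Dict[str, Tuple[int, int]], min_reads: int = 10) -> Dict[str, Optional[float]]:
    """Apply psi() to a mapping of cell id -> (inclusion reads, exclusion reads)."""
    return {cell: psi(inc, exc, min_reads) for cell, (inc, exc) in counts.items()}


# Hypothetical read counts for a single exon-skipping event in three cells.
counts = {"cell_1": (18, 2), "cell_2": (3, 1), "cell_3": (5, 15)}
print(psi_per_cell(counts))  # {'cell_1': 0.9, 'cell_2': None, 'cell_3': 0.25}
```

Returning `None` for low-coverage cells keeps dropout from being mistaken for an extreme splicing ratio, which is one of the challenges such data raises.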
Pub Date: 2024-01-25 DOI: 10.2174/0115748936276510231123121404
Jay Shree Mathivanan, Victor Violet Dhayabaran, Mary Rajathei David, Muthugobal Bagayalakshmi Karuna Nidhi, Karuppasamy Muthuvel Prasath, Suvaiyarasan Suvaithenamudhan
Computer-aided drug design plays an important role in drug development. It has become a thriving area of research in the pharmaceutical industry because it can accelerate the drug discovery process. Deep learning, a subfield of artificial intelligence, is widely applied to open new opportunities in drug development and design. This article reviews recent deep learning techniques that improve the understanding of drug-target interactions in computer-aided drug discovery, based on prior knowledge drawn from the literature. In structure-based drug discovery, deep learning models can be trained to predict binding affinity for protein-ligand complexes and protein structures, or to generate protein-ligand complexes. In particular, artificial neural networks and deep learning algorithms, especially graph convolutional neural networks and generative adversarial networks, can be applied to drug discovery. Graph convolutional neural networks effectively capture the interactions and structural information between atoms and molecules, which can be used to predict the binding affinity between protein and ligand. Generative adversarial networks, in turn, can generate ligand molecules with desired properties.
{"title":"Application of Deep Learning Neural Networks in Computer-aided Drug Discovery: A Review","authors":"Jay Shree Mathivanan, Victor Violet Dhayabaran, Mary Rajathei David, Muthugobal Bagayalakshmi Karuna Nidhi, Karuppasamy Muthuvel Prasath, Suvaiyarasan Suvaithenamudhan","doi":"10.2174/0115748936276510231123121404","DOIUrl":"https://doi.org/10.2174/0115748936276510231123121404","journal":{"name":"Current Bioinformatics"},"PeriodicalIF":4.0,"publicationDate":"2024-01-25","publicationTypes":"Journal Article"}
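The review above says that graph convolutional networks capture interactions between atoms. A single graph-convolution step can be sketched in a few lines to show what that means: each atom's feature vector is replaced by an average over itself and its bonded neighbors, followed by a linear map and a ReLU. This is a simplified illustration, not code from any reviewed model. It uses mean aggregation instead of the symmetric degree normalization found in many published variants, and the three-atom molecule, one-hot features, and identity weight matrix are hypothetical.

```python
from typing import Dict, List


def gcn_layer(
    features: List[List[float]],
    neighbors: Dict[int, List[int]],
    weight: List[List[float]],
) -> List[List[float]]:
    """One message-passing step: mean over self and neighbors, linear map, ReLU."""
    n_in = len(weight)
    n_out = len(weight[0])
    output = []
    for node in range(len(features)):
        group = [node] + neighbors.get(node, [])  # include a self-loop
        # Average each input feature over the node and its bonded neighbors.
        mean = [sum(features[j][d] for j in group) / len(group) for d in range(n_in)]
        # Linear transform followed by ReLU.
        row = [max(0.0, sum(mean[d] * weight[d][o] for d in range(n_in))) for o in range(n_out)]
        output.append(row)
    return output


# Hypothetical toy molecule H-O-H with one-hot atom features [is_H, is_O].
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned weights

for row in gcn_layer(features, neighbors, identity):
    print(row)
```

After one step, each hydrogen's vector already carries information about the oxygen it is bonded to. Stacking several layers spreads information across longer paths in the molecular graph, which is what makes the learned atom vectors useful inputs for a binding-affinity predictor.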
Pub Date: 2024-01-24 DOI: 10.2174/0115748936284044240108074937
Chandrashekar K, Vidya Niranjan, Adarsh Vishal, Anagha S Setlur
In genomics and biomedical research, Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) have emerged as paradigm shifters. Traditional NGS DNA and RNA sequencing analysis pipelines have been sound in decoding genetic information, but the volume and complexity of sequencing data have surged, creating demand for more efficient and accurate methods of analysis. This has led to growing reliance on AI/ML and DL approaches. This paper highlights how these tools can help overcome the limitations of traditional pipelines and generate better results. With pipeline automation and the integration of these tools into NGS DNA and RNA-seq pipelines, the quality of research can improve because large data sets can be processed using deep learning tools. Automation reduces labor-intensive tasks and lets researchers focus on other frontiers of research. In the traditional pipeline, all tasks from quality check to variant identification (in the case of SNP detection) take a large amount of computational time, and the researcher has to enter code manually while taking care to avoid human error. An automated pipeline can run the whole process more smoothly and in comparatively less time, because it can process multiple files rather than the single file handled by the traditional pipeline. In conclusion, this review sheds light on the transformative impact of integrating DL into traditional pipelines and its role in optimizing computational time. It also highlights the growing importance of AI-driven solutions in advancing genomics research and enabling data-intensive biomedical applications.
{"title":"Integration of Artificial Intelligence, Machine Learning and Deep Learning Techniques in Genomics: Review on Computational Perspectives for NGS Analysis of DNA and RNA Seq Data","authors":"Chandrashekar K, Vidya Niranjan, Adarsh Vishal, Anagha S Setlur","doi":"10.2174/0115748936284044240108074937","DOIUrl":"https://doi.org/10.2174/0115748936284044240108074937","journal":{"name":"Current Bioinformatics"},"PeriodicalIF":4.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article"}
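The NGS review above contrasts a manual, one-file-at-a-time workflow with an automated pipeline that handles many files. The sketch below illustrates only that orchestration idea and is not the reviewed pipeline: the three step functions are hypothetical placeholders that merely rename their input, where a real workflow would call quality-control, alignment, and variant-calling tools. The file names are invented as well.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


# Hypothetical placeholder steps: each returns the name of the artifact it would produce.
def quality_check(sample: str) -> str:
    return sample + ".qc"


def align(sample: str) -> str:
    return sample + ".bam"


def call_variants(sample: str) -> str:
    return sample + ".vcf"


STEPS: List[Callable[[str], str]] = [quality_check, align, call_variants]


def run_pipeline(sample: str) -> str:
    """Run every step in order for one input file, passing each result to the next step."""
    artifact = sample
    for step in STEPS:
        artifact = step(artifact)
    return artifact


def run_batch(samples: List[str], max_workers: int = 4) -> List[str]:
    """Run the whole pipeline for many files concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_pipeline, samples))


print(run_batch(["s1.fastq", "s2.fastq", "s3.fastq"]))
```

Because the step order lives in one list, no command has to be retyped for each file, which removes the manual entry the review identifies as a source of human error. Threads are a reasonable choice when each step mostly waits on an external tool; CPU-bound steps written in Python would be better served by a process pool.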