
Current Bioinformatics: Latest Articles

Genotype and Phenotype Association Analysis Based on Multi-omics Statistical Data
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-02-07 DOI: 10.2174/0115748936276861240109045208
Xinpeng Guo, Yafei Song, Dongyan Xu, Xueping Jin, Xuequn Shang
Background: Multi-omics analysis of clinical data is often limited by too few omics data types and relatively small sample sizes, owing to patient privacy protection, institutional data management requirements, and the relatively large number of features in each omics dataset. This paper describes the analysis of multi-omics pathway relationships using statistical data in the absence of clinical data. Methods: We proposed a novel approach that exploits easily accessible statistics in public databases. This approach introduces phenotypic associations that are not included in the clinical data and uses these data to build a three-layer heterogeneous network. To simplify the analysis, we decomposed the three-layer network into two two-layer networks to predict the weights of the inter-layer associations. The weights of the two two-layer networks were merged using a hyperparameter β, and k-fold cross-validation was then used to evaluate the accuracy of the method. In calculating the weights of the two-layer networks, random walk with restart (RWR) with a fixed restart probability was combined with PBMDA and CIPHER to generate PCRWR, which uses biased weights and improves accuracy. Results: The area under the receiver operating characteristic curve was increased by approximately 7% in the case of the RWR with initial weights. Conclusion: Multi-omics statistical data were used to establish genotype and phenotype correlation networks for analysis, achieving an effect similar to that of clinical multi-omics analysis.
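As a rough illustration of the random walk with restart and the β-weighted merge mentioned in the abstract above, the following Python sketch may help. It is not the authors' PCRWR: the abstract does not give the update rule or the merge formula, so the column-normalized iteration and the convex combination used here are assumptions, and the names `random_walk_with_restart` and `merge_scores` are hypothetical.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seeds`.

    `adj` is a square non-negative adjacency matrix; `restart` is the fixed
    restart probability (0.7 is an illustrative value, not from the paper).
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0       # leave isolated nodes as zero columns
    transition = adj / col_sums         # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # uniform start over the seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def merge_scores(scores_a, scores_b, beta=0.5):
    """Assumed merge rule: a convex combination controlled by beta."""
    return beta * np.asarray(scores_a, dtype=float) + (1.0 - beta) * np.asarray(scores_b, dtype=float)


# Toy path graph 0 - 1 - 2 with node 0 as the only seed.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
probs = random_walk_with_restart(path, seeds=[0])
```

On the toy path graph, the probability mass concentrates on the seed and decays with distance from it, which is the behavior that makes RWR scores usable as association weights.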
Citations: 0
Enhancing Drug-Target Binding Affinity Prediction through Deep Learning and Protein Secondary Structure Integration
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-02-07 DOI: 10.2174/0115748936285519240110070209
Runhua Zhang, Baozhong Zhu, Tengsheng Jiang, Zhiming Cui, Hongjie Wu
Background: Conventional approaches to drug discovery are often lengthy and costly. To expedite the discovery of new drugs, the use of artificial intelligence (AI) to predict drug-target binding affinity (DTA) has emerged as a crucial approach. Despite the proliferation of deep learning methods for DTA prediction, many of these methods concentrate primarily on the amino acid sequence of proteins. Yet the interactions between drug compounds and targets occur within distinct segments of the protein structure, whereas the primary sequence mainly captures global protein features. Consequently, the primary sequence alone falls short of fully elucidating the intricate relationship between drugs and their respective targets. Objective: This study aims to employ advanced deep-learning techniques to predict DTA while incorporating information about the secondary structure of proteins. Methods: Both the primary sequence and the secondary structure of the protein were used for protein representation. The primary sequence served as the global feature, and the secondary structure served as the localized feature. Convolutional neural networks and graph neural networks were used to independently model the intricate features of target proteins and drug compounds. This approach enhanced our ability to capture drug-target interactions more effectively. Results: We have introduced a novel method for predicting DTA. Compared with DeepDTA, our approach achieves a 3.9% increase in the Concordance Index (CI) and a 34% reduction in Mean Squared Error (MSE) when evaluated on the KIBA dataset. Conclusion: Our results demonstrate that augmenting DTA prediction with the protein's secondary structure as a localized feature yields significantly improved accuracy compared with relying solely on the primary structure.
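The two metrics reported in the abstract above, the Concordance Index (CI) and Mean Squared Error (MSE), can be sketched in plain Python as follows. This is a generic pairwise definition of CI, with prediction ties counted as half-concordant; it is assumed rather than taken from the paper, and the function names are hypothetical.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # not comparable: no true ordering to recover
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

A CI of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ordering, so CI measures ranking quality while MSE measures the absolute error of the predicted affinities. The quadratic loop is fine for small examples but would need a faster formulation for a dataset the size of KIBA.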
Citations: 0
Inferring Gene Regulatory Networks from Single-Cell Time-Course Data Based on Temporal Convolutional Networks
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-02-04 DOI: 10.2174/0115748936282613231211112920
Dayu Tan, Jing Wang, Zhaolong Cheng, Yansen Su, Chunhou Zheng
Objective: This work aims to infer causal relationships between genes and construct dynamic gene regulatory networks using time-course scRNA-seq data. Methods: We propose an analytical method for inferring GRNs from single-cell time-course data based on temporal convolutional networks (scTGRN), which provides a supervised learning approach to infer causal relationships among genes. scTGRN constructs a 4D tensor representing gene expression features for each gene pair, then inputs the constructed 4D tensor into the temporal convolutional network to train and infer the causal relationship between genes. Results: We validate the performance of scTGRN on five real datasets and four simulated datasets, and the experimental results show that scTGRN outperforms existing models in constructing GRNs. In addition, we test the performance of scTGRN on gene function assignment, and scTGRN outperforms other models. Conclusion: The analysis shows that scTGRN can not only accurately identify the causal relationship between genes, but also can be used to achieve gene function assignment.
Citations: 0
Transformer-based Named Entity Recognition for Clinical Cancer Drug Toxicity by Positive-unlabeled Learning and KL Regularizers
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-02-04 DOI: 10.2174/0115748936278299231213045441
Weixin Xie, Jiayu Xu, Chengkui Zhao, Jin Li, Shuangze Han, Tianyu Shao, Limei Wang, Weixing Feng
Background: With increasing rates of polypharmacy, the vigilant surveillance of clinical drug toxicity has emerged as an important concern. Named Entity Recognition (NER) is essential for extracting valuable drug safety information from the biomedical literature. In recent years, deep learning models, especially pre-trained language models, have made significant progress on NER tasks. Nonetheless, these NER techniques rely on large volumes of high-quality manually annotated data, which are labor-intensive and inefficient to produce. Objective: To improve the performance of prediction. Methods: This study introduces a novel approach that diverges from the conventional reliance on manually annotated data. It employs a transformer-based Positive-Unlabeled Learning (PULearning) technique, which incorporates adaptive learning and is applied to the clinical cancer drug toxicity corpus. To improve the precision of prediction, we employ relative position embeddings within the transformer encoder. Additionally, we formulate a composite loss function that integrates two Kullback-Leibler (KL) regularizers to align with PULearning assumptions. Through adaptive sampling, our approach attains the targeted performance for NER tasks, relying solely on unlabeled data and named entity dictionaries. Results: Our model achieves an overall NER performance with an F1 of 0.819. Specifically, it attains F1 of 0.841, 0.801, and 0.815 for DRUG, CANCER, and TOXI entities, respectively. Conclusion: A comprehensive analysis of the results validates the effectiveness of our approach in comparison to existing PULearning methods on biomedical NER tasks. Additionally, a visualization of the associations among the three identified entity types is provided, offering a valuable reference for querying their interrelationships.
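To make the idea of positive-unlabeled learning with a KL term more concrete, here is a small Python sketch. It is not the loss used in the paper, whose exact form the abstract does not give. It shows a commonly used non-negative PU risk estimator with a sigmoid loss, plus a Bernoulli KL divergence that could, as an assumption, penalize a predicted positive rate that drifts from an assumed class prior. All names and the `prior` value are hypothetical.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid loss 1 / (1 + exp(label * score)) for label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk from positive and unlabeled scores.

    `prior` is the assumed fraction of positives among unlabeled examples.
    The negative-class risk is estimated from unlabeled data, corrected by
    the positives hidden in it, and clipped at zero.
    """
    risk_pos_as_pos = _mean(sigmoid_loss(s, +1) for s in pos_scores)
    risk_pos_as_neg = _mean(sigmoid_loss(s, -1) for s in pos_scores)
    risk_unl_as_neg = _mean(sigmoid_loss(s, -1) for s in unl_scores)
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)), clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
```

In a dictionary-based NER setting, tokens matched by an entity dictionary would play the role of the positive set and all remaining tokens the unlabeled set. How the paper weights or combines its two KL regularizers is not stated in the abstract.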
Citations: 0
DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-02-02 DOI: 10.2174/0115748936283134240109054157
Necla Nisa Soylu, Emre Sefer
Introduction: Recent self-supervised deep language models, such as Bidirectional Encoder Representations from Transformers (BERT), have achieved the best performance on some language tasks by contextualizing word embeddings for a better dynamic representation. Their protein-specific versions, such as ProtBERT, generate dynamic protein sequence embeddings, which have improved performance on several bioinformatics tasks. In addition, a number of protein post-translational modifications are prominent in cellular processes such as development and differentiation. Current biological experiments can detect these modifications, but they are time-consuming and costly. Methods: To understand the associated biological processes more rapidly, we propose DEEPPTM, which predicts protein post-translational modification (PTM) sites from protein sequences more efficiently. Unlike current methods, DEEPPTM improves modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and reveals the associations between different modification types and protein sequence content. Additionally, it can infer several different modifications across different species. Results: With 10-fold cross-validation, the human and mouse ROC AUCs for predicting succinylation modifications were 0.988 and 0.965, respectively. Similarly, we obtained ROC AUC scores of 0.982, 0.955, and 0.953 for inferring ubiquitination, crotonylation, and glycation sites, respectively. According to detailed computational experiments, DEEPPTM reduces the time spent on laboratory experiments while outperforming the competing methods and baselines on inferring all 4 modification sites. In our case, attention-based deep learning methods such as vision transformers appear better suited to learning from ProtBERT features than more traditional deep learning and machine learning techniques. Conclusion: The protein-specific ProtBERT model is more effective than the original BERT embeddings for PTM prediction tasks. Our code and datasets can be found at https://github.com/seferlab/deepptm.
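The evaluation protocol named in the abstract above, ROC AUC under 10-fold cross-validation, can be sketched without any external library. This is a generic illustration, not the code from the linked repository: the AUC is computed as the probability that a positive site outscores a negative one, and the interleaved fold assignment is a hypothetical choice.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are 1 for modified sites and 0 otherwise; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute ROC AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(20, k=10))
```

In practice the per-fold AUCs would be averaged, and the folds would usually be shuffled and stratified by class so that every test fold contains both modified and unmodified sites.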
Citations: 0
STNMDA: A Novel Model for Predicting Potential Microbe-Drug Associations with Structure-Aware Transformer
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-02-02 DOI: 10.2174/0115748936272939231212102627
Liu Fan, Xiaoyu Yang, Lei Wang, Xianyou Zhu
Introduction: Microbes are intimately involved in the physiological and pathological processes of numerous diseases. There is a critical need for new drugs to combat microbe-induced diseases in clinical settings. Predicting potential microbe-drug associations is, therefore, essential for both disease treatment and novel drug discovery. However, it is costly and time-consuming to verify these relationships through traditional wet lab approaches. Methods: We proposed an efficient computational model, STNMDA, that integrated a Structure-Aware Transformer (SAT) with a Deep Neural Network (DNN) classifier to infer latent microbe-drug associations. STNMDA began with a "random walk with restart" approach to construct a heterogeneous network using Gaussian kernel similarity and functional similarity measures for microorganisms and drugs. This heterogeneous network was then fed into the SAT to extract attribute features and graph structures for each drug and microbe node. Finally, the DNN classifier calculated the probability of associations between microbes and drugs. Results: Extensive experimental results showed that STNMDA surpassed existing state-of-the-art models in performance on the MDAD and aBiofilm databases. In addition, the feasibility of STNMDA in confirming associations between microbes and drugs was demonstrated through case validations. Conclusion: STNMDA showed promise as a valuable tool for future prediction of microbe-drug associations.
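The Gaussian kernel similarity mentioned in the abstract above is often computed from association profiles, that is, from the rows of a binary microbe-drug association matrix. The sketch below shows that common formulation in Python. It assumes the paper uses a similar definition, which the abstract does not confirm; the bandwidth normalization by the mean squared profile norm and the name `gaussian_kernel_similarity` are assumptions.

```python
import numpy as np


def gaussian_kernel_similarity(assoc, gamma_prime=1.0):
    """Pairwise Gaussian kernel similarity between the rows of `assoc`.

    Each row is one entity's association profile (for example, one microbe's
    known associations across all drugs). The bandwidth is normalized by the
    mean squared norm of the profiles.
    """
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = (assoc ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm
    # Squared Euclidean distances between all pairs of profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (assoc @ assoc.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-gamma * sq_dists)


# Three microbes, two drugs: microbes 0 and 1 share the same profile.
profiles = [[1, 0], [1, 0], [0, 1]]
similarity = gaussian_kernel_similarity(profiles)
```

Applying the same function to the transposed matrix gives a drug-drug similarity. Entities with identical profiles get similarity 1, and similarity decays with the number of associations on which two profiles differ.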
Citations: 0
P4PC: A Portal for Bioinformatics Resources of piRNAs and circRNAs
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-02-02 DOI: 10.2174/0115748936289420240117100823
Yajun Liu, Ru Li, Yulian Ding, Xin Hong Hei, Fang-Xiang Wu
Background: PIWI-interacting RNAs (piRNAs) and circular RNAs (circRNAs) are two kinds of non-coding RNAs (ncRNAs) that play important roles in the epigenetic, transcriptional, and post-transcriptional regulation of many biological processes. Although various resources exist, selecting suitable ones for a specific ncRNA research project remains challenging. Method: To help researchers find appropriate bioinformatics resources for studying ncRNAs, we created a novel portal named P4PC that provides computational tools and data sources for piRNAs and circRNAs. Result: 249 computational tools, 126 databases, and 420 papers are manually curated in P4PC. All entries in P4PC are classified into 5 groups and 26 subgroups. The list of resources is summarized on the first page of each group. Conclusion: Users can quickly select resources that fit the purposes of their research projects by viewing the detailed information and comments in P4PC. The database is available at http://www.ibiomedical.net/Portal4PC/ and http://43.138.46.5:8080/Portal4PC/.
Citations: 0
Improved Hybrid Approach for Enhancing Protein Coding Regions Identification in DNA Sequences
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-02-01 DOI: 10.2174/0115748936287244240117065325
Emad S. Hassan, Ahmed M. Dessouky, Hesham Fathi, Gerges M. Salama, Ahmed S. Oshaba, Atef El-Emary, Fathi E. Abd El‑Samie
Introduction: Identifying and predicting protein-coding regions within DNA sequences plays a pivotal role in genomic research. This paper introduces an approach for identifying protein-coding regions in DNA sequences, employing a hybrid methodology that combines a digital band-pass filter with wavelet transforms and various spectral estimation techniques to enhance exon prediction. Specifically, the Haar and Daubechies wavelet transforms are applied to improve the accuracy of protein-coding region (exon) prediction, enabling the extraction of intricate details that may be obscured in the original DNA sequences. Objective: To enhance protein-coding region identification in DNA sequences using wavelet transforms. Methods: This research showcases the utility of Haar and Daubechies wavelet transforms, of both non-parametric and parametric spectral estimation methods, and of a digital band-pass filter for detecting peaks in exon regions. Additionally, the Electron-Ion Interaction Potential (EIIP) method is applied to convert symbolic DNA sequences into numerical values, and sum-of-sinusoids (SoS) mathematical models with optimized parameters are used to model the DNA sequences, with the aim of identifying genes accurately. Results: The outcomes of this approach show a substantial enhancement in identification accuracy for protein-coding regions. In terms of peak location detection, the application of Haar and Daubechies wavelet transforms enhances the accuracy of peak localization by approximately (0.01, 3-5 dB). When employing non-parametric and parametric spectral estimation techniques, there is an improvement in peak location by approximately (0.01, 4 dB) compared to the original signal. The proposed approach also achieves higher accuracy than existing methods. Conclusion: These findings not only bridge gaps in DNA sequence analysis but also offer a promising pathway for advancing exonic region prediction and gene identification in genomics research. The hybrid methodology presented stands as a robust contribution to the evolving landscape of genomic analysis techniques.
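The EIIP numerical mapping mentioned in the abstract, and the kind of spectral peak that exon detectors look for, can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: it omits the band-pass filter, the wavelet transforms, and the SoS models, and the function names and toy sequences are assumptions. The EIIP values are the commonly tabulated ones (A 0.1260, C 0.1340, G 0.0806, T 0.1335), and the period-3 frequency is used because coding regions typically show a spectral peak there.

```python
import cmath

# Commonly tabulated EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Map a symbolic DNA sequence to its EIIP numerical signal."""
    return [EIIP[base] for base in seq.upper()]


def period3_power(seq):
    """Spectral power of the mean-centred EIIP signal at frequency 1/3.

    Hypothetical helper: a direct evaluation of the Fourier sum at one
    frequency, not a full spectral estimator.
    """
    signal = eiip_encode(seq)
    mean = sum(signal) / len(signal)
    total = sum((x - mean) * cmath.exp(-2j * cmath.pi * i / 3)
                for i, x in enumerate(signal))
    return abs(total) ** 2


# Toy sequences (hypothetical): one with codon-like period 3, one with period 2.
periodic3 = "ATG" * 20
periodic2 = "AT" * 30
strong = period3_power(periodic3)
weak = period3_power(periodic2)
```

Sliding `period3_power` over a window along a longer sequence would give a crude exon-likelihood profile; the filtering and wavelet steps described in the abstract are aimed at sharpening exactly this kind of profile.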
Citations: 0
Advances in Deep Learning Assisted Drug Discovery Methods: A Self-review
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-01-29 DOI: 10.2174/0115748936285690240101041704
Haiping Zhang, Konda Mani Saravanan
Artificial intelligence is a field within computer science that endeavors to replicate the intricate structures and operational mechanisms inherent in the human brain. Machine learning is a subfield of artificial intelligence that focuses on developing models by analyzing training data. Deep learning is a distinct subfield within artificial intelligence, characterized by models that depict geometric transformations across multiple layers. Deep learning has shown significant promise in various domains, including health and life sciences, and has recently been applied successfully to drug discovery. In this self-review, we present recent methods developed with the aid of deep learning. The objective is to give a brief overview of our group's current cutting-edge advances in drug discovery. We systematically discuss experimental evidence and proof-of-concept examples for the deep learning-based models we developed, such as DeepBindBC, DeepPep, and DeepBindRG. These developments not only shed light on existing challenges but also highlight the achievements and prospects for future progress in drug discovery and development.
Citations: 0
FMDVSerPred: A Novel Computational Solution for Foot-and-mouth Disease Virus Classification and Serotype Prediction Prevalent in Asia using VP1 Nucleotide Sequence Data
IF 4 Zone 3 Biology Q1 Mathematics Pub Date: 2024-01-29 DOI: 10.2174/0115748936278851231213110653
Samarendra Das, Soumen Pal, Samyak Mahapatra, Jitendra K. Biswal, Sukanta K. Pradhan, Aditya P. Sahoo, Rabindra Prasad Singh
Background: Three serotypes of foot-and-mouth disease (FMD) virus have been circulating in Asia and are commonly identified by serological assays. Such tests are time-consuming and require a bio-containment facility. To the best of our knowledge, no computational solution for predicting FMD virus serotypes is available in the literature. User-friendly tools for FMD virus serotyping are therefore urgently needed. Methods: We presented a computational solution based on a machine-learning model for FMD virus classification and serotype prediction. Various data pre-processing techniques are also implemented in the approach to improve model prediction. We used sequence data of 2509 FMD virus isolates reported from India and seven other Asian FMD-endemic countries for model training, testing, and validation. We also studied the utility of the developed computational solution in a wet-lab setup by collecting and sequencing 12 virus isolates reported in India. The computational solution is implemented in two user-friendly tools: an online web-prediction server (https://nifmd-bbf.icar.gov.in/FMDVSerPred) and an R statistical software package (https://github.com/sam-dfmd/FMDVSerPred). Results: The random forest machine learning model is implemented in the computational solution because it outperformed seven other machine learning models when evaluated on ten test and independent datasets. Furthermore, the developed computational solution provided validation accuracies of up to 99.87% on test data, and of up to 98.64% and 90.24% on independent data reported from India and from its seven neighboring Asian countries, respectively. In addition, our approach was successfully used to predict the serotypes of field FMD virus isolates reported from various parts of India. Conclusion: High-throughput sequencing combined with machine learning therefore offers a promising solution to FMD virus serotyping.
Citations: 0