Background: When using clinical data for multi-omics analysis, there are issues such as the insufficient number of omics data types and relatively small sample size due to the protection of patients' privacy, the requirements of data management by various institutions, and the relatively large number of features of each omics data. This paper describes the analysis of multi-omics pathway relationships using statistical data in the absence of clinical data. Methods: We proposed a novel approach to exploit easily accessible statistics in public databases. This approach introduces phenotypic associations that are not included in the clinical data and uses these data to build a three-layer heterogeneous network. To simplify the analysis, we decomposed the three-layer network into double two-layer networks to predict the weights of the inter-layer associations. By adding a hyperparameter β, the weights of the two layers of the network were merged, and then k-fold cross-validation was used to evaluate the accuracy of this method. In calculating the weights of the two-layer networks, the RWR with fixed restart probability was combined with PBMDA and CIPHER to generate the PCRWR with biased weights and improved accuracy. Results: The area under the receiver operating characteristic curve was increased by approximately 7% in the case of the RWR with initial weights. Conclusion: Multi-omics statistical data were used to establish genotype and phenotype correlation networks for analysis, which was similar to the effect of clinical multi-omics analysis.
{"title":"Genotype and Phenotype Association Analysis Based on Multi-omics Statistical Data","authors":"Xinpeng Guo, Yafei Song, Dongyan Xu, Xueping Jin, Xuequn Shang","doi":"10.2174/0115748936276861240109045208","DOIUrl":"https://doi.org/10.2174/0115748936276861240109045208","url":null,"abstract":"Background: When using clinical data for multi-omics analysis, there are issues such as the insufficient number of omics data types and relatively small sample size due to the protection of patients' privacy, the requirements of data management by various institutions, and the relatively large number of features of each omics data. This paper describes the analysis of multi-omics pathway relationships using statistical data in the absence of clinical data. Methods: We proposed a novel approach to exploit easily accessible statistics in public databases. This approach introduces phenotypic associations that are not included in the clinical data and uses these data to build a three-layer heterogeneous network. To simplify the analysis, we decomposed the three-layer network into double two-layer networks to predict the weights of the inter-layer associations. By adding a hyperparameter β, the weights of the two layers of the network were merged, and then k-fold cross-validation was used to evaluate the accuracy of this method. In calculating the weights of the two-layer networks, the RWR with fixed restart probability was combined with PBMDA and CIPHER to generate the PCRWR with biased weights and improved accuracy. Results: The area under the receiver operating characteristic curve was increased by approximately 7% in the case of the RWR with initial weights. Conclusion: Multi-omics statistical data were used to establish genotype and phenotype correlation networks for analysis, which was similar to the effect of clinical multi-omics analysis.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139760088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Conventional approaches to drug discovery are often characterized by lengthy and costly processes. To expedite the discovery of new drugs, the integration of artificial intelligence (AI) in predicting drug-target binding affinity (DTA) has emerged as a crucial approach. Despite the proliferation of deep learning methods for DTA prediction, many of these methods primarily concentrate on the amino acid sequence of proteins. Yet, the interactions between drug compounds and targets occur within distinct segments within the protein structures, whereas the primary sequence primarily captures global protein features. Consequently, it falls short of fully elucidating the intricate relationship between drugs and their respective targets. Objective: This study aims to employ advanced deep-learning techniques to forecast DTA while incorporating information about the secondary structure of proteins. Methods: In our research, both the primary sequence of protein and the secondary structure of protein were leveraged for protein representation. While the primary sequence played the role of the overarching feature, the secondary structure was employed as the localized feature. Convolutional neural networks and graph neural networks were utilized to independently model the intricate features of target proteins and drug compounds. This approach enhanced our ability to capture drugtarget interactions more effectively Results: We have introduced a novel method for predicting DTA. In comparison to DeepDTA, our approach demonstrates significant enhancements, achieving a 3.9% increase in the Concordance Index (CI) and a remarkable 34% reduction in Mean Squared Error (MSE) when evaluated on the KIBA dataset. Conclusion: In conclusion, our results unequivocally demonstrate that augmenting DTA prediction with the inclusion of the protein's secondary structure as a localized feature yields significantly improved accuracy compared to relying solely on the primary structure.
{"title":"Enhancing Drug-Target Binding Affinity Prediction through Deep Learning and Protein Secondary Structure Integration","authors":"Runhua Zhang, Baozhong Zhu, Tengsheng Jiang, Zhiming Cui, Hongjie Wu","doi":"10.2174/0115748936285519240110070209","DOIUrl":"https://doi.org/10.2174/0115748936285519240110070209","url":null,"abstract":"Background: Conventional approaches to drug discovery are often characterized by lengthy and costly processes. To expedite the discovery of new drugs, the integration of artificial intelligence (AI) in predicting drug-target binding affinity (DTA) has emerged as a crucial approach. Despite the proliferation of deep learning methods for DTA prediction, many of these methods primarily concentrate on the amino acid sequence of proteins. Yet, the interactions between drug compounds and targets occur within distinct segments within the protein structures, whereas the primary sequence primarily captures global protein features. Consequently, it falls short of fully elucidating the intricate relationship between drugs and their respective targets. Objective: This study aims to employ advanced deep-learning techniques to forecast DTA while incorporating information about the secondary structure of proteins. Methods: In our research, both the primary sequence of protein and the secondary structure of protein were leveraged for protein representation. While the primary sequence played the role of the overarching feature, the secondary structure was employed as the localized feature. Convolutional neural networks and graph neural networks were utilized to independently model the intricate features of target proteins and drug compounds. This approach enhanced our ability to capture drugtarget interactions more effectively Results: We have introduced a novel method for predicting DTA. In comparison to DeepDTA, our approach demonstrates significant enhancements, achieving a 3.9% increase in the Concordance Index (CI) and a remarkable 34% reduction in Mean Squared Error (MSE) when evaluated on the KIBA dataset. Conclusion: In conclusion, our results unequivocally demonstrate that augmenting DTA prediction with the inclusion of the protein's secondary structure as a localized feature yields significantly improved accuracy compared to relying solely on the primary structure.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139759749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-04DOI: 10.2174/0115748936282613231211112920
Dayu Tan, Jing Wang, Zhaolong Cheng, Yansen Su, Chunhou Zheng
Objective: This work aims to infer causal relationships between genes and construct dynamic gene regulatory networks using time-course scRNA-seq data. Methods: We propose an analytical method for inferring GRNs from single-cell time-course data based on temporal convolutional networks (scTGRN), which provides a supervised learning approach to infer causal relationships among genes. scTGRN constructs a 4D tensor representing gene expression features for each gene pair, then inputs the constructed 4D tensor into the temporal convolutional network to train and infer the causal relationship between genes. Results: We validate the performance of scTGRN on five real datasets and four simulated datasets, and the experimental results show that scTGRN outperforms existing models in constructing GRNs. In addition, we test the performance of scTGRN on gene function assignment, and scTGRN outperforms other models. Conclusion: The analysis shows that scTGRN can not only accurately identify the causal relationship between genes, but also can be used to achieve gene function assignment.
{"title":"Inferring Gene Regulatory Networks from Single-Cell Time-Course Data Based on Temporal Convolutional Networks","authors":"Dayu Tan, Jing Wang, Zhaolong Cheng, Yansen Su, Chunhou Zheng","doi":"10.2174/0115748936282613231211112920","DOIUrl":"https://doi.org/10.2174/0115748936282613231211112920","url":null,"abstract":"Objective: This work aims to infer causal relationships between genes and construct dynamic gene regulatory networks using time-course scRNA-seq data. Methods: We propose an analytical method for inferring GRNs from single-cell time-course data based on temporal convolutional networks (scTGRN), which provides a supervised learning approach to infer causal relationships among genes. scTGRN constructs a 4D tensor representing gene expression features for each gene pair, then inputs the constructed 4D tensor into the temporal convolutional network to train and infer the causal relationship between genes. Results: We validate the performance of scTGRN on five real datasets and four simulated datasets, and the experimental results show that scTGRN outperforms existing models in constructing GRNs. In addition, we test the performance of scTGRN on gene function assignment, and scTGRN outperforms other models. Conclusion: The analysis shows that scTGRN can not only accurately identify the causal relationship between genes, but also can be used to achieve gene function assignment.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139689088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: With increasing rates of polypharmacy, the vigilant surveillance of clinical drug toxicity has emerged as an important concern. Named Entity Recognition (NER) stands as an indispensable undertaking, essential for the extraction of valuable insights regarding drug safety from the biomedical literature. In recent years, significant advancements have been achieved in the deep learning models on NER tasks. Nonetheless, the effectiveness of these NER techniques relies on the availability of substantial volumes of annotated data, which is labor-intensive and inefficient. background: With increasing rates of polypharmacy, clinical drug toxicity has been closely monitored. Named Entity Recognition (NER) is a vital task for extracting valuable drug safety information from biomedical literature. Recently, many deep learning models in biomedical domain have made great progress for NER, especially pre-trained language models. However, these NER methods require large amounts of high-quality manually annotated data with named entities, which is labor intensive and inefficient. Methods: This study introduces a novel approach that diverges from the conventional reliance on manually annotated data. It employs a transformer-based technique known as Positive-Unlabeled Learning (PULearning), which incorporates adaptive learning and is applied to the clinical cancer drug toxicity corpus. To improve the precision of prediction, we employ relative position embeddings within the transformer encoder. Additionally, we formulate a composite loss function that integrates two Kullback-Leibler (KL) regularizers to align with PULearning assumptions. The outcomes demonstrate that our approach attains the targeted performance for NER tasks, solely relying on unlabeled data and named entity dictionaries. objective: To improve the performance of prediction Conclusion: Our model achieves an overall NER performance with an F1 of 0.819. Specifically, it attains F1 of 0.841, 0.801 and 0.815 for DRUG, CANCER, and TOXI entities, respectively. A comprehensive analysis of the results validates the effectiveness of our approach in comparison to existing PULearning methods on biomedical NER tasks. Additionally, a visualization of the associations among three identified entities is provided, offering a valuable reference for querying their interrelationships. method: In this work, instead of relying on the manually labeled data, a transformer-based Positive-Unlabeled Learning (PULearning) is proposed with adaptive learning and applied on the clinical cancer drug toxicity corpus. To improve the precision of prediction, relative position embeddings are used in transformer encoder. And then, a mixed loss is designed with two Kullback-Leibler (KL) regularizers for PULearning assumptions. Through adaptive sampling, our approach meets the expected performance for NER task only using unlabeled data and named entity dictionaries. result: The overall NER performance of our model obtains 0
背景:随着多种药物使用率的增加,对临床药物毒性的警惕性监测已成为一个重要的关注点。命名实体识别(NER)是一项不可或缺的工作,对于从生物医学文献中提取有关药物安全性的宝贵见解至关重要。近年来,深度学习模型在 NER 任务方面取得了重大进展。然而,这些核糖核酸(NER)技术的有效性依赖于大量注释数据的可用性,而这是一项劳动密集型且效率低下的工作:随着多种药物的使用率越来越高,临床药物毒性受到了密切关注。命名实体识别(NER)是从生物医学文献中提取有价值的药物安全性信息的重要任务。最近,生物医学领域的许多深度学习模型在 NER 方面取得了很大进展,尤其是预训练语言模型。然而,这些 NER 方法需要大量高质量的人工标注命名实体的数据,劳动强度大且效率低。方法本研究引入了一种新方法,与传统的依赖人工标注数据的方法不同。它采用了一种被称为正向无标注学习(PULearning)的基于转换器的技术,该技术结合了自适应学习,并应用于临床癌症药物毒性语料库。为了提高预测精度,我们在变换器编码器中采用了相对位置嵌入技术。此外,我们还制定了一个复合损失函数,其中整合了两个库尔巴克-莱伯勒(KL)正则,以符合 PULearning 假设。结果表明,我们的方法仅依靠未标注数据和命名实体字典就能实现 NER 任务的目标性能:提高预测性能 结论:我们的模型实现了整体 NER 性能的提高:我们的模型实现了整体 NER 性能,F1 为 0.819。具体来说,它对 DRUG、CANCER 和 TOXI 实体的 F1 分别为 0.841、0.801 和 0.815。对结果的综合分析验证了我们的方法与现有的 PULearning 方法相比在生物医学 NER 任务中的有效性。此外,我们还提供了三个已识别实体之间关联的可视化方法,为查询它们之间的相互关系提供了有价值的参考:在这项工作中,我们提出了一种基于转换器的正向无标注学习(PULearning)方法,并将其应用于临床癌症药物毒性语料库。为了提高预测精度,在变换器编码器中使用了相对位置嵌入。然后,针对 PULearning 假设,设计了带有两个 Kullback-Leibler (KL) 正则的混合损失。通过自适应采样,我们的方法仅在使用未标记数据和命名实体词典的情况下就达到了 NER 任务的预期性能:我们模型的总体 NER 性能获得了 0.819 的 F1 分数,而在 DRUG、CANCER 和 TOXI 上的 F1 分数分别为 0.841、0.801 和 0.815:无
{"title":"Transformer-based Named Entity Recognition for Clinical Cancer Drug Toxicity by Positive-unlabeled Learning and KL Regularizers","authors":"Weixin Xie, Jiayu Xu, Chengkui Zhao, Jin Li, Shuangze Han, Tianyu Shao, Limei Wang, Weixing Feng","doi":"10.2174/0115748936278299231213045441","DOIUrl":"https://doi.org/10.2174/0115748936278299231213045441","url":null,"abstract":"Background: With increasing rates of polypharmacy, the vigilant surveillance of clinical drug toxicity has emerged as an important concern. Named Entity Recognition (NER) stands as an indispensable undertaking, essential for the extraction of valuable insights regarding drug safety from the biomedical literature. In recent years, significant advancements have been achieved in the deep learning models on NER tasks. Nonetheless, the effectiveness of these NER techniques relies on the availability of substantial volumes of annotated data, which is labor-intensive and inefficient. background: With increasing rates of polypharmacy, clinical drug toxicity has been closely monitored. Named Entity Recognition (NER) is a vital task for extracting valuable drug safety information from biomedical literature. Recently, many deep learning models in biomedical domain have made great progress for NER, especially pre-trained language models. However, these NER methods require large amounts of high-quality manually annotated data with named entities, which is labor intensive and inefficient. Methods: This study introduces a novel approach that diverges from the conventional reliance on manually annotated data. It employs a transformer-based technique known as Positive-Unlabeled Learning (PULearning), which incorporates adaptive learning and is applied to the clinical cancer drug toxicity corpus. To improve the precision of prediction, we employ relative position embeddings within the transformer encoder. Additionally, we formulate a composite loss function that integrates two Kullback-Leibler (KL) regularizers to align with PULearning assumptions. The outcomes demonstrate that our approach attains the targeted performance for NER tasks, solely relying on unlabeled data and named entity dictionaries. objective: To improve the performance of prediction Conclusion: Our model achieves an overall NER performance with an F1 of 0.819. Specifically, it attains F1 of 0.841, 0.801 and 0.815 for DRUG, CANCER, and TOXI entities, respectively. A comprehensive analysis of the results validates the effectiveness of our approach in comparison to existing PULearning methods on biomedical NER tasks. Additionally, a visualization of the associations among three identified entities is provided, offering a valuable reference for querying their interrelationships. method: In this work, instead of relying on the manually labeled data, a transformer-based Positive-Unlabeled Learning (PULearning) is proposed with adaptive learning and applied on the clinical cancer drug toxicity corpus. To improve the precision of prediction, relative position embeddings are used in transformer encoder. And then, a mixed loss is designed with two Kullback-Leibler (KL) regularizers for PULearning assumptions. Through adaptive sampling, our approach meets the expected performance for NER task only using unlabeled data and named entity dictionaries. result: The overall NER performance of our model obtains 0","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139688945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-02DOI: 10.2174/0115748936283134240109054157
Necla Nisa Soylu, Emre Sefer
Introduction:: More recent self-supervised deep language models, such as Bidirectional Encoder Representations from Transformers (BERT), have performed the best on some language tasks by contextualizing word embeddings for a better dynamic representation. Their proteinspecific versions, such as ProtBERT, generated dynamic protein sequence embeddings, which resulted in better performance for several bioinformatics tasks. Besides, a number of different protein post-translational modifications are prominent in cellular tasks such as development and differentiation. The current biological experiments can detect these modifications, but within a longer duration and with a significant cost. Methods:: In this paper, to comprehend the accompanying biological processes concisely and more rapidly, we propose DEEPPTM to predict protein post-translational modification (PTM) sites from protein sequences more efficiently. Different than the current methods, DEEPPTM enhances the modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and reveals the associations between different modification types and protein sequence content. Additionally, it can infer several different modifications over different species. Results:: Human and mouse ROC AUCs for predicting Succinylation modifications were 0.988 and 0.965 respectively, once 10-fold cross-validation is applied. Similarly, we have obtained 0.982, 0.955, and 0.953 ROC AUC scores on inferring ubiquitination, crotonylation, and glycation sites, respectively. According to detailed computational experiments, DEEPPTM lessens the time spent in laboratory experiments while outperforming the competing methods as well as baselines on inferring all 4 modification sites. In our case, attention-based deep learning methods such as vision transformers look more favorable to learning from ProtBERT features than more traditional deep learning and machine learning techniques. Conclusion:: Additionally, the protein-specific ProtBERT model is more effective than the original BERT embeddings for PTM prediction tasks. Our code and datasets can be found at https://github.com/seferlab/deepptm.
{"title":"DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers","authors":"Necla Nisa Soylu, Emre Sefer","doi":"10.2174/0115748936283134240109054157","DOIUrl":"https://doi.org/10.2174/0115748936283134240109054157","url":null,"abstract":"Introduction:: More recent self-supervised deep language models, such as Bidirectional Encoder Representations from Transformers (BERT), have performed the best on some language tasks by contextualizing word embeddings for a better dynamic representation. Their proteinspecific versions, such as ProtBERT, generated dynamic protein sequence embeddings, which resulted in better performance for several bioinformatics tasks. Besides, a number of different protein post-translational modifications are prominent in cellular tasks such as development and differentiation. The current biological experiments can detect these modifications, but within a longer duration and with a significant cost. Methods:: In this paper, to comprehend the accompanying biological processes concisely and more rapidly, we propose DEEPPTM to predict protein post-translational modification (PTM) sites from protein sequences more efficiently. Different than the current methods, DEEPPTM enhances the modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and reveals the associations between different modification types and protein sequence content. Additionally, it can infer several different modifications over different species. Results:: Human and mouse ROC AUCs for predicting Succinylation modifications were 0.988 and 0.965 respectively, once 10-fold cross-validation is applied. Similarly, we have obtained 0.982, 0.955, and 0.953 ROC AUC scores on inferring ubiquitination, crotonylation, and glycation sites, respectively. According to detailed computational experiments, DEEPPTM lessens the time spent in laboratory experiments while outperforming the competing methods as well as baselines on inferring all 4 modification sites. In our case, attention-based deep learning methods such as vision transformers look more favorable to learning from ProtBERT features than more traditional deep learning and machine learning techniques. Conclusion:: Additionally, the protein-specific ProtBERT model is more effective than the original BERT embeddings for PTM prediction tasks. Our code and datasets can be found at https://github.com/seferlab/deepptm.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139666027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-02DOI: 10.2174/0115748936272939231212102627
Liu Fan, Xiaoyu Yang, Lei Wang, Xianyou Zhu
Introduction: Microbes are intimately involved in the physiological and pathological processes of numerous diseases. There is a critical need for new drugs to combat microbe-induced diseases in clinical settings. Predicting potential microbe-drug associations is, therefore, essential for both disease treatment and novel drug discovery. However, it is costly and time-consuming to verify these relationships through traditional wet lab approaches. Methods: We proposed an efficient computational model, STNMDA, that integrated a StructureAware Transformer (SAT) with a Deep Neural Network (DNN) classifier to infer latent microbedrug associations. The STNMDA began with a “random walk with a restart” approach to construct a heterogeneous network using Gaussian kernel similarity and functional similarity measures for microorganisms and drugs. This heterogeneous network was then fed into the SAT to extract attribute features and graph structures for each drug and microbe node. Finally, the DNN classifier calculated the probability of associations between microbes and drugs. Results: Extensive experimental results showed that STNMDA surpassed existing state-of-the-art models in performance on the MDAD and aBiofilm databases. In addition, the feasibility of STNMDA in confirming associations between microbes and drugs was demonstrated through case validations. Conclusion: Hence, STNMDA showed promise as a valuable tool for future prediction of microbedrug associations.
{"title":"STNMDA: A Novel Model for Predicting Potential Microbe-Drug Associations with Structure-Aware Transformer","authors":"Liu Fan, Xiaoyu Yang, Lei Wang, Xianyou Zhu","doi":"10.2174/0115748936272939231212102627","DOIUrl":"https://doi.org/10.2174/0115748936272939231212102627","url":null,"abstract":"Introduction: Microbes are intimately involved in the physiological and pathological processes of numerous diseases. There is a critical need for new drugs to combat microbe-induced diseases in clinical settings. Predicting potential microbe-drug associations is, therefore, essential for both disease treatment and novel drug discovery. However, it is costly and time-consuming to verify these relationships through traditional wet lab approaches. Methods: We proposed an efficient computational model, STNMDA, that integrated a StructureAware Transformer (SAT) with a Deep Neural Network (DNN) classifier to infer latent microbedrug associations. The STNMDA began with a “random walk with a restart” approach to construct a heterogeneous network using Gaussian kernel similarity and functional similarity measures for microorganisms and drugs. This heterogeneous network was then fed into the SAT to extract attribute features and graph structures for each drug and microbe node. Finally, the DNN classifier calculated the probability of associations between microbes and drugs. Results: Extensive experimental results showed that STNMDA surpassed existing state-of-the-art models in performance on the MDAD and aBiofilm databases. In addition, the feasibility of STNMDA in confirming associations between microbes and drugs was demonstrated through case validations. Conclusion: Hence, STNMDA showed promise as a valuable tool for future prediction of microbedrug associations.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139666558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: PIWI-interacting RNAs (piRNAs) and circular RNAs (circRNAs) are two kinds of non-coding RNAs (ncRNAs) that play important roles in epigenetic regulation, transcriptional regulation, post-transcriptional regulation of many biological processes. Although there exist various resources, it is still challenging to select such resources for specific research projects on ncRNAs. Method: In order to facilitate researchers in finding the appropriate bioinformatics sources for studying ncRNAs, we created a novel portal named P4PC that provides computational tools and data sources of piRNAs and circRNAs. Result: 249 computational tools, 126 databases and 420 papers are manually curated in P4PC. All entries in P4PC are classified in 5 groups and 26 subgroups. The list of resources is summarized in the first page of each group Conclusion: According to their research proposes, users can quickly select proper resources for their research projects by viewing detail information and comments in P4PC. Database URL is http://www.ibiomedical.net/Portal4PC/ and http://43.138.46.5:8080/Portal4PC/.
{"title":"P4PC: A Portal for Bioinformatics Resources of piRNAs and circRNAs","authors":"Yajun Liu, Ru Li, Yulian Ding, Xin Hong Hei, Fang-Xiang Wu","doi":"10.2174/0115748936289420240117100823","DOIUrl":"https://doi.org/10.2174/0115748936289420240117100823","url":null,"abstract":"Background: PIWI-interacting RNAs (piRNAs) and circular RNAs (circRNAs) are two kinds of non-coding RNAs (ncRNAs) that play important roles in epigenetic regulation, transcriptional regulation, post-transcriptional regulation of many biological processes. Although there exist various resources, it is still challenging to select such resources for specific research projects on ncRNAs. Method: In order to facilitate researchers in finding the appropriate bioinformatics sources for studying ncRNAs, we created a novel portal named P4PC that provides computational tools and data sources of piRNAs and circRNAs. Result: 249 computational tools, 126 databases and 420 papers are manually curated in P4PC. All entries in P4PC are classified in 5 groups and 26 subgroups. The list of resources is summarized in the first page of each group Conclusion: According to their research proposes, users can quickly select proper resources for their research projects by viewing detail information and comments in P4PC. Database URL is http://www.ibiomedical.net/Portal4PC/ and http://43.138.46.5:8080/Portal4PC/.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139662702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-01DOI: 10.2174/0115748936287244240117065325
Emad S. Hassan, Ahmed M. Dessouky, Hesham Fathi, Gerges M. Salama, Ahmed S. Oshaba, Atef El-Emary, Fathi E. Abd El‑Samie
Introduction: Identifying and predicting protein-coding regions within DNA sequences play a pivotal role in genomic research. This paper introduces an approach for identifying proteincoding regions in DNA sequences, employing a hybrid methodology that combines a digital bandpass filter with wavelet transforms and various spectral estimation techniques to enhance exon prediction. Specifically, the Haar and Daubechies wavelet transforms are applied to improve the accuracy of protein-coding region (exon) prediction, enabling the extraction of intricate details that may be obscured in the original DNA sequences. background: The identification and prediction of protein-coding regions within DNA sequences play a pivotal role in genomic research. Methods: This research showcases the utility of Haar and Daubechies wavelet transforms, both nonparametric and parametric spectral estimation methods, and the deployment of a digital band pass filter for detecting peaks in exon regions. Additionally, the application of the Electron-Ion Interaction Potential (EIIP) method for converting symbolic DNA sequences into numerical values and the utilization of sum-of-sinusoids (SoS) mathematical models with optimized parameters further enrich the toolbox for DNA sequence analysis, ensuring the success of this proposed method in modeling DNA sequences optimally and accurately identifying genes. objective: Enhanced Protein-Coding Region Identification in DNA Sequences Using Wavelet Transforms Results: The outcomes of this approach showcase a substantial enhancement in identification accuracy for protein-coding regions. In terms of peak location detection, the application of Haar and Daubechies wavelet transforms enhances the accuracy of peak localization by approximately (0.01, 3-5 dB). When employing non-parametric and parametric spectral estimation techniques, there is an improvement in peak location by approximately (0.01, 4 dB) compared to the original signal. The proposed approach also achieves higher accuracy when compared with existing methods. method: hybrid methodology that combines a digital band-pass filter with wavelet transforms and various spectral estimation techniques to enhance exon prediction. Conclusion: These findings not only bridge gaps in DNA sequence analysis but also offer a promising pathway for advancing exonic region prediction and gene identification in genomics research. The hybrid methodology presented stands as a robust contribution to the evolving landscape of genomic analysis techniques. result: The results obtained through this proposed method demonstrate significantly improved identification accuracy. These findings offer a promising avenue for DNA sequence analysis, exonic region prediction, and gene identification.
简介识别和预测 DNA 序列中的蛋白质编码区在基因组研究中起着举足轻重的作用。本文介绍了一种识别 DNA 序列中蛋白质编码区的方法,该方法采用了一种混合方法,将数字带通滤波器与小波变换和各种光谱估算技术相结合,以提高外显子预测能力。具体来说,该方法采用了哈尔和道贝奇斯小波变换来提高蛋白质编码区(外显子)预测的准确性,从而能够提取原始 DNA 序列中可能被掩盖的复杂细节:DNA 序列中蛋白质编码区的识别和预测在基因组研究中起着举足轻重的作用。方法:这项研究展示了哈尔和道贝奇斯小波变换、非参数和参数谱估计方法的实用性,以及数字带通滤波器在检测外显子区域峰值方面的应用。此外,应用电子-离子相互作用势(EIIP)方法将符号 DNA 序列转换为数值,以及利用具有优化参数的总和-正弦曲线(SoS)数学模型,进一步丰富了 DNA 序列分析工具箱,确保所提出的方法能够成功地对 DNA 序列进行优化建模并准确识别基因:利用小波变换加强 DNA 序列中蛋白质编码区的识别 结果:该方法的结果表明,蛋白质编码区的识别准确率大幅提高。在峰值位置检测方面,应用 Haar 和 Daubechies 小波变换可将峰值定位精度提高约 (0.01, 3-5 dB)。在采用非参数和参数频谱估计技术时,与原始信号相比,峰值定位精度提高了约 (0.01, 4 dB)。方法:将数字带通滤波器、小波变换和各种频谱估计技术相结合的混合方法,以提高外显子预测能力。结论:这些发现不仅弥补了 DNA 序列分析中的不足,还为基因组学研究中的外显子区域预测和基因鉴定提供了一条前景广阔的途径。所提出的混合方法是对不断发展的基因组分析技术的有力贡献:通过该方法获得的结果表明,识别的准确性显著提高。这些发现为 DNA 序列分析、外显子区域预测和基因鉴定提供了一条前景广阔的途径。
{"title":"Improved Hybrid Approach for Enhancing Protein Coding Regions Identification in DNA Sequences","authors":"Emad S. Hassan, Ahmed M. Dessouky, Hesham Fathi, Gerges M. Salama, Ahmed S. Oshaba, Atef El-Emary, Fathi E. Abd El‑Samie","doi":"10.2174/0115748936287244240117065325","DOIUrl":"https://doi.org/10.2174/0115748936287244240117065325","url":null,"abstract":"Introduction: Identifying and predicting protein-coding regions within DNA sequences play a pivotal role in genomic research. This paper introduces an approach for identifying proteincoding regions in DNA sequences, employing a hybrid methodology that combines a digital bandpass filter with wavelet transforms and various spectral estimation techniques to enhance exon prediction. Specifically, the Haar and Daubechies wavelet transforms are applied to improve the accuracy of protein-coding region (exon) prediction, enabling the extraction of intricate details that may be obscured in the original DNA sequences. background: The identification and prediction of protein-coding regions within DNA sequences play a pivotal role in genomic research. Methods: This research showcases the utility of Haar and Daubechies wavelet transforms, both nonparametric and parametric spectral estimation methods, and the deployment of a digital band pass filter for detecting peaks in exon regions. Additionally, the application of the Electron-Ion Interaction Potential (EIIP) method for converting symbolic DNA sequences into numerical values and the utilization of sum-of-sinusoids (SoS) mathematical models with optimized parameters further enrich the toolbox for DNA sequence analysis, ensuring the success of this proposed method in modeling DNA sequences optimally and accurately identifying genes. objective: Enhanced Protein-Coding Region Identification in DNA Sequences Using Wavelet Transforms Results: The outcomes of this approach showcase a substantial enhancement in identification accuracy for protein-coding regions. In terms of peak location detection, the application of Haar and Daubechies wavelet transforms enhances the accuracy of peak localization by approximately (0.01, 3-5 dB). When employing non-parametric and parametric spectral estimation techniques, there is an improvement in peak location by approximately (0.01, 4 dB) compared to the original signal. The proposed approach also achieves higher accuracy when compared with existing methods. method: hybrid methodology that combines a digital band-pass filter with wavelet transforms and various spectral estimation techniques to enhance exon prediction. Conclusion: These findings not only bridge gaps in DNA sequence analysis but also offer a promising pathway for advancing exonic region prediction and gene identification in genomics research. The hybrid methodology presented stands as a robust contribution to the evolving landscape of genomic analysis techniques. result: The results obtained through this proposed method demonstrate significantly improved identification accuracy. These findings offer a promising avenue for DNA sequence analysis, exonic region prediction, and gene identification.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139662697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-29DOI: 10.2174/0115748936285690240101041704
Haiping Zhang, Konda Mani Saravanan
Artificial Intelligence is a field within computer science that endeavors to replicate the intricate structures and operational mechanisms inherent in the human brain. Machine learning is a subfield of artificial intelligence that focuses on developing models by analyzing training data. Deep learning is a distinct subfield within artificial intelligence, characterized by using models that depict geometric transformations across multiple layers. The deep learning has shown significant promise in various domains, including health and life sciences. In recent times, deep learning has demonstrated successful applications in drug discovery. In this self-review, we present recent methods developed with the aid of deep learning. The objective is to give a brief overview of the present cutting-edge advancements in drug discovery from our group. We have systematically discussed experimental evidence and proof of concept examples for the deep learning-based models developed, such as Deep- BindBC, DeepPep, and DeepBindRG. These developments not only shed light on the existing challenges but also emphasize the achievements and prospects for future drug discovery and development progress.
{"title":"Advances in Deep Learning Assisted Drug Discovery Methods: A Self-review","authors":"Haiping Zhang, Konda Mani Saravanan","doi":"10.2174/0115748936285690240101041704","DOIUrl":"https://doi.org/10.2174/0115748936285690240101041704","url":null,"abstract":"Artificial Intelligence is a field within computer science that endeavors to replicate the intricate structures and operational mechanisms inherent in the human brain. Machine learning is a subfield of artificial intelligence that focuses on developing models by analyzing training data. Deep learning is a distinct subfield within artificial intelligence, characterized by using models that depict geometric transformations across multiple layers. The deep learning has shown significant promise in various domains, including health and life sciences. In recent times, deep learning has demonstrated successful applications in drug discovery. In this self-review, we present recent methods developed with the aid of deep learning. The objective is to give a brief overview of the present cutting-edge advancements in drug discovery from our group. We have systematically discussed experimental evidence and proof of concept examples for the deep learning-based models developed, such as Deep- BindBC, DeepPep, and DeepBindRG. These developments not only shed light on the existing challenges but also emphasize the achievements and prospects for future drug discovery and development progress.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139587743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-29DOI: 10.2174/0115748936278851231213110653
Samarendra Das, Soumen Pal, Samyak Mahapatra, Jitendra K. Biswal, Sukanta K. Pradhan, Aditya P. Sahoo, Rabindra Prasad Singh
Background: Three serotypes of Foot-and-mouth disease (FMD) virus have been circulating in Asia, which are commonly identified by serological assays. Such tests are timeconsuming and also need a bio-containment facility for execution of the assays. To the best of our knowledge, no computational solution is available in the literature to predict the FMD virus serotypes. Thus, this necessitates the urgent need for user-friendly tools for FMD virus serotyping. Methods: We presented a computational solution based on a machine-learning model for FMD virus classification and serotype prediction. Besides, various data pre-processing techniques are implemented in the approach for better model prediction. We used sequence data of 2509 FMD virus isolates reported from India and seven other Asian FMD-endemic countries for model training, testing, and validation. We also studied the utility of the developed computational solution in a wet lab setup through collecting and sequencing of 12 virus isolates reported in India. Here, the computational solution is implemented in two user-friendly tools, i.e., online web-prediction server (https://nifmd-bbf.icar.gov.in/FMDVSerPred) and R statistical software package (https://github.com/sam-dfmd/FMDVSerPred). Results: The random forest machine learning model is implemented in the computational solution, as it outperformed seven other machine learning models when evaluated on ten test and independent datasets. Furthermore, the developed computational solution provided validation accuracies of up to 99.87% on test data, up to 98.64%, and 90.24% on independent data reported from Asian countries, including India and its seven neighboring countries, respectively. In addition, our approach was successfully used for predicting serotypes of field FMD virus isolates reported from various parts of India. Conclusion: Therefore, the high-throughput sequencing combined with machine learning offers a promising solution to FMD virus serotyping.
{"title":"FMDVSerPred: A Novel Computational Solution for Foot-and-mouth Disease Virus Classification and Serotype Prediction Prevalent in Asia using VP1 Nucleotide Sequence Data","authors":"Samarendra Das, Soumen Pal, Samyak Mahapatra, Jitendra K. Biswal, Sukanta K. Pradhan, Aditya P. Sahoo, Rabindra Prasad Singh","doi":"10.2174/0115748936278851231213110653","DOIUrl":"https://doi.org/10.2174/0115748936278851231213110653","url":null,"abstract":"Background: Three serotypes of Foot-and-mouth disease (FMD) virus have been circulating in Asia, which are commonly identified by serological assays. Such tests are timeconsuming and also need a bio-containment facility for execution of the assays. To the best of our knowledge, no computational solution is available in the literature to predict the FMD virus serotypes. Thus, this necessitates the urgent need for user-friendly tools for FMD virus serotyping. Methods: We presented a computational solution based on a machine-learning model for FMD virus classification and serotype prediction. Besides, various data pre-processing techniques are implemented in the approach for better model prediction. We used sequence data of 2509 FMD virus isolates reported from India and seven other Asian FMD-endemic countries for model training, testing, and validation. We also studied the utility of the developed computational solution in a wet lab setup through collecting and sequencing of 12 virus isolates reported in India. Here, the computational solution is implemented in two user-friendly tools, i.e., online web-prediction server (https://nifmd-bbf.icar.gov.in/FMDVSerPred) and R statistical software package (https://github.com/sam-dfmd/FMDVSerPred). Results: The random forest machine learning model is implemented in the computational solution, as it outperformed seven other machine learning models when evaluated on ten test and independent datasets. Furthermore, the developed computational solution provided validation accuracies of up to 99.87% on test data, up to 98.64%, and 90.24% on independent data reported from Asian countries, including India and its seven neighboring countries, respectively. In addition, our approach was successfully used for predicting serotypes of field FMD virus isolates reported from various parts of India. Conclusion: Therefore, the high-throughput sequencing combined with machine learning offers a promising solution to FMD virus serotyping.","PeriodicalId":10801,"journal":{"name":"Current Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139587460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}