
Current Bioinformatics: Latest Articles

iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking
IF 4 | Zone 3 (Biology) | Q3 Biochemical Research Methods | Pub Date: 2023-12-04 | DOI: 10.2174/0115748936256869231019113616
Muhammad Shujaat, Hilal Tayara, Sunggoo Yoo, Kil To Chong
Background and Objective: Gene promoters are DNA regulatory elements near transcription start sites and play a crucial role in regulating gene transcription. Despite numerous approaches to promoter prediction, including alignment signal and content-based methods, accurately identifying promoters remains challenging because their sequences lack explicit features. Many machine learning and deep learning models for promoter identification have therefore been presented, but their performance remains limited. Most recent investigations have concentrated on identifying sigma or plant promoters, while the accurate identification of Saccharomyces cerevisiae promoters remains underexplored. In this study, we introduce iProm-Yeast, a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. We also build a more difficult negative set from promoter sequences rather than from non-promoter regions of the genome. This negative reconstruction approach improves classification and reduces the number of false positive predictions.

Methods: To address the problems associated with promoter prediction, we investigate alternative vector encoding and feature extraction methods. These are then coupled with several machine learning algorithms and a 1-D convolutional neural network model. Our results show that pseudo-dinucleotide composition is the preferable feature encoding and that the machine learning stacking approach performs best for promoter classification.

Results: Based on 5-fold cross-validation, the proposed predictor, iProm-Yeast, shows good potential for detecting Saccharomyces cerevisiae promoters. The accuracy (Acc) was 86.27%, the sensitivity (Sn) was 82.29%, the specificity (Sp) was 89.47%, the Matthews correlation coefficient (MCC) was 0.72, and the area under the receiver operating characteristic curve (AUROC) was 0.98. We also performed a cross-species analysis to assess how well iProm-Yeast generalizes to other species.

Conclusion: iProm-Yeast is a robust method for identifying Saccharomyces cerevisiae promoters. With advanced vector encoding techniques and a negative reconstruction approach, it achieves improved classification accuracy and fewer false positive predictions. It also offers researchers a webserver for studying gene regulation in diverse organisms.
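
The four threshold-based metrics reported for iProm-Yeast (Acc, Sn, Sp, MCC) all derive from a binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical and are not taken from the paper, and the function name is only illustrative.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Return Acc, Sn, Sp and MCC for a binary confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # fraction of all predictions that are correct
    sn = tp / (tp + fn)                     # sensitivity: recall on true promoters
    sp = tn / (tn + fp)                     # specificity: recall on non-promoters
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts, chosen only to illustrate the calculation.
m = binary_metrics(tp=80, fn=20, tn=90, fp=10)
print(m)
```

AUROC is not included because it depends on ranked prediction scores rather than on a single confusion matrix.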
Citations: 0
Prediction of Super-enhancers Based on Mean-shift Undersampling
IF 4 | Zone 3 (Biology) | Q3 Biochemical Research Methods | Pub Date: 2023-12-01 | DOI: 10.2174/0115748936268302231110111456
Han Cheng, Shumei Ding, Cangzhi Jia
Background: Super-enhancers are clusters of enhancers defined by the binding occupancy of master transcription factors, chromatin regulators, or chromatin marks. Super-enhancers have been reported to be transcriptionally more active and more cell-type-specific than regular enhancers, so it is necessary to distinguish them from regular enhancers. A variety of computational methods have been proposed as auxiliary tools for identifying super-enhancers. However, most of these methods rely on ChIP-seq data; when such data are missing, the predictor either cannot run or fails to achieve satisfactory performance.

Objective: The aim of this study is to propose a stacking computational model based on the fusion of multiple features to identify super-enhancers in both human and mouse.

Methods: This work used mean-shift to cluster the majority-class samples and selected five balanced datasets for mouse and three balanced datasets for human to train the stacking model. Five types of sequence information are used as input to XGBoost classifiers, and the average of the probability outputs from the classifiers is taken as the final classification result.

Results: The results of 10-fold cross-validation and cross-cell-line validation show that our method outperforms other existing methods. The source code and datasets are available at https://github.com/Cheng-Han-max/SE_voting.

Conclusion: The analysis of feature importance indicates that Mismatch accounts for the highest proportion among the top 20 important features.
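
The final decision step described in the Methods, averaging the probability outputs of several classifiers, can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' code: the probabilities are made up, the function names are hypothetical, and the 0.5 decision threshold is an assumption that the abstract does not state.

```python
def average_probabilities(prob_lists):
    """Average per-sample positive-class probabilities across classifiers."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("every classifier must score the same samples")
    return [sum(p[i] for p in prob_lists) / n_models for i in range(n_samples)]


def predict_labels(avg_probs, threshold=0.5):
    """Label a sample as super-enhancer (1) when the mean probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in avg_probs]


# One row per classifier (e.g. one per feature type), one column per sample.
# Values are hypothetical.
probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.4],
    [0.7, 0.3, 0.5],
    [0.6, 0.4, 0.3],
    [1.0, 0.0, 0.2],
]
avg = average_probabilities(probs)
labels = predict_labels(avg)
print(avg, labels)
```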
Citations: 0
SCV Filter: A Hybrid Deep Learning Model for SARS-CoV-2 Variants Classification
IF 4 | Zone 3 (Biology) | Q3 Biochemical Research Methods | Pub Date: 2023-11-22 | DOI: 10.2174/1574893618666230809121509
Han Wang, Jingyang Gao
Background: The high mutability of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) makes mutations likely during transmission. As the epidemic has continued to develop, several mutated strains have emerged. Researchers worldwide are working on the effective identification of SARS-CoV-2.

Objective: In this paper, we propose SCVfilter, a new deep learning method that can effectively identify SARS-CoV-2 variant sequences. It is a deep hybrid model with embedding, an attention residual network, and long short-term memory as components.

Methods: Deep learning is effective in extracting rich features from sequence data, which has significant implications for the study of Coronavirus Disease 2019 (COVID-19), which has become prevalent in recent years. SCVfilter applies this approach to SARS-CoV-2 variant sequences using the hybrid architecture described in the Objective.

Results: The accuracy of SCVfilter is 93.833% on Dataset-I, consisting of different variant strains; 90.367% on Dataset-II, consisting of data collected from China, Taiwan, and Hong Kong; and 79.701% on Dataset-III, consisting of data collected from six continents (Africa, Asia, Europe, North America, Oceania, and South America).

Conclusion: When processing lengthy, highly homologous SARS-CoV-2 data, SCVfilter can automatically select features and accurately detect different variant strains of SARS-CoV-2. In addition, SCVfilter is sufficiently robust to handle the problems caused by sample imbalance and sequence incompleteness.

Other: SCVfilter is an open-source method available at https://github.com/deconvolutionw/SCVfilter.
Citations: 0
Prediction of DNA-binding Sites in Transcriptions Factor in Fur-like Proteins Using Machine Learning and Molecular Descriptors
Zone 3 (Biology) | Q3 Biochemical Research Methods | Pub Date: 2023-10-27 | DOI: 10.2174/0115748936264122231016094702
Mauricio Arenas-Salinas, Jessica Lara Muñoz, José Antonio Reyes, Felipe Besoain
Introduction: Transcription factors are of great interest in biotechnology due to their key role in the regulation of gene expression. One of the most important transcription factors in Gram-negative bacteria is Fur, a global regulator studied as a therapeutic target for the design of antibacterial agents. Its DNA-binding domain, which contains a helix-turn-helix motif, is one of its most relevant features.

Methods: In this study, we evaluated several machine learning algorithms for predicting DNA-binding sites, based on proteins from the Fur superfamily and other helix-turn-helix transcription factors. The algorithms included Support-Vector Machines (SVM), Random Forest (RF), Decision Trees (DT), and Naive Bayes (NB). We also tested the efficacy of several molecular descriptors derived from the amino acid sequence and from the structure of the protein fragments that bind the DNA. A feature selection procedure was employed to reduce the number of descriptors in each case while maintaining good classification performance.

Results: The best results were obtained with the SVM model using twelve sequence-derived attributes and the DT model using nine structure-derived features, achieving 82% and 76% accuracy, respectively.

Conclusion: The performance obtained indicates that the descriptors we used are relevant for predicting DNA-binding sites, since they can discriminate between binding and non-binding regions of a protein.
Citations: 0
NaProGraph: Network Analyzer for Interactions between Nucleic Acids and Proteins
Zone 3 (Biology) | Q3 Biochemical Research Methods | Pub Date: 2023-10-20 | DOI: 10.2174/0115748936266189231004110412
Sajjad Nematzadeh, Nizamettin Aydin, Zeyneb Kurt, Mahsa Torkamanian-Afshar
Background: Interactions of RNA and DNA with proteins are crucial for elucidating intracellular processes in living organisms, diagnosing disorders, designing aptamer drugs, and other applications. Therefore, investigating the relationships between these macromolecules is essential to life science research.

Method: This study proposes an online network provider tool (NaProGraph) that offers an intuitive and user-friendly interface for studying interactions between nucleic acids (NAs) and proteins. NaProGraph uses a comprehensive, curated dataset encompassing nearly all interacting macromolecules in the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). Researchers can use this online tool to focus on a specific portion of the PDB, investigate its associated relationships, and visualize and extract pertinent information. The tool provides insights into the frequency of atoms and residues between proteins and NAs and into the similarity of the macromolecules' primary structures. Furthermore, the functional similarity of proteins can be inferred using protein families and clans from Pfam. The tool is publicly available at https://naprolink.com/NaProGraph/

Conclusion: The NaProGraph tool serves as an effective online resource for researchers interested in studying interactions between nucleic acids and proteins. By leveraging a comprehensive dataset and providing various visualization and extraction capabilities, NaProGraph facilitates the exploration of macromolecular relationships and aids in understanding intracellular processes in living organisms.
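
To make the idea of an interaction network with residue frequencies concrete, the sketch below builds a small protein/nucleic-acid graph from contact records. It is an illustration only and does not reflect NaProGraph's actual data model or interface: the record layout, chain identifiers, and function names are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical contact records:
# (protein chain, protein residue, nucleic-acid chain, nucleotide)
contacts = [
    ("P1", "ARG", "N1", "G"),
    ("P1", "LYS", "N1", "A"),
    ("P1", "ARG", "N2", "G"),
    ("P2", "ARG", "N2", "C"),
]


def build_graph(records):
    """Build an undirected graph linking protein chains to the NA chains they contact."""
    graph = defaultdict(set)
    for prot, _res, na, _nt in records:
        graph[prot].add(na)
        graph[na].add(prot)
    return graph


def residue_frequency(records):
    """Count how often each amino-acid residue appears at the interface."""
    return Counter(res for _prot, res, _na, _nt in records)


graph = build_graph(contacts)
freq = residue_frequency(contacts)
print(dict(graph))
print(freq.most_common())
```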
Citations: 0
Toxicity Prediction for Immune Thrombocytopenia Caused by Drugs Based on Logistic Regression with Feature Importance
Zone 3 (Biology) | Q3 Biochemical Research Methods | Pub Date: 2023-10-18 | DOI: 10.2174/0115748936269606231001140647
Osphanie Mentari, Muhammad Shujaat, Hilal Tayara, Kil To Chong
Background: One of the problems in drug discovery that can be solved by artificial intelligence is toxicity prediction. In drug-induced immune thrombocytopenia, toxicity can arise after five to ten days, with significant bleeding caused by drug-dependent antibodies. In clinical trials, when this condition occurs, all the drugs taken by the patient should be stopped, although sometimes this is not possible, especially for older patients who depend on their medication. Therefore, being able to predict toxicity in drug-induced immune thrombocytopenia is very important. Computational technologies such as machine learning can help predict toxicity better than empirical techniques, owing to their lower cost and faster processing.

Objective: Previous studies used the KNN method, but the performance of these approaches needs to be improved. This study proposes a logistic regression model to improve accuracy.

Methods: In this study, we present a new machine learning model for drug-induced immune thrombocytopenia. Our model extracts several features from the Simplified Molecular Input Line Entry System (SMILES) representation. These features were fused and cleaned, and the important features were selected using the SelectKBest method. The model uses logistic regression, optimized and tuned with grid search cross-validation.

Results: The highest accuracy, 80%, was obtained with a combination of features from PaDEL, CDK, RDKit, Mordred, and BlueDesc.

Conclusion: Our proposed model outperforms previous studies in terms of accuracy. The information and source code are accessible online on GitHub: https://github.com/Osphanie/Thrombocytopenia.
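
The modelling steps named in the Methods (SelectKBest feature selection, logistic regression, and grid search cross-validation) map onto a standard scikit-learn pipeline. The sketch below is one plausible arrangement, not the authors' code: the data are synthetic (generated with `make_classification` in place of SMILES-derived descriptors), and the scaling step, the `f_classif` score function, the parameter grid, and the fold count are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a fused molecular-descriptor matrix (assumption).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=0
)

# Scaling and selection live inside the pipeline so that each
# cross-validation fold fits them on its own training split only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search grid: number of kept features and regularization strength.
param_grid = {
    "select__k": [5, 10, 20],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

Keeping feature selection inside the pipeline avoids leaking information from the validation folds into the selection step, which would otherwise inflate the cross-validated accuracy.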
Citations: 0
Network propagation-based identification of oligometastatic biomarkers in metastatic colorectal cancer
Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS Pub Date: 2023-10-16 DOI: 10.2174/1574893618666230913110025
Qing Jin, Kexin Yu, Xianze Zhang, Diwei Huo, Denan Zhang, Lei Liu, Hongbo Xie, Binhua Liang, Xiujie Chen
Background: Oligometastatic disease has been proposed as an intermediate state between the primary tumor and systemically metastatic disease, and it is potentially curable with locoregional therapies. However, because no biomarker for identifying patients with true oligometastatic disease is clinically available, its diagnosis remains controversial. Objective: We aim to identify potential biomarkers for colorectal cancer patients in a true oligometastatic state, who will benefit most from local therapy. Methods: This study retrospectively analyzed the transcriptome profiles and clinical parameters of 307 metastatic colorectal cancer patients. A novel network propagation method and a network-based strategy were combined to identify oligometastatic biomarkers that predict the prognoses of metastatic colorectal cancer patients. Results: We defined two metastatic risk groups according to twelve oligometastatic biomarkers; the groups exhibit distinct prognoses, clinicopathological features, immunological characteristics, and biological mechanisms. The metastatic risk assessment model had a greater capacity for survival prediction than traditional clinicopathological features. The low-MRS group was most consistent with an oligometastatic state, while the high-MRS group may represent a polymetastatic state, which leads to the divergence of their prognostic outcomes and responses to treatment. We also identified 22 significant immune check genes between the high-MRS and low-MRS groups. The difference in molecular mechanism between the two metastatic risk groups was associated with the focal adhesion, nucleocytoplasmic transport, Hippo, PI3K-Akt, TGF-β, and ECM-receptor interaction signaling pathways. Conclusion: Our study provides a molecular definition of the oligometastatic state in colorectal cancer, which contributes to precise treatment decision-making for advanced patients.
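The abstract does not specify its novel network propagation method. As general background only, the sketch below shows one common form of network propagation, a random walk with restart, on a toy undirected graph. The node names (`g1` to `g4`), the seed set, and the restart probability are hypothetical and do not come from the study.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate scores from seed nodes over an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    seeds: set of nodes where the walker restarts.
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = sorted(adjacency)
    # The restart distribution is uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, jump back to a seed node.
        nxt = {n: restart * p0[n] for n in nodes}
        for v in nodes:
            neighbours = adjacency[v]
            if not neighbours:
                # Isolated node: the walker's remaining mass stays in place.
                nxt[v] += (1.0 - restart) * p[v]
                continue
            # Otherwise, move to each neighbour with equal probability.
            share = (1.0 - restart) * p[v] / len(neighbours)
            for u in neighbours:
                nxt[u] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph g1 - g2 - g3 - g4 with a single hypothetical seed gene g1.
toy_graph = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4"],
    "g4": ["g3"],
}
scores = random_walk_with_restart(toy_graph, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Nodes closer to the seed receive higher scores, which is the property network-based biomarker strategies typically use to rank candidate genes.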
Citations: 0
QLDTI: A Novel Reinforcement Learning-based Prediction Model for Drug-Target Interaction
Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS Pub Date: 2023-10-16 DOI: 10.2174/0115748936264731230928112936
Jie Gao, Qiming Fu, Jiacheng Sun, Yunzhe Wang, Youbing Xia, You Lu, Hongjie Wu, Jianping Chen
Background: Predicting drug-target interaction (DTI) plays a crucial role in drug research and development, and a growing number of researchers are working on more powerful prediction methods. Traditional DTI prediction relies mainly on biochemical experiments, which are time-consuming, risky, and costly. Current DTI prediction often uses a single information source and a single model, or a combination of a few models, but the results are still not accurate enough. Objective: The study aimed to use existing data and machine learning models to integrate heterogeneous data sources and different models, further improving the accuracy of DTI prediction. Methods: This paper proposes a novel prediction method based on reinforcement learning, called QLDTI (predicting drug-target interaction based on Q-learning), which consists of two main parts: data fusion and model fusion. First, it uses Q-learning to fuse the drug and target similarity matrices produced by different calculation methods. Second, the new similarity matrix is input into five models, NRLMF, CMF, BLM-NII, NetLapRLS, and WNN-GIP, for further training. Then, the weights of all sub-models are optimized, again by Q-learning, and used to linearly weight the sub-model predictions into the final prediction result. Results: QLDTI achieved AUC values of 99.04%, 99.12%, 98.28%, and 98.35% on the E, NR, IC, and GPCR datasets, respectively. Compared with the five existing models NRLMF, CMF, BLM-NII, NetLapRLS, and WNN-GIP, QLDTI achieved better results on all four benchmark datasets. Conclusion: Data fusion and model fusion proved effective for DTI prediction, further improving its accuracy.
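The final model-fusion step described in Methods, a linear weighting of sub-model predictions, can be sketched as follows. This is a minimal sketch under stated assumptions: the weights are fixed by hand here, whereas the paper learns them with Q-learning, and the scores, weights, and the helper name `fuse_predictions` are made up for illustration.

```python
def fuse_predictions(sub_model_scores, weights):
    """Linearly combine per-model interaction scores into one fused score list.

    sub_model_scores: dict of model name -> list of scores, one per drug-target pair.
    weights: dict of model name -> non-negative weight (normalised internally).
    """
    if set(sub_model_scores) != set(weights):
        raise ValueError("each sub-model needs exactly one weight")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_pairs = len(next(iter(sub_model_scores.values())))
    fused = [0.0] * n_pairs
    for name, scores in sub_model_scores.items():
        if len(scores) != n_pairs:
            raise ValueError("all sub-models must score the same pairs")
        w = weights[name] / total  # normalise so the weights sum to 1
        for i, s in enumerate(scores):
            fused[i] += w * s
    return fused


# Illustrative scores for three drug-target pairs (invented numbers, not results).
scores = {
    "NRLMF": [0.9, 0.2, 0.6],
    "CMF": [0.8, 0.3, 0.5],
    "BLM-NII": [0.7, 0.1, 0.4],
    "NetLapRLS": [0.6, 0.4, 0.5],
    "WNN-GIP": [0.5, 0.5, 0.5],
}
# Hand-picked weights standing in for the learned ones.
weights = {"NRLMF": 0.4, "CMF": 0.2, "BLM-NII": 0.2, "NetLapRLS": 0.1, "WNN-GIP": 0.1}

fused = fuse_predictions(scores, weights)
print([round(v, 2) for v in fused])
```

Normalising the weights inside the function keeps the fused score on the same scale as the inputs, whatever procedure supplies the weights.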
Citations: 0
Bioinformatics Perspective of Drug Repurposing
Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS Pub Date: 2023-10-10 DOI: 10.2174/0115748936264692230921071504
Binita Patel, Brijesh Gelat, Mehul Soni, Pooja Rathaur, Kaid Johar SR
Abstract: Different diseases can be treated with various therapeutic agents. Drug discovery aims to find potential molecules for existing and emerging diseases. However, factors such as rising development costs, generic competition following the patent expiry of several drugs, increasingly conservative regulatory policies, and insufficient breakthrough innovation impair the development of new drugs and the productivity of the pharmaceutical industry. Drug repurposing is the process of finding new therapeutic applications for approved, withdrawn, abandoned, and experimental drugs. It may partially overcome the hurdles of drug discovery and hence appears to be a wise approach. However, because drug repurposing is not a standard regulatory process, it raises administrative concerns and problems. It also requires expensive, high-risk clinical trials to establish the safety and efficacy of the repurposed drug. Recent innovations in bioinformatics can accelerate drug repurposing studies by identifying new targets of existing drugs and by supporting drug candidate screening and refinement. Recent advances in comprehensive high-throughput data in genomics, epigenetics, chromosome architecture, transcriptomics, proteomics, and metabolomics may also contribute to the understanding of the molecular mechanisms involved in drug-target interaction. The present review describes the current state of drug repurposing and the application of various bioinformatic tools to identify new targets for existing drugs.
Citations: 0
A Systematic Review of Medical Expert Systems for Cardiac Arrest Prediction
Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS Pub Date: 2023-10-10 DOI: 10.2174/0115748936251658231002043812
Ishleen Kaur, Tanvir Ahmad, M.N. Doja
Background: Predicting cardiac arrest is crucial for timely intervention and improved patient outcomes. Machine learning has yielded striking results by offering tailored prediction analyses on complex data. Despite advancements in medical expert systems, there remains a need for a comprehensive analysis of their effectiveness and limitations in cardiac arrest prediction, because few existing studies cover the topic thoroughly. Objective: This systematic review aims to analyze the existing literature on medical expert systems for cardiac arrest prediction, filling gaps in knowledge and identifying key challenges. Methods: This paper adopts the PRISMA methodology to conduct a systematic review of 37 publications from PubMed, Springer, ScienceDirect, and IEEE, published within the last decade. Careful inclusion and exclusion criteria were applied during selection, resulting in a comprehensive analysis organized into five integrated layers: research objectives, data collection, feature set generation, model training, and validation, employing various machine learning techniques. Results and Conclusion: The findings indicate that current studies frequently use ensemble and deep learning methods to improve the accuracy of machine learning predictions. However, they lack adequate implementation of proper pre-processing techniques. Further research is needed to address challenges related to external validation, implementation, and adoption of machine learning models in real clinical settings, as well as the integration of machine learning with AI technologies such as NLP. This review aims to be a valuable resource for both novice and experienced researchers, offering insights into current methods and recommendations for future work.
Citations: 0