
Current Bioinformatics: Latest Articles

Optimized Hybrid Deep Learning for Real-Time Pandemic Data Forecasting: Long and Short-Term Perspectives
IF 4 Zone 3 (Biology) Q1 Mathematics Pub Date: 2023-12-07 DOI: 10.2174/0115748936257412231120113648
Sujata Dash, Sourav Kumar Giri, Subhendu Kumar Pani, Saurav Mallik, Mingqiang Wang, Hong Qin
Background: With new variants of COVID-19 causing challenges, we need to focus on integrating multiple deep-learning frameworks to develop intelligent healthcare systems for early detection and diagnosis. Objective: This article proposes three hybrid deep learning models, namely CNN-LSTM, CNN-Bi-LSTM, and CNN-GRU, to address the pressing need for an intelligent healthcare system. These models are designed to capture spatial and temporal patterns in COVID-19 data, thereby improving the accuracy and timeliness of predictions. An output forecasting framework integrates these models, and an optimization algorithm automatically selects the hyperparameters for the 13 baselines and the three proposed hybrid models. Methods: Real-time time series data from the five most affected countries were used to test the effectiveness of the proposed models. Baseline models were compared, and optimization algorithms were employed to improve forecasting capabilities. Results: CNN-GRU and CNN-LSTM are the top short- and long-term forecasting models. CNN-GRU had the best performance, with the lowest SMAPE and MAPE values for long-term forecasting in India at 3.07% and 3.17%, respectively, and SMAPE and MAPE values of 1.46% and 1.47% for short-term forecasting. Conclusion: Hybrid deep learning models such as CNN-GRU can aid in early COVID-19 assessment and diagnosis. They detect patterns in data that can inform governmental strategies and forecasting, which helps manage and mitigate the pandemic faster and more accurately.
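The abstract reports forecast quality as MAPE and SMAPE. As a minimal sketch of how such percentages are typically computed, the following assumes the common SMAPE variant whose denominator is the mean of the absolute actual and forecast values; the abstract does not say which variant the authors used, and the function names and sample numbers are illustrative only.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent.

    Raises ZeroDivisionError if any actual value is zero.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE in percent, assuming the (|a| + |f|) / 2 denominator."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        abs(f - a) / ((abs(a) + abs(f)) / 2.0)
        for a, f in zip(actual, forecast)
    )
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical daily case counts and forecasts, not data from the paper.
    observed = [100.0, 200.0]
    predicted = [110.0, 180.0]
    print(mape(observed, predicted), smape(observed, predicted))
```

Because SMAPE divides by the average of the actual and forecast magnitudes, it penalises over- and under-forecasts more evenly than MAPE, which is one reason both are often reported together.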
Citations: 0
Identifying Pathological Myopia Associated Genes with A Random Walk-Based Method in Protein-Protein Interaction Network
IF 4 Zone 3 (Biology) Q1 Mathematics Pub Date: 2023-12-07 DOI: 10.2174/0115748936268218231114070754
Jiyu Zhang, Tao Huang, Qiao Sun, Jian Zhang
Background: Pathological myopia, a severe variant of myopia, extends beyond the typical refractive error associated with nearsightedness. While the condition has a strong genetic component, the intricate mechanisms of inheritance remain elusive. Some genes have been associated with the development of pathological myopia, but their exact roles are not fully understood. Objective: This study aimed to identify novel genes associated with pathological myopia. Methods: Our study leveraged DisGeNET to identify 184 genes linked with high myopia and 39 genes related to degenerative myopia. To uncover additional pathological myopia-associated genes, we employed the random walk with restart algorithm to investigate the protein-protein interaction network, using the 184 high myopia and 39 degenerative myopia genes as seed nodes. Results: Through subsequent screening tests, we discarded genes with weak associations, yielding 103 new genes for high myopia and 33 for degenerative myopia. Conclusion: We confirmed the association of certain genes, including six genes that were confirmed to be associated with both high and degenerative myopia. The newly discovered genes may help uncover and understand the pathogenesis of myopia.
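Random walk with restart, the algorithm named in the Methods, can be sketched on a toy network as follows. This is an illustration under stated assumptions rather than the authors' implementation: the network is unweighted and undirected, every node has at least one neighbour, the restart probability of 0.3 is a placeholder (the abstract gives no value), and the node names are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adjacency: dict mapping each node to a list of its neighbours.
    seeds: nodes that receive the initial (and restart) probability mass.
    W spreads each node's probability equally over its neighbours.
    """
    seeds = set(seeds)
    if not seeds or not seeds <= set(adjacency):
        raise ValueError("seeds must be a non-empty subset of the nodes")
    if any(not nbrs for nbrs in adjacency.values()):
        raise ValueError("this sketch assumes no isolated nodes")

    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical four-protein path network: A - B - C - D.
TOY_PPI = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

if __name__ == "__main__":
    scores = random_walk_with_restart(TOY_PPI, seeds={"A"})
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

Non-seed nodes with high steady-state probability are the candidate genes; a screening step such as the one the abstract mentions would then filter out weak associations.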
Citations: 0
Discovering Microbe-disease Associations with Weighted Graph Convolution Networks and Taxonomy Common Tree
IF 4 Zone 3 (Biology) Q1 Mathematics Pub Date: 2023-12-04 DOI: 10.2174/0115748936270441231116093650
Jieqi Xing, Yu Shi, Xiaoquan Su, Shunyao Wu
Background: Microbe-disease associations are integral to understanding complex diseases and their screening procedures. Objective: While numerous computational methods have been developed to detect these associations, their performance remains limited due to inadequate utilization of weighted inherent similarities and microbial taxonomy hierarchy. To address this limitation, we have introduced WTHMDA (weighted taxonomic heterogeneous network-based microbe-disease association), a novel deep learning framework. Methods: WTHMDA combines a weighted graph convolution network and the microbial taxonomy common tree to predict microbe-disease associations effectively. The framework extracts multiple microbe similarities from the taxonomy common tree, facilitating the construction of a microbe-disease heterogeneous interaction network. Utilizing a weighted DeepWalk algorithm, node embeddings in the network incorporate weight information from the similarities. Subsequently, a deep neural network (DNN) model accurately predicts microbe-disease associations based on this interaction network. Results: Extensive experiments on multiple datasets and case studies demonstrate WTHMDA's superiority over existing approaches, particularly in predicting unknown associations. Conclusion: Our proposed method offers a new strategy for discovering microbe-disease linkages, showcasing remarkable performance and enhancing the feasibility of identifying disease risk.
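The weighted DeepWalk step can be pictured as random walks in which the next node is chosen with probability proportional to the edge weight. The sketch below shows only that walk-sampling stage on a toy heterogeneous graph; the node names, weights, and function names are hypothetical, and the abstract does not specify how WTHMDA weights or embeds its walks. In DeepWalk-style methods the resulting walks are then fed to a skip-gram model to learn node embeddings, which is not shown here.

```python
import random


def weighted_walk(graph, start, length, rng):
    """Sample one walk; the next node is drawn in proportion to edge weight."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:
            break  # dead end: stop the walk early
        nodes = list(neighbours)
        weights = [neighbours[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in graph:
            walks.append(weighted_walk(graph, node, length, rng))
    return walks


# Hypothetical microbe-disease network: node -> {neighbour: similarity weight}.
TOY_GRAPH = {
    "microbe_A": {"microbe_B": 0.9, "disease_X": 1.0},
    "microbe_B": {"microbe_A": 0.9, "disease_Y": 1.0},
    "disease_X": {"microbe_A": 1.0, "disease_Y": 0.4},
    "disease_Y": {"microbe_B": 1.0, "disease_X": 0.4},
}

if __name__ == "__main__":
    for walk in generate_walks(TOY_GRAPH, walks_per_node=1, length=5):
        print(" -> ".join(walk))
```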
Citations: 0
Metabolomics: Recent Advances and Future Prospects Unveiled
IF 4 Zone 3 (Biology) Q1 Mathematics Pub Date: 2023-12-04 DOI: 10.2174/0115748936270744231115110329
Shweta Sharma, Garima Singh, Mymoona Akhter
In the era of genomics, fueled by advanced technologies and analytical tools, metabolomics has become a vital component in biomedical research. Its significance spans various domains, encompassing biomarker identification, uncovering underlying mechanisms and pathways, as well as the exploration of new drug targets and precision medicine. This article presents a comprehensive overview of the latest developments in metabolomics techniques, emphasizing their wide-ranging applications across diverse research fields and underscoring their immense potential for future advancements.
Citations: 0
Stacking-Kcr: A Stacking Model for Predicting the Crotonylation Sites of Lysine by Fusing Serial and Automatic Encoder
IF 4 Zone 3 (Biology) Q1 Mathematics Pub Date: 2023-12-04 DOI: 10.2174/0115748936272040231117114252
Ying Liang, Suhui Li, Xiya You, You Guo, Jianjun Tang
Background: Protein lysine crotonylation (Kcr), a newly discovered and important post-translational modification (PTM), is typically localized at the transcription start site and regulates gene expression; it is associated with a variety of pathological conditions such as developmental defects and malignant transformation. Objective: Identifying Kcr sites is advantageous for the discovery of its biological mechanism and the development of new drugs for related diseases. However, traditional experimental methods for identifying Kcr sites are expensive and inefficient, necessitating the development of new computational techniques. Methods: In this work, to accurately identify Kcr sites, we propose an ensemble learning model called Stacking-Kcr. First, features are extracted from sequence information, physicochemical properties, and sequence fragment similarity. Then, the sequence information and physicochemical property features are fused using an automatic encoder and serial fusion, respectively. Finally, the two fused features and the sequence fragment similarity features are each input into four base classifiers, a meta classifier is constructed from the first-level prediction results, and the final predictions are obtained. Results: Five-fold cross-validation of this model achieved an accuracy of 0.828 and an AUC of 0.910, showing that Stacking-Kcr has clear advantages over traditional machine learning methods. On independent test sets, Stacking-Kcr achieved an accuracy of 84.89% and an AUC of 92.21%, which are 1.7% and 0.8% higher, respectively, than those of other state-of-the-art tools. Additionally, we trained Stacking-Kcr on phosphorylation sites, and the result is superior to the current model. Conclusion: These outcomes are additional evidence that Stacking-Kcr has strong application potential and generalization performance.
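The Results are stated in terms of accuracy and AUC. As a small illustration of what the AUC figure measures, the sketch below computes it as the fraction of (positive, negative) pairs in which the positive site receives the higher score, counting ties as one half. The function name and sample scores are made up for illustration and are unrelated to the Stacking-Kcr data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    labels: iterable of 0/1 ground-truth values (1 = modified site).
    scores: predicted scores, higher meaning more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical predictions for four candidate lysine sites.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The quadratic pairwise loop is chosen for clarity; a rank-based formulation gives the same value more efficiently on large test sets.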
Citations: 0
iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking
IF 4 Zone 3 (Biology) Q1 Mathematics Pub Date: 2023-12-04 DOI: 10.2174/0115748936256869231019113616
Muhammad Shujaat, Hilal Tayara, Sunggoo Yoo, Kil To Chong
Background and Objective: Gene promoters play a crucial role in regulating gene transcription by serving as DNA regulatory elements near transcription start sites. Despite numerous approaches, including alignment signal and content-based methods for promoter prediction, accurately identifying promoters remains challenging due to the lack of explicit features in their sequences. Consequently, many machine learning and deep learning models for promoter identification have been presented, but their performance remains imprecise. Most recent investigations have concentrated on identifying sigma or plant promoters, while the accurate identification of Saccharomyces cerevisiae promoters remains an underexplored area. In this study, we introduce "iProm-Yeast", a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. Additionally, we developed a more difficult negative set by employing promoter sequences rather than non-promoter regions of the genome. The newly developed negative reconstruction approach improves classification and minimizes the number of false positive predictions. Methods: To overcome the problems associated with promoter prediction, we investigate alternate vector encoding and feature extraction methodologies. These strategies are then coupled with several machine learning algorithms and a 1-D convolutional neural network model. Our results show that the pseudo-dinucleotide composition is preferable for feature encoding and that the machine-learning stacking approach is well suited to accurate promoter categorization. Furthermore, we provide a negative reconstruction method that uses promoter sequences rather than non-promoter regions, resulting in higher classification performance and fewer false positive predictions. Results: Based on the results of 5-fold cross-validation, the proposed predictor, iProm-Yeast, has good potential for detecting Saccharomyces cerevisiae promoters. The accuracy (Acc) was 86.27%, the sensitivity (Sn) was 82.29%, the specificity (Sp) was 89.47%, the Matthews correlation coefficient (MCC) was 0.72, and the area under the receiver operating characteristic curve (AUROC) was 0.98. We also performed a cross-species analysis to determine the generalizability of iProm-Yeast across other species. Conclusion: iProm-Yeast is a robust method for accurately identifying Saccharomyces cerevisiae promoters. With advanced vector encoding techniques and a negative reconstruction approach, it achieves improved classification accuracy and reduces false positive predictions. In addition, it offers researchers a reliable and precise webserver to study gene regulation in diverse organisms.
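The four threshold-based metrics quoted in the Results (Acc, Sn, Sp, MCC) all derive from the confusion matrix. A minimal sketch using their standard definitions follows; the counts in the example are invented for illustration and are not the confusion matrix behind the reported iProm-Yeast figures.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion counts.

    tp/fn count true promoters predicted as promoter/non-promoter;
    tn/fp count negatives predicted as non-promoter/promoter.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": (tp + tn) / total,
        "Sn": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        # MCC is conventionally set to 0 when a marginal total is zero.
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts for 50 promoters and 50 negative sequences.
    for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(name, round(value, 4))
```

MCC is useful alongside accuracy because it stays informative when the positive and negative sets differ in size.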
Citations: 0
Prediction of Super-enhancers Based on Mean-shift Undersampling
IF 4 Zone 3 (Biology) Q1 Mathematics Pub Date: 2023-12-01 DOI: 10.2174/0115748936268302231110111456
Han Cheng, Shumei Ding, Cangzhi Jia
Background: Super-enhancers are clusters of enhancers defined based on the binding occupancy of master transcription factors, chromatin regulators, or chromatin marks. It has been reported that super-enhancers are transcriptionally more active and cell-type-specific than regular enhancers. Therefore, it is necessary to distinguish super-enhancers from regular enhancers. A variety of computational methods have been proposed as auxiliary tools to identify super-enhancers. However, most methods use ChIP-seq data, and without these data the predictor cannot run or fails to achieve satisfactory performance. Objective: The aim of this study is to propose a stacking computational model based on the fusion of multiple features to identify super-enhancers in both human and mouse species. Methods: This work adopted mean-shift to cluster majority class samples and selected five sets of balanced datasets for mouse and three sets of balanced datasets for humans to train the stacking model. Five types of sequence information are used as input to the XGBoost classifier, and the average value of the probability outputs from each classifier is taken as the final classification result. Results: The results of 10-fold cross-validation and cross-cell-line validation prove that our method has superior performance compared to other existing methods. The source code and datasets are available at https://github.com/Cheng-Han-max/SE_voting. Conclusion: The analysis of feature importance indicates that Mismatch accounts for the highest proportion among the top 20 important features.
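The final step described in the Methods, averaging the probability outputs of several classifiers, is often called soft voting. The sketch below shows only that averaging step, assuming each classifier has already produced a super-enhancer probability per sample; the 0.5 decision threshold, the function name, and the sample probabilities are assumptions for illustration, not details taken from the paper or its repository.

```python
def average_probabilities(per_model_probs, threshold=0.5):
    """Average per-sample probabilities across models and threshold them.

    per_model_probs: list of lists, one inner list per classifier, each
    holding the predicted positive-class probability for every sample.
    Returns (averaged_probabilities, predicted_labels).
    """
    if not per_model_probs or not per_model_probs[0]:
        raise ValueError("need at least one model and one sample")
    n_samples = len(per_model_probs[0])
    if any(len(probs) != n_samples for probs in per_model_probs):
        raise ValueError("all models must score the same samples")

    n_models = len(per_model_probs)
    averaged = [
        sum(probs[i] for probs in per_model_probs) / n_models
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


if __name__ == "__main__":
    # Hypothetical outputs of three classifiers on three candidate regions.
    model_outputs = [
        [0.9, 0.2, 0.6],
        [0.7, 0.4, 0.3],
        [0.8, 0.3, 0.5],
    ]
    print(average_probabilities(model_outputs))
```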
Citations: 0
SCV Filter: A Hybrid Deep Learning Model for SARS-CoV-2 Variants Classification
IF 4 | Zone 3, Biology | Q1 Mathematics | Pub Date: 2023-11-22 | DOI: 10.2174/1574893618666230809121509
Han Wang, Jingyang Gao
Background: The high mutability of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) makes it easy for mutations to occur during transmission. As the epidemic continues to develop, several mutated strains have emerged. Researchers worldwide are working on the effective identification of SARS-CoV-2. Objective: In this paper, we propose a new deep learning method, called SCVfilter, that can effectively identify SARS-CoV-2 variant sequences. It is a deep hybrid model with embedding, an attention residual network, and long short-term memory as components. Methods: Deep learning is effective at extracting rich features from sequence data, which has significant implications for the study of Coronavirus Disease 2019 (COVID-19). SCVfilter combines embedding, an attention residual network, and long short-term memory in a single hybrid model. Results: The accuracy of SCVfilter is 93.833% on Dataset-I, consisting of different variant strains; 90.367% on Dataset-II, consisting of data collected from China, Taiwan, and Hong Kong; and 79.701% on Dataset-III, consisting of data collected from six continents (Africa, Asia, Europe, North America, Oceania, and South America). Conclusion: When processing lengthy, high-homology SARS-CoV-2 data, SCVfilter can automatically select features and accurately detect different variant strains of SARS-CoV-2. In addition, SCVfilter is sufficiently robust to handle the problems caused by sample imbalance and sequence incompleteness. Other: SCVfilter is an open-source method available at https://github.com/deconvolutionw/SCVfilter.
Citations: 0
Prediction of DNA-binding Sites in Transcriptions Factor in Fur-like Proteins Using Machine Learning and Molecular Descriptors
Zone 3, Biology | Q1 Mathematics | Pub Date: 2023-10-27 | DOI: 10.2174/0115748936264122231016094702
Mauricio Arenas-Salinas, Jessica Lara Muñoz, José Antonio Reyes, Felipe Besoain
Introduction: Transcription factors are of great interest in biotechnology due to their key role in the regulation of gene expression. One of the most important transcription factors in Gram-negative bacteria is Fur, a global regulator studied as a therapeutic target for the design of antibacterial agents. Its DNA-binding domain, which contains a helix-turn-helix motif, is one of its most relevant features. Methods: In this study, we evaluated several machine learning algorithms for the prediction of DNA-binding sites based on proteins from the Fur superfamily and other helix-turn-helix transcription factors, including Support-Vector Machines (SVM), Random Forest (RF), Decision Trees (DT), and Naive Bayes (NB). We also tested the efficacy of several molecular descriptors derived from the amino acid sequence and from the structure of the protein fragments that bind the DNA. A feature selection procedure was employed to reduce the number of descriptors in each case while maintaining good classification performance. Results: The best results were obtained with the SVM model using twelve sequence-derived attributes and the DT model using nine structure-derived features, achieving 82% and 76% accuracy, respectively. Conclusion: The performance obtained indicates that the descriptors we used are relevant for predicting DNA-binding sites, since they can discriminate between binding and non-binding regions of a protein.
Citations: 0
NaProGraph: Network Analyzer for Interactions between Nucleic Acids and Proteins
Zone 3, Biology | Q1 Mathematics | Pub Date: 2023-10-20 | DOI: 10.2174/0115748936266189231004110412
Sajjad Nematzadeh, Nizamettin Aydin, Zeyneb Kurt, Mahsa Torkamanian-Afshar
Background: Interactions of RNA and DNA with proteins are crucial for elucidating intracellular processes in living organisms, diagnosing disorders, designing aptamer drugs, and other applications. Therefore, investigating the relationships between these macromolecules is essential to life science research. Method: This study proposes an online network provider tool (NaProGraph) that offers an intuitive and user-friendly interface for studying interactions between nucleic acids (NAs) and proteins. NaProGraph utilizes a comprehensive and curated dataset encompassing nearly all interacting macromolecules in the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). Researchers can employ this online tool to focus on a specific portion of the PDB, investigate its associated relationships, and visualize and extract pertinent information. The tool provides insights into the frequency of atoms and residues between proteins and NAs and the similarity of the macromolecules' primary structures. Furthermore, the functional similarity of proteins can be inferred using protein families and clans from Pfam. The tool is publicly available at https://naprolink.com/NaProGraph/. Conclusion: The NaProGraph tool serves as an effective online resource for researchers interested in studying interactions between nucleic acids and proteins. By leveraging a comprehensive dataset and providing various visualization and extraction capabilities, NaProGraph facilitates the exploration of macromolecular relationships and aids in understanding intracellular processes in living organisms.
Citations: 0