
Computational Biology and Chemistry: Latest Articles

ILYCROsite: Identification of lysine crotonylation sites based on FCM-GRNN undersampling technique
IF 2.6 | Zone 4, Biology | Q2 BIOLOGY | Pub Date: 2024-09-13 | DOI: 10.1016/j.compbiolchem.2024.108212

Protein lysine crotonylation is an important post-translational modification that regulates various cellular activities. For example, histone crotonylation affects chromatin structure and promotes histone replacement. Identifying and understanding lysine crotonylation sites is therefore crucial in protein research. However, as the number of known non-histone crotonylation sites grows, existing classifiers based on traditional machine learning may encounter performance limitations. To address this problem, and given the advantages of deep learning for sequence data analysis, this study presents an MLP-Attention-based model for identifying crotonylation sites. First, three feature extraction strategies (Amino Acid Composition, K-mer, and Distance-based Residue features) were used to encode crotonylated and non-crotonylated sequences. Then, to balance the training dataset, the FCM-GRNN undersampling algorithm, which combines fuzzy clustering (fuzzy c-means) with a generalized regression neural network, was introduced. Finally, we compared various classification algorithms experimentally and selected a multilayer perceptron (MLP) combined with a superimposed self-attention mechanism to construct the prediction model, ILYCROsite. Independent testing and five-fold cross-validation showed that ILYCROsite performs well. Notably, on the independent test set, ILYCROsite achieved an AUC of 87.93%, which is significantly better than existing state-of-the-art models. In addition, SHAP (SHapley Additive exPlanations) values were used to analyze the importance of features and their impact on model predictions. To make the model easier for researchers to use, we developed a prediction program that identifies the crotonylation sites in a given protein sequence. The data and code for this program are available at: https://github.com/wmqskr/ILYCROsite.
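Two of the encodings named in the abstract, Amino Acid Composition and K-mer, are simple frequency vectors over a sequence window. The sketch below illustrates the general idea only; it is not taken from the ILYCROsite repository, and the function names, the choice of k = 2, and the 11-residue example window are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the 20-dimensional frequency vector of standard residues in seq."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def kmer_composition(seq, k=2):
    """Return normalized counts of every overlapping k-mer over the 20-letter alphabet."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Hypothetical 11-residue window centred on a lysine (K); not from the paper's dataset.
window = "AGSTKKLPKEA"
features = amino_acid_composition(window) + kmer_composition(window, k=2)
```

Concatenating the two vectors gives 20 + 400 = 420 features per window. The paper's actual window length, value of k, and distance-based residue encoding are not stated in the abstract and are not reproduced here.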

Citations: 0
A comprehensive bioinformatic analysis of the role of TGF-β1-stimulated activating transcription factor 3 by non-coding RNAs during breast cancer progression
IF 2.6 | Zone 4, Biology | Q2 BIOLOGY | Pub Date: 2024-09-12 | DOI: 10.1016/j.compbiolchem.2024.108208

Transforming growth factor beta 1 (TGF-β1) is a potent growth inhibitor for normal mammary epithelial cells. When breast tissues lose the anti-proliferative activity of this factor, invasion and bone metastases increase. Human breast cancer (hBC) cells express more activating transcription factor 3 (ATF3) when exposed to TGF-β1, and this transcription factor is essential for BC development and bone metastases. Non-coding RNAs (ncRNAs), including circular RNAs (circRNAs) and microRNAs (miRNAs), have emerged as key regulators of several cellular processes. In hBC cells, TGF-β1 stimulated the expression of hsa-miR-4653-5p, which putatively targets ATF3. Bioinformatics analysis predicted that hsa-miR-4653-5p targets several key signaling components and transcription factors, including NFKB1, STAT1, STAT3, NOTCH1, JUN, TCF3, p300, NRF2, SUMO2, and NANOG, suggesting diverse roles for hsa-miR-4653-5p under physiological and pathological conditions. Despite the high abundance of hsa-miR-4653-5p in hBC cells, the ATF3 level remained elevated, indicating that other ncRNAs could inhibit the activity of hsa-miR-4653-5p. In silico analysis identified several circRNAs with binding sites for hsa-miR-4653-5p, indicating sponging activity of these circRNAs towards hsa-miR-4653-5p. The findings suggest that TGF-β1 regulates circRNAs and hsa-miR-4653-5p, which in turn affect ATF3 expression and thus influence BC progression and bone metastasis. Therefore, focusing on the TGF-β1/circRNAs/hsa-miR-4653-5p/ATF3 network could lead to new ways of diagnosing and treating BC.

Citations: 0
Unveiling therapeutic biomarkers and druggable targets in ALS: An integrative microarray analysis, molecular docking, and structural dynamic studies
IF 2.6 | Zone 4, Biology | Q2 BIOLOGY | Pub Date: 2024-09-12 | DOI: 10.1016/j.compbiolchem.2024.108211

Amyotrophic lateral sclerosis (ALS), commonly known as Lou Gehrig's disease, is a debilitating neurodegenerative disorder characterized by the progressive degeneration of nerve cells in the brain and spinal cord. Despite extensive research, its precise etiology remains elusive, and early diagnosis is challenging due to the absence of specific tests. This study aimed to identify potential blood-based biomarkers for early ALS detection and monitoring using datasets from whole blood samples (GSE112680) and from oligodendrocytes, astrocytes, and fibroblasts (GSE87385) obtained from the NCBI-GEO repository. Through bioinformatics analysis, including protein-protein interaction and molecular pathway analyses, we identified differentially expressed genes (DEGs) associated with ALS. Notably, ALS2, ADH7, ALDH8A1, ALDH3B1, ABHD2, ABHD17B, ABHD12, ABHD13, PGAM2, AURKB, ANAPC11, VAPA, UNC45B, and TNNT2 emerged as top-ranked DEGs, implicated in drug metabolism, protein depalmitoylation, and the AKT/mTOR signaling pathways. Among these, AurKB emerged as a potential therapeutic biomarker with relevance to various neurological conditions. Consequently, AurKB was selected as the target for identifying potential therapeutic molecules and for in silico structural characterization studies. Exploration of the IMPATT database led to the discovery of a lead compound similar to Fostamatinib, currently used for AurKB. Initial molecular docking and MMGBSA-based binding energy analysis were followed by molecular dynamics simulation (MDS) and free energy landscape (FEL) analysis to validate the ligand's binding efficacy and to understand dynamic processes within the biological system. The identified potential biomarkers and lead molecule provide novel insights into the correlation between blood cell transcripts and ALS pathology, paving the way for blood-based diagnostic tools for early ALS detection and ongoing disease monitoring.

Citations: 0
Accurately identifying positive and negative regulation of apoptosis using fusion features and machine learning methods
IF 2.6 | Zone 4, Biology | Q2 BIOLOGY | Pub Date: 2024-09-11 | DOI: 10.1016/j.compbiolchem.2024.108207

Apoptotic proteins play a crucial role in the apoptosis process, ensuring a balance between cell proliferation and death. Thus, further elucidating the regulatory mechanisms of apoptosis will enhance our understanding of their functions. However, developing computational methods that accurately identify positive and negative regulation of apoptosis remains a significant challenge. This work proposes a machine learning model based on multi-feature fusion to identify proteins that positively or negatively regulate apoptosis. First, we constructed a reliable benchmark dataset containing 200 proteins that positively regulate apoptosis and 241 that negatively regulate it. We then developed a classifier that combines the support vector machine (SVM) with pseudo composition of k-spaced amino acid pairs (PseCKSAAP), composition, transition, and distribution (CTD), dipeptide deviation from expected mean (DDE), and PSSM-composition features to identify these proteins. Analysis of variance (ANOVA) was employed to select the optimized features that yield the maximum prediction performance. On independent data, the proposed model achieved an accuracy of 0.781 and an AUROC of 0.837, demonstrating its predictive capability.
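ANOVA-based feature selection, as mentioned above, typically scores each feature by the ratio of between-class to within-class variance and keeps the highest-scoring ones. The following is a minimal sketch of that scoring step under stated assumptions: the function names and toy vectors are hypothetical, and the abstract does not say how many features the authors retained or which implementation they used.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for one feature.

    `groups` is a list of value lists, one list per class.
    """
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    # Between-class mean square: spread of class means around the grand mean.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    # Within-class mean square: spread of samples around their own class mean.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (len(all_values) - len(groups))
    return between / within if within > 0 else float("inf")


def rank_features(pos, neg):
    """Rank feature indices by descending F score; pos and neg are lists of feature vectors."""
    n_features = len(pos[0])
    scores = [anova_f([[v[j] for v in pos], [v[j] for v in neg]]) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy vectors (hypothetical): feature 0 separates the classes, feature 1 does not.
pos = [[0.9, 0.5], [1.0, 0.4], [1.1, 0.6]]
neg = [[0.1, 0.5], [0.0, 0.6], [0.2, 0.4]]
order, scores = rank_features(pos, neg)
```

In a full pipeline the top-ranked features would be added to the classifier incrementally and the subset with the best cross-validated performance kept; that search loop is omitted here.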

Citations: 0
Molecular descriptors and in silico studies of 4-((5-(decylthio)-4-methyl-4H-1,2,4-triazol-3-yl)methyl)morpholine as a potential drug for the treatment of fungal pathologies
IF 2.6 | Zone 4, Biology | Q2 BIOLOGY | Pub Date: 2024-09-11 | DOI: 10.1016/j.compbiolchem.2024.108206

The article explores the polypharmacological profiling of 4-((5-(decylthio)-4-methyl-4H-1,2,4-triazole-3-yl)methyl)morpholine as a potential antimicrobial agent. The study utilized 15148 electronic pharmacophore models of organisms, ranked by the Tversky index. Detailed analysis revealed classical bonding patterns with selected enzymes and identified the key amino acid residues involved in complex formation. Protein target prediction was conducted in stages using the Galaxy web service, including ligand structure creation, pharmacophore alignment, and target ranking. The activities of the molecules against the 1G6C, 2W6O, 3G7F, 3OWU, 4IVR, and 4TZT proteins were compared. Docking studies with PyMOL and Discovery Studio Visualizer revealed binding to thymidine kinase, thiamine phosphate synthase, and biotin carboxylase with promising binding affinities. These interactions suggest potential antibacterial and antiviral effects, warranting further virtual screening and in-depth studies for the development of effective antimicrobial drugs. Quantum chemical calculations were performed with the Gaussian software package at the B3LYP, HF, and M06-2X levels with the 6-31++G** basis set. Afterwards, the interaction of the molecule with the highest activity was examined over 0–100 ns.
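The Tversky index used for ranking is an asymmetric set similarity: S(X, Y) = |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|). The sketch below shows the formula applied to ranking; the feature labels, model names, and the choice α = β = 1 (which reduces to the Tanimoto coefficient) are assumptions for illustration, since the abstract does not state the weights or the feature representation used.

```python
def tversky(x, y, alpha=1.0, beta=1.0):
    """Tversky index between two feature collections.

    alpha weights features only in x, beta weights features only in y.
    alpha = beta = 1 gives Tanimoto; alpha = beta = 0.5 gives Dice.
    """
    x, y = set(x), set(y)
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    return common / denom if denom else 0.0


# Hypothetical pharmacophore feature labels for a query ligand and two target models.
query = {"HBA1", "HBD1", "HYD1", "ARO1"}
models = {
    "model_a": {"HBA1", "HBD1", "HYD1"},
    "model_b": {"HBA1", "ARO1", "POS1", "NEG1"},
}

# Rank target models by descending similarity to the query.
ranking = sorted(models, key=lambda name: tversky(query, models[name]), reverse=True)
```

Setting alpha and beta to different values makes the score asymmetric, which is useful when features missing from the query should be penalized differently from features missing from the model.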

Citations: 0
An open source in silico workflow to assist in the design of fusion proteins
IF 2.6 | Zone 4, Biology | Q2 BIOLOGY | Pub Date: 2024-09-10 | DOI: 10.1016/j.compbiolchem.2024.108209

Fusion proteins have the potential to become the new norm for targeted therapeutic treatments. Highly specific payload delivery can be achieved by combining custom targeting moieties, such as VHH domains, with active parts of proteins that have a particular activity not naturally targeted to the intended cells. Conversely, novel drug products may make use of the highly specific targeting properties of naturally occurring proteins and combine them with custom payloads. When designing such a product, there is rarely a known structure for the final construct, which makes it difficult to assess molecular behaviour that may ultimately impact therapeutic outcome. Considering the time and cost of expressing a construct, optimising the purification procedure, obtaining sufficient quantities for biophysical characterisation, and performing structural studies in vitro, there is an enormous benefit to conducting in silico studies ahead of wet lab work.

By following a repeatable, streamlined, and fast workflow of molecular dynamics assessment, it is possible to eliminate low-performing candidates from costly experimental work. There are, however, many aspects to consider when designing a novel fusion protein, and it is crucial not to overlook any key element. In this work, we suggest a set of user-friendly, open-source methods which can be used to screen fusion protein candidates from the sequence alone. We used the light chain and translocation domain of botulinum toxin A (BoNT/A) fused with a selected VHH domain, termed here LC-HN-VHH, as a case study for a general approach to designing, modelling, and simulating fusion proteins. Its behaviour in silico correlated well with initial in vitro work, with SEC HPLC showing multiple protein states in solution and a dynamic protein shifting between these states over time without loss of material.

Citations: 0
In-silico exploration of Attukal Kizhangu L. compounds: Promising candidates for periodontitis treatment
IF 2.6 | Zone 4, Biology | Q2 BIOLOGY | Pub Date: 2024-09-07 | DOI: 10.1016/j.compbiolchem.2024.108186

A medicinal pteridophyte known as Attukal Kizhangu L. has been used for centuries to treat patients by administering plant parts according to conventional and common practices. Its biological functions have seen significant use and advancement. The current study examines an extract of Attukal Kizhangu L., using network pharmacology as its foundation. Three target compounds, α-Lapachone, Dihydrochalcone, and Piperine, were chosen for further research from the 17 phytoconstituents filtered out by the coupled UPLC-HRMS study because they followed Lipinski's rule and showed no toxicity. The pharmacokinetics and physicochemical properties of these compounds were analyzed using three online web servers: pkCSM, Swiss ADME, and Protox-II. This is the first in silico study to document these compounds' effectiveness against the standard drug DOX in treating periodontitis. The Swiss target prediction database was used to retrieve the targets of these compounds. DisGeNET and GeneCards were used to extract the targets of periodontitis. The top five hub genes were identified with Cytoscape from the protein-protein interaction network of the common genes, from which two hub genes and three binding proteins of collagenase enzymes were used for further studies: AA2, PGE2, PI2, TNFA, and PGP. The minimal binding energy observed in molecular docking, indicative of the optimal docking score, corresponds to the highest affinity between the protein and ligand. To corroborate the findings of the docking study, molecular dynamics (MD) simulations and MMPBSA calculations were conducted for the complexes AA2-α-LPHE, AA2-DHC, and AA2-PPR. This research concluded that AA2-DHC was the most stable complex among the investigated interactions, surpassing the stability of the other complexes examined in comparison with the standard drug DOX. Overall, the findings support wider clinical use of Attukal Kizhangu L. as a potential therapeutic agent for the treatment of acute and chronic periodontitis.
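
The link between a lower binding energy and a higher affinity can be made explicit with the standard thermodynamic relation. The block below is a general textbook sketch, not the scoring function of the docking program used in the study (which the abstract does not name); the end-state decomposition in the second line is the usual MM-PBSA form and is likewise an assumption about how the calculation was set up.

```latex
% Binding free energy and dissociation constant (R: gas constant, T: temperature,
% K_d expressed relative to the 1 M standard state)
\Delta G_{\mathrm{bind}} = RT \ln K_d
% A more negative \Delta G_{bind} means a smaller K_d, i.e. tighter binding.

% Usual MM-PBSA end-state estimate, averaged over MD snapshots
\Delta G_{\mathrm{bind}} \approx
  \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{receptor}} \rangle
  - \langle G_{\mathrm{ligand}} \rangle
```

Docking scores only approximate this free energy, which is one reason a follow-up with MD and MMPBSA, as described above, is commonly used to re-rank complexes.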

Citations: 0
Co-expression network and survival analysis of breast cancer inflammation and immune system hallmark genes
IF 2.6 Zone 4 Biology Q2 BIOLOGY Pub Date: 2024-09-06 DOI: 10.1016/j.compbiolchem.2024.108204

The tertiary lymphoid structure (TLS) plays a central role in cancer immune response, and its gene expression pattern, called the TLS signature, has shown prognostic value in breast cancer. The formation of TLS and tumor-associated high endothelial venules (TA-HEVs), responsible for lymphocytic infiltration within the TLS, is associated with the expression of cancer hallmark genes (CHGs) related to immunity and inflammation. In this study, we performed co-expression network analysis of immune- and inflammation-related CHGs to identify predictive genes for breast cancer. In total, 382 immune- and inflammation-related CHGs with high expression variance were extracted from the GSE86166 microarray dataset of patients with breast cancer. CHGs were classified into five modules by applying weighted gene co-expression network analysis. The survival analysis results for each module showed that one module comprising 45 genes was statistically significant for relapse-free and overall survival. Four network properties identified key genes in this module with high prognostic prediction abilities: CD34, CXCL12, F2RL2, JAM2, PROS1, RAPGEF3, and SELP. The prognostic accuracy of the seven genes in breast cancer was synergistic and exceeded that of other predictors in both small and large public datasets. Enrichment analysis predicted that these genes had functions related to leukocyte infiltration of TA-HEVs. There was a positive correlation between key gene expression and the TLS signature, suggesting that gene expression levels are associated with TLS density. Co-expression network analysis of inflammation- and immune-related CHGs allowed us to identify genes that share a common function in cancer immunity and have a high prognostic predictive value. This analytical approach may contribute to the identification of prognostic genes in TLS.
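
As a rough illustration of the weighted co-expression step, the sketch below computes an unsigned soft-threshold adjacency, `|cor(x_i, x_j)| ** beta`, which is the usual starting point of weighted gene co-expression network analysis. The function names, the toy expression values, and `beta=6` are assumptions for illustration only; the abstract does not state the power or network type the authors used, and module detection (topological overlap, clustering) is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Constant profiles have no defined correlation; treat them as uncorrelated.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency: |correlation| raised to the power beta.

    expr maps a gene name to its expression values across samples.
    beta is a hypothetical choice here, not a value taken from the study.
    """
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Toy profiles (invented values, four samples per gene).
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
adjacency = soft_threshold_adjacency(toy_expr)
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones close to 1, which is why the resulting network is "weighted" rather than thresholded into present or absent edges.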

Citations: 0
AScirRNA: A novel computational approach to discover abiotic stress-responsive circular RNAs in plant genome
IF 2.6 Zone 4 Biology Q2 BIOLOGY Pub Date: 2024-09-06 DOI: 10.1016/j.compbiolchem.2024.108205

In plant biology, understanding the intricate regulatory mechanisms governing stress responses is a pivotal pursuit. Circular RNAs (circRNAs), emerging as critical players in gene regulation, have recently garnered attention for their potential roles in abiotic stress adaptation. A comprehensive grasp of circRNAs' functions in stress response offers breeders avenues to develop abiotic stress-resistant crop cultivars that thrive in challenging climates. This study pioneers a machine learning-based model for predicting abiotic stress-responsive circRNAs. The K-tuple nucleotide composition (KNC) and Pseudo KNC (PKNC) features were utilized to numerically represent circRNAs. Three different feature selection strategies were employed to select relevant and non-redundant features. Eight shallow and four deep learning algorithms were evaluated to build the final predictive model. Following five-fold cross-validation, the XGBoost learning algorithm demonstrated superior performance with 260 LightGBM-chosen KNC features (Accuracy: 74.55 %, auROC: 81.23 %, auPRC: 76.52 %) and 160 PKNC features (Accuracy: 74.32 %, auROC: 81.04 %, auPRC: 76.43 %) over other combinations of learning algorithms and feature selection techniques. Further, the robustness of the developed models was evaluated using an independent test dataset, where the overall accuracy, auROC and auPRC were found to be 73.13 %, 72.34 % and 72.68 % for the KNC feature set and 73.52 %, 79.53 % and 73.09 % for the PKNC feature set, respectively. This computational approach was also integrated into an online prediction tool, AScirRNA (https://iasri-sg.icar.gov.in/ascirna/), for easy prediction by users. Both the proposed model and the developed tool are poised to augment ongoing efforts in identifying stress-responsive circRNAs in plants.
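
To make the KNC encoding concrete, the sketch below counts overlapping k-mers in a sequence and normalizes them to frequencies. It is a minimal illustration under stated assumptions, not the AScirRNA implementation: the function name `knc_features`, the choice `k=2`, and the toy sequence are hypothetical, the sequence is treated as linear (k-mers spanning the back-splice junction are not counted), and the PKNC variant is not shown.

```python
from collections import Counter
from itertools import product


def knc_features(seq, k=2):
    """Return normalized k-mer frequencies over the RNA alphabet ACGU.

    DNA-style input is accepted by mapping T to U. Windows containing
    other symbols (for example N) are ignored in the normalization.
    """
    seq = seq.upper().replace("T", "U")
    # Fixed feature order: all 4**k possible k-mers.
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Toy sequence (invented); with k=2 this gives a 16-dimensional vector.
features = knc_features("ACGUACGU", k=2)
```

Concatenating such vectors for several values of k gives feature sets in the hundreds of dimensions, which is where a feature selection step such as the one described above becomes useful.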

Citations: 0
DCSGMDA: A dual-channel convolutional model based on stacked deep learning collaborative gradient decomposition for predicting miRNA-disease associations
IF 2.6 Zone 4 Biology Q2 BIOLOGY Pub Date: 2024-09-04 DOI: 10.1016/j.compbiolchem.2024.108201

Numerous studies have shown that microRNAs (miRNAs) play a key role in human diseases as critical biomarkers. Their abnormal expression is often accompanied by the emergence of specific diseases. Therefore, studying the relationship between miRNAs and diseases can deepen insight into disease pathogenesis, clarify the process of disease onset and development, and promote drug research for specific diseases. However, many relationships between miRNAs and diseases remain undiscovered, significantly limiting research on miRNA-disease correlations. To explore more potential correlations, we propose a dual-channel convolutional model based on stacked deep learning collaborative gradient decomposition for predicting miRNA-disease associations (DCSGMDA). Firstly, we constructed similarity networks for miRNAs and diseases, as well as an association relationship network. Secondly, potential features were fully mined using stacked deep learning and gradient decomposition networks, along with dual-channel convolutional neural networks. Finally, correlations were scored by a multilayer perceptron. We performed 5-fold and 10-fold cross-validation experiments on DCSGMDA using two datasets based on the Human MicroRNA Disease Database (HMDD). Additionally, parametric, ablation, and comparative experiments, along with case studies, were conducted. The experimental results demonstrate that DCSGMDA performs well in predicting miRNA-disease associations.
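
The 5-fold and 10-fold evaluation protocol can be sketched as a simple index partition. This is a generic illustration, not the authors' code: the helper name `k_fold_indices`, the fixed seed, and the sample count are hypothetical, and it assumes each sample is one miRNA-disease pair (how negative pairs were sampled is not stated in the abstract).

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds
    whose sizes differ by at most one; each fold is the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Example: 10 hypothetical miRNA-disease pairs, evaluated with 5 folds.
splits = list(k_fold_indices(10, k=5))
```

Passing `k=10` gives the 10-fold variant; the reported metric is then typically the average over the k held-out folds.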

Citations: 0