
Journal of Bioinformatics and Computational Biology: Latest Articles

Gene regulatory network inference based on modified adaptive lasso.
IF 0.9; Zone 4 (Biology); Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-01; Epub Date: 2025-01-21; DOI: 10.1142/S0219720024500264
Chao Li, Xiaoran Huang, Xiao Luo, Xiaohui Lin

Gene regulatory networks (GRNs) reveal the regulatory interactions among genes and provide a visual tool for explaining biological processes. However, identifying direct relations among genes from gene expression data in the high-dimensional, small-sample setting remains a critical challenge. In this paper, we propose a new GRN inference method based on a modified adaptive least absolute shrinkage and selection operator (MALasso). MALasso expands the number of samples based on distance correlation and defines a new weighting scheme for the adaptive lasso that removes false-positive edges from the network during the iterative process. Simulated data and gene expression data from the DREAM challenges were used to validate the performance of MALasso. Comparisons among MALasso, the adaptive lasso, and six other state-of-the-art methods show that MALasso outperformed the competing methods in AUROCC and AUPRC in most cases and was better able to distinguish direct edges from indirect ones. Hence, by modifying the weighting scheme of the adaptive lasso, MALasso can detect linear and nonlinear relations, remove false-positive edges, and identify direct relations among genes more accurately.
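The abstract names two ingredients, distance correlation and adaptive-lasso reweighting. The stdlib-only Python sketch below shows the standard versions of both, plus a simple rule that regresses each gene on all the others and reports nonzero coefficients as edges. It is not MALasso: the paper's sample-expansion step and its modified weights are not described here, so the sketch assumes the textbook adaptive-lasso weights 1/(|b_j| + eps)^gamma taken from a first-stage lasso, centered data without an intercept, and illustrative names and defaults (`lam`, `gamma`, `infer_edges`).

```python
import math


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form); 0 means no dependence detected."""
    n = len(x)

    def centered(v):
        d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
        # d is symmetric, so column means equal row means.
        row = [sum(r) / n for r in d]
        grand = sum(row) / n
        return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

    a, b = centered(x), centered(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar2_x = sum(v * v for r in a for v in r) / n**2
    dvar2_y = sum(v * v for r in b for v in r) / n**2
    if dvar2_x == 0 or dvar2_y == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2_x * dvar2_y))


def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|.
    Assumes centered columns and response (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - Xb, with b starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - b[j]
            if delta:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
    return b


def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-6):
    """Two-stage adaptive lasso: plain lasso, then weights 1 / (|b_j| + eps)^gamma.
    Standard weights (assumption); MALasso's modified weighting is not reproduced."""
    b0 = weighted_lasso(X, y, lam, [1.0] * len(X[0]))
    weights = [1.0 / (abs(c) + eps) ** gamma for c in b0]
    return weighted_lasso(X, y, lam, weights)


def infer_edges(expr, lam=0.1):
    """Hypothetical edge rule. expr[s][g] is gene g in sample s; returns (regulator, target)."""
    genes = range(len(expr[0]))
    edges = []
    for t in genes:
        others = [g for g in genes if g != t]
        X = [[row[g] for g in others] for row in expr]
        y = [row[t] for row in expr]
        coef = adaptive_lasso(X, y, lam)
        edges += [(g, t) for g, c in zip(others, coef) if c != 0.0]
    return edges
```

Distance correlation is zero in the population only under independence, which is why it can flag nonlinear links (for example y = x² on a symmetric grid) that Pearson correlation misses; the large weight given to coefficients that the first stage already set to zero is what lets the second stage prune them.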

Journal of Bioinformatics and Computational Biology, article 2450026.
Citations: 0
The use of 4D data-independent acquisition-based proteomic analysis and machine learning to reveal potential biomarkers for stress levels.
IF 0.9; Zone 4 (Biology); Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-01; Epub Date: 2024-11-15; DOI: 10.1142/S0219720024500252
Dehua Chen, Yongsheng Yang, Dongdong Shi, Zhenhua Zhang, Mei Wang, Qiao Pan, Jianwen Su, Zhen Wang

Research suggests that individuals exposed to prolonged stress may be at higher risk of developing psychological stress disorders. Currently, psychological stress is evaluated primarily by physicians using rating scales, an approach prone to subjective bias and limited by the scales themselves. More objective, accurate, and efficient biomarkers for assessing an individual's level of psychological stress are therefore needed. In this study, we used 4D data-independent acquisition (4D-DIA) proteomics for quantitative protein analysis, and then applied a support vector machine (SVM) combined with the SHAP interpretation algorithm to identify potential biomarkers of psychological stress levels. The biomarkers were subsequently validated through machine learning classification and a substantial body of prior knowledge derived from a knowledge graph. We cross-validated the biomarkers using two batches of data: the combination of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and fibronectin yielded an average area under the curve (AUC) of 92%, an average accuracy of 86%, an average F1 score of 79%, and an average sensitivity of 83%. This combination may therefore offer a potential approach to detecting stress levels and preventing psychological stress disorders.
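The reported figures are standard classification metrics. As a sketch of how such per-batch numbers are computed and then averaged, the Python below derives AUC (the probability that a randomly chosen positive sample outscores a randomly chosen negative one), accuracy, sensitivity, and F1. It does not reproduce the authors' SVM/SHAP pipeline; treating each data batch as one fold, the 0.5 decision threshold, and the function names are assumptions.

```python
from statistics import mean


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative), counting ties as 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), and F1 from hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "sensitivity": sensitivity, "f1": f1}


def average_over_folds(folds, threshold=0.5):
    """folds: list of (y_true, scores) pairs, e.g. one per data batch (assumption)."""
    per_fold = []
    for y_true, scores in folds:
        m = threshold_metrics(y_true, [1 if s >= threshold else 0 for s in scores])
        m["auc"] = roc_auc(y_true, scores)
        per_fold.append(m)
    return {k: mean(f[k] for f in per_fold) for k in per_fold[0]}
```

Because AUC is computed from the continuous scores while accuracy, sensitivity, and F1 depend on the chosen threshold, the two groups of numbers can move independently, which is one reason an AUC of 92% can sit alongside an F1 of 79%.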

Journal of Bioinformatics and Computational Biology, article 2450025.
Citations: 0
Author index Volume 22 (2024).
IF 0.9; Zone 4 (Biology); Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-01; DOI: 10.1142/S0219720024990014
Journal of Bioinformatics and Computational Biology, 22(6), article 2499001.
Citations: 0
ASAP-DTA: Predicting drug-target binding affinity with adaptive structure aware networks.
IF 0.9; Zone 4 (Biology); Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-01; Epub Date: 2025-02-01; DOI: 10.1142/S0219720024500288
Weibin Ding, Shaohua Jiang, Ting Xu, Zhijian Lyu

Predicting drug-target affinity (DTA) is crucial for efficiently identifying potential targets for drug repurposing, thereby reducing wasted resources. In this paper, we propose a novel graph-based deep learning model for DTA that uses adaptive structure-aware pooling for graph processing. Our approach integrates a self-attention mechanism with an enhanced graph neural network to capture the importance of each node in the graph, a significant advance in graph feature extraction. Specifically, adjacent nodes in the 2D molecular graph are aggregated into clusters, and the cluster features are weighted by their attention scores to form the final molecular representation. The model architecture uses both global and hierarchical pooling, and we assess its performance on multiple benchmark datasets. On the KIBA dataset, our model achieved the lowest mean squared error (MSE), 0.126, a 0.5% reduction relative to the best-performing baseline method. Additionally, to validate the model's generalization capability, we conducted comparative experiments on regression and binary classification tasks. The results demonstrate that our model outperforms previous models on both types of task.
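The cluster-then-weight step described above can be illustrated with a simplified, dependency-free Python sketch. It is not ASAP-DTA: each atom's 1-hop neighborhood forms a cluster summarized by a plain mean, and a fixed scoring vector (`score_w`, hypothetical) stands in for the learned self-attention. The top-scoring clusters are kept, their features are weighted by score, and a mean/max readout produces a fixed-length molecular vector.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cluster_attention_pool(features, adjacency, score_w, ratio=0.5):
    """Simplified structure-aware pooling over a molecular graph.

    features:  list of node (atom) feature vectors
    adjacency: square 0/1 matrix (bonds)
    score_w:   hypothetical stand-in for learned attention parameters
    Returns (kept cluster indices, score-weighted cluster features).
    """
    n, dim = len(features), len(features[0])
    # 1) One cluster per node: the node plus its 1-hop neighbors (ego network),
    #    summarized by a plain mean instead of a learned attention.
    clusters = []
    for i in range(n):
        members = [i] + [j for j in range(n) if j != i and adjacency[i][j]]
        clusters.append([sum(features[m][d] for m in members) / len(members) for d in range(dim)])
    # 2) Score each cluster; a trained model would use attention/GNN layers here.
    scores = [sigmoid(sum(w * x for w, x in zip(score_w, c))) for c in clusters]
    # 3) Keep the top-k clusters by score.
    k = max(1, math.ceil(ratio * n))
    keep = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # 4) Weight each kept cluster's features by its score.
    return keep, [[scores[i] * x for x in clusters[i]] for i in keep]


def readout(pooled):
    """Global readout: concatenate mean- and max-pooling over the kept clusters."""
    dim = len(pooled[0])
    mean_part = [sum(row[d] for row in pooled) / len(pooled) for d in range(dim)]
    max_part = [max(row[d] for row in pooled) for d in range(dim)]
    return mean_part + max_part
```

Applying `cluster_attention_pool` repeatedly on the coarsened graph (which would also require coarsening the adjacency matrix, omitted here) is the hierarchical variant; calling `readout` directly on node features corresponds to global pooling.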

Journal of Bioinformatics and Computational Biology, 22(6), article 2450028.
Citations: 0
Research on similarity retrieval method based on mass spectral entropy.
IF 0.9; Zone 4 (Biology); Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-01; Epub Date: 2025-02-01; DOI: 10.1142/S0219720024500276
Li-Ping Wu, Li Yong, Xiang Cheng, Yang Zhou

Compound identification in small-molecule research relies on comparing experimental mass spectra against mass spectral databases. However, unequal data lengths often make retrieval inefficient and inaccurate, and the similarity calculation methods used by commercial software have limitations. To address these issues, we propose two mass spectral data processing methods, the "splicing-filling method" and the "matching-filling method", together with an information entropy-based similarity calculation method for mass spectra. These alignment methods convert mass spectra of different lengths for unknown and known compounds into equal-length spectra, allowing the similarity between spectra to be calculated more accurately. Information entropy is then used to quantify differences in the intensity distributions of the aligned spectra, which serves as the measure of similarity between them. Validation on examples shows that the two alignment methods effectively solve the problem of unequal mass spectral data lengths in similarity calculation, and that the mass spectral entropy method gives reliable results suitable for mass spectral identification.
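The abstract does not give formulas for the two filling methods or for the entropy measure, so the Python below is a hedged sketch under stated assumptions: a hypothetical matching-and-zero-filling alignment that pairs peaks within an m/z tolerance and pads unmatched peaks with zero intensity, followed by one common entropy-based similarity, 1 minus the Jensen-Shannon divergence (in bits) between the two normalized intensity distributions. The names `align_spectra` and `entropy_similarity` and the default `tol` are illustrative.

```python
import math


def align_spectra(spec_a, spec_b, tol=0.01):
    """Align two peak lists [(mz, intensity), ...] to equal length (hypothetical scheme).

    Peaks within `tol` m/z are paired; an unmatched peak is paired with a
    zero-intensity "filled" peak in the other spectrum.
    """
    a, b = sorted(spec_a), sorted(spec_b)
    out_a, out_b = [], []
    i = j = 0
    while i < len(a) or j < len(b):
        if j >= len(b) or (i < len(a) and a[i][0] < b[j][0] - tol):
            out_a.append(a[i][1])  # peak only in spectrum A
            out_b.append(0.0)
            i += 1
        elif i >= len(a) or b[j][0] < a[i][0] - tol:
            out_a.append(0.0)  # peak only in spectrum B
            out_b.append(b[j][1])
            j += 1
        else:
            out_a.append(a[i][1])  # matched within tolerance
            out_b.append(b[j][1])
            i += 1
            j += 1
    return out_a, out_b


def shannon_entropy(weights):
    total = sum(weights)
    return -sum(w / total * math.log(w / total) for w in weights if w > 0)


def entropy_similarity(spec_a, spec_b, tol=0.01):
    """1 - (2*S_AB - S_A - S_B) / ln 4, with S_AB the entropy of the merged spectrum.
    Equals 1 for identical spectra and 0 for spectra sharing no peaks."""
    ia, ib = align_spectra(spec_a, spec_b, tol)
    sa, sb = sum(ia), sum(ib)
    pa = [x / sa for x in ia]
    pb = [x / sb for x in ib]
    merged = [(x + y) / 2 for x, y in zip(pa, pb)]
    s_ab = shannon_entropy(merged)
    return 1.0 - (2 * s_ab - shannon_entropy(pa) - shannon_entropy(pb)) / math.log(4)
```

For example, `entropy_similarity([(50.0, 10.0), (77.0, 100.0), (105.0, 40.0)], [(50.005, 12.0), (77.0, 90.0), (120.0, 5.0)])` aligns both spectra to four positions and returns a value strictly between 0 and 1, since two peaks match and one peak in each spectrum is filled with zero.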

Journal of Bioinformatics and Computational Biology, 22(6), article 2450027.
Citations: 0
Exploring relationship between hypercholesterolemia and instability of atherosclerotic plaque - An approach based on a matrix population model.
IF 0.9; Zone 4 (Biology); Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-12-01; DOI: 10.1142/S021972002450029X
Mateusz Twardawa, Kaja Gutowska, Piotr Formanowicz

Background: Cardiovascular diseases have long been studied to identify their causal factors and counteract them effectively. Atherosclerosis, an inflammatory process of the blood vessel wall, is a common cardiovascular disease. Among its many well-known risk factors, hypercholesterolemia is undoubtedly a significant condition for atherosclerotic plaque formation and is linked to atherosclerosis on many levels, e.g. cell interactions, cytokine levels, diet, and lifestyle. Current studies suggest that controlling the balance between proinflammatory (M1) and anti-inflammatory (M2) macrophages may help improve patient condition and reduce the necrotic core.

Methods: This study considered the effects of hypercholesterolemia on the population dynamics of macrophages (M0, M1, M2, foam cells) in atherosclerotic plaque. A mathematical model using a matrix approach to population dynamics was proposed and tested in various scenarios. To check model sensitivity and the variability associated with error propagation, an uncertainty analysis was performed using a Monte Carlo approach.

Results: Simulations of macrophage population dynamics allowed necrotic core development and plaque instability to be assessed. Excess lipid levels emerged as the most critical factor for necrotic core development. However, plaque growth can be slowed significantly if macrophages and foam cells maintain proper lipid levels. This balance may be disrupted by proinflammatory lipids, which eventually increase plaque size; this is also reflected in the M1/M2 dynamics.

Conclusion: Hypercholesterolemia accelerates atherosclerosis development, leading to earlier cardiovascular events. In silico results suggest that reducing lipid intake and the proportion of proinflammatory lipids is crucial for slowing plaque development and reducing rupture risk, both of which require preserving the fragile M1/M2 balance. Targeting the inflammatory microenvironment and macrophage polarization represents a promising approach to atherosclerosis management.
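A matrix population model advances a vector of subpopulation sizes with a transition matrix, n(t+1) = A n(t), optionally plus a recruitment term. The Python sketch below applies that idea to the four macrophage states named in the abstract and adds a Monte Carlo loop that perturbs every rate, in the spirit of the uncertainty analysis described. All rates, the monocyte influx term, the 10% noise level, and the representation of hypercholesterolemia as scaled lipid-loading rates are hypothetical choices, not the authors' model or parameters.

```python
import random
import statistics

STATES = ("M0", "M1", "M2", "foam")
FOAM = STATES.index("foam")

# Hypothetical per-step transition matrix: BASE_A[i][j] is the fraction of cells in
# state j that are in state i one step later. Column sums below 1 represent cell loss
# (death or egress). All values are illustrative, not fitted to data.
BASE_A = [
    [0.50, 0.00, 0.00, 0.00],  # -> M0: unpolarized macrophages that stay M0
    [0.20, 0.85, 0.05, 0.00],  # -> M1: polarization from M0, persistence, M2 -> M1
    [0.15, 0.05, 0.85, 0.00],  # -> M2: polarization from M0, M1 -> M2, persistence
    [0.10, 0.05, 0.02, 0.95],  # -> foam: lipid loading of M0/M1/M2, foam persistence
]
INFLUX = [100.0, 0.0, 0.0, 0.0]  # hypothetical monocyte recruitment into M0 per step


def cap_columns(A):
    """Return a copy in which any column summing above 1 is rescaled to sum to 1,
    so transitions never create cells (only INFLUX adds cells)."""
    A = [row[:] for row in A]
    for j in range(len(A[0])):
        total = sum(row[j] for row in A)
        if total > 1.0:
            for row in A:
                row[j] /= total
    return A


def with_hypercholesterolemia(A, factor):
    """Hypothetical scenario: scale every lipid-loading rate (x -> foam, x != foam)."""
    B = [row[:] for row in A]
    for j in range(len(B[0])):
        if j != FOAM:
            B[FOAM][j] *= factor
    return cap_columns(B)


def simulate(A, n0, influx, steps):
    """Project n(t+1) = A n(t) + influx for `steps` steps."""
    n = list(n0)
    for _ in range(steps):
        n = [sum(A[i][j] * n[j] for j in range(len(n))) + influx[i] for i in range(len(A))]
    return n


def monte_carlo(A, rel_sd=0.1, draws=500, steps=50, seed=1):
    """Uncertainty analysis: perturb each rate by Gaussian relative noise (clipped at 0),
    re-simulate, and return the mean and SD of the final foam-cell count."""
    rng = random.Random(seed)
    foam = []
    for _ in range(draws):
        P = cap_columns([[max(0.0, a * (1.0 + rng.gauss(0.0, rel_sd))) for a in row] for row in A])
        foam.append(simulate(P, [0.0] * len(A), INFLUX, steps)[FOAM])
    return statistics.mean(foam), statistics.stdev(foam)
```

Comparing `simulate(BASE_A, ...)` with `simulate(with_hypercholesterolemia(BASE_A, 1.4), ...)` shows more foam cells under the scaled lipid-loading rates, and `monte_carlo` gives the spread of that outcome under rate uncertainty. Capping column sums is a design choice that keeps perturbed matrices biologically plausible.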

Journal of Bioinformatics and Computational Biology, 22(6), article 2450029.
Citations: 0
Improving drug-target interaction prediction through dual-modality fusion with InteractNet.
IF 0.9; Zone 4 (Biology); Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-10-01; Epub Date: 2024-11-11; DOI: 10.1142/S0219720024500240
Baozhong Zhu, Runhua Zhang, Tengsheng Jiang, Zhiming Cui, Jing Chen, Hongjie Wu

In drug discovery, accurate prediction of drug-target interactions is crucial for accelerating the development of new drugs. However, existing methods still face many challenges in handling complex biomolecular interactions. To this end, we propose a new deep learning framework that combines the structural information and sequence features of proteins, providing a comprehensive feature representation through bimodal fusion. The framework integrates a topological adaptive graph convolutional network with a multi-head attention mechanism, and also introduces a self-masked attention mechanism so that each protein binding site can focus on its own unique features and its interaction with the ligand. Experimental results on multiple public datasets show that our method significantly outperforms traditional machine learning and graph neural network methods in predictive performance. In addition, our method can effectively identify and explain key molecular interactions, providing new insights into the complex relationship between drugs and targets.
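To make the masking idea concrete, here is a single-head scaled dot-product attention in plain Python with a boolean visibility mask. The pattern built by `self_ligand_mask` (each binding-site residue attends only to itself and to ligand atoms) is one hypothetical reading of the self-masked mechanism; the abstract does not define the mask, and InteractNet's multiple heads and graph-convolution features are omitted.

```python
import math


def masked_attention(Q, K, V, mask):
    """Single-head scaled dot-product attention.
    mask[i][j] is True when query i may attend to key j; masked keys get zero weight."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        if not any(mask[i]):
            raise ValueError(f"query {i} has no visible keys")
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, k in enumerate(K)
        ]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0 for masked keys
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / z for c in range(len(V[0]))])
    return out


def self_ligand_mask(n_residues, n_ligand_atoms):
    """Hypothetical 'self-masked' pattern: residue i sees itself and every ligand atom,
    but no other residue. Keys are ordered [residues..., ligand atoms...]."""
    n_keys = n_residues + n_ligand_atoms
    return [[j == i or j >= n_residues for j in range(n_keys)] for i in range(n_residues)]
```

With no ligand atoms the mask leaves each residue attending only to itself, so the output reproduces its own value vector; adding ligand keys mixes in ligand information while still excluding other residues.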

Journal of Bioinformatics and Computational Biology, 22(5), article 2450024.
Citations: 0
SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales.
IF 0.9; Zone 4 (Biology); Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY; Pub Date: 2024-10-01; DOI: 10.1142/S0219720024500227
Yan Li, Boran Wang, Zengding Wu, Shiliang Ji, Shi Xu, Caiyi Fei

Background: Genetic mutations that inactivate or aberrantly activate essential proteins may alter, or even disrupt, cellular signaling pathways, culminating in precancerous lesions and cancer. Mutations and such dysfunctions can generate "novel proteins" that are not part of the conventional human proteome. Identifying these proteins has considerable potential for uncovering promising drug targets and designing innovative therapeutic models. Although the widespread adoption of nucleotide sequencing has produced diverse tools for detecting DNA or RNA variants, these tools primarily target point mutations and perform suboptimally on large-scale and combinatorial mutations. Moreover, their output is confined to the genome and transcriptome levels and does not include the protein information resulting from the genetic alterations.

Results: We present the Sequencing Analysis Kit (SAKit), a bioinformatics pipeline for hybrid sequencing analysis that integrates long-read and short-read RNA sequencing data. Owing to their excellent coverage, long reads are used to detect large-scale variations such as gene fusions, exon skipping, intron retention, and aberrant expression in non-coding regions, and short reads validate these findings at breakpoints and splice junctions. Conversely, owing to their greater sequencing depth, short reads are used to identify small-scale variations, including single-nucleotide variants, deletions, and insertions, with long reads providing additional validation. SAKit performs its analyses using per-species configuration files comprising genome references and annotation data, making it applicable to both human and mouse studies. Furthermore, SAKit applies a hierarchical filtering approach to eliminate low-confidence variants and uses open reading frame (ORF) analysis to translate the identified variants into protein sequences.

Conclusion: SAKit is a robust and versatile bioinformatics tool for comprehensively identifying both large-scale and small-scale variants from RNA-seq data, facilitating the discovery of novel proteins. By integrating the analysis of long-read and short-read sequencing data, the pipeline offers a powerful solution for researchers in genomics and transcriptomics. SAKit is free and open source, available on GitHub (https://github.com/therarna/SAKit) and as a Docker image (https://hub.docker.com/repository/docker/therarna). Implemented primarily in Python within a Snakemake framework, SAKit is designed for reproducibility, scalability, and ease of use.
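SAKit's final step translates variant transcripts into proteins through ORF analysis. A minimal sketch of that kind of step, assuming stranded transcripts (forward strand only), the standard genetic code, and that only complete ORFs (from the first ATG to the next in-frame stop) count, might look like this. It is not SAKit's implementation, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def complete_orfs(seq, frame):
    """Yield protein sequences of complete ORFs (ATG ... stop) in one forward frame.
    Codons containing ambiguous bases (e.g. N) translate to 'X'."""
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    start = None
    for k, codon in enumerate(codons):
        if start is None and codon == "ATG":
            start = k
        elif start is not None and CODON_TABLE.get(codon) == "*":
            yield "".join(CODON_TABLE.get(c, "X") for c in codons[start:k])
            start = None


def longest_orf_protein(transcript):
    """Translate the longest complete forward-strand ORF of a (variant) transcript.
    Accepts DNA or RNA letters; returns '' if no complete ORF exists."""
    seq = transcript.upper().replace("U", "T")
    proteins = [p for f in range(3) for p in complete_orfs(seq, f)]
    return max(proteins, key=len, default="")
```

For instance, `longest_orf_protein("GGATGGCTTGGTAAC")` finds the ORF starting at the ATG in frame 2 and returns `"MAW"`, while a sequence whose ORF runs off the end without a stop codon returns an empty string. A production pipeline would also scan the reverse strand for unstranded data and compare the result against the reference protein to report the novel segment.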

Journal of Bioinformatics and Computational Biology, 22(5), article 2450022.
Citations: 0
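The SAKit entry above states that open reading frame (ORF) analysis is used to translate identified variants into protein sequences. The sketch below is a minimal illustration of such a step, not SAKit's actual code. It assumes the forward strand only, an ATG start codon, the standard genetic code (NCBI translation table 1), and that an ORF must end in a stop codon. The function names `translate` and `longest_orf` are hypothetical.

```python
# Minimal ORF-translation sketch (hypothetical helper names, not SAKit's code).
# Assumptions: forward strand only, ATG start, standard genetic code (table 1),
# and an ORF is only counted if it is closed by an in-frame stop codon.
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order (third base fastest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame DNA sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" for ambiguous codons (e.g. with N)
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_orf(seq: str, min_aa: int = 1) -> str:
    """Return the protein encoded by the longest ATG...stop ORF in the three
    forward frames, or '' if no ORF reaches min_aa residues."""
    seq = seq.upper().replace("U", "T")  # accept RNA or cDNA input
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:i])  # close it at the stop codon
                if len(protein) > len(best):
                    best = protein
                start = None
    return best if len(best) >= min_aa else ""


if __name__ == "__main__":
    # Toy transcript (hypothetical): the ORF ATG GCT TGG TAA sits in frame 2.
    print(longest_orf("CCATGGCTTGGTAAGG"))
```

A real pipeline would also scan the reverse strand and compare the variant protein against the reference isoform; those steps are omitted here to keep the example short.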
Molecular dynamics simulations of ribosome-binding sites in theophylline-responsive riboswitch associated with improving the gene expression regulation in chloroplasts.
IF 0.9 CAS Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-01 Epub Date: 2024-10-30 DOI: 10.1142/S0219720024500239
Rahim Berahmand, Masoumeh Emadpour, Mokhtar Jalali Javaran, Kaveh Haji-Allahverdipoor, Ali Akbarabadi

An efficient inducible transgene expression system is a valuable tool for recombinant protein production. The synthetic theophylline-responsive riboswitch (theo.RS) can be placed in the 5′ untranslated region (5′ UTR) of an mRNA, where it controls translation of the downstream gene in chloroplasts in response to binding of its ligand, theophylline. One drawback limiting the efficiency of theo.RS is leakiness of the RS structure, which allows undesired background translation when the switch is expected to be off. The purpose of this study was to identify the factors that cause theo.RS to leak in the off mode using molecular dynamics (MD) simulations. After the simulation system was properly equilibrated, a 40 ns simulation was conducted. Analysis of the solvent-accessible surface area (SASA) of both ribosome-binding site (RBS) regions indicated that nucleotide 79 of theo.RS, a guanine, had the highest surface exposure to the ribosome. These results were confirmed by analyzing hydrogen bonding between the RBS regions and the rest of the RNA structure. Therefore, redesigning the RBS regions to avoid unmasked nucleotide(s) in the structure may improve the tightness of theo.RS in the off mode, resulting in efficient inhibition of translation.

Journal of Bioinformatics and Computational Biology 22(5): 2450023.
Citations: 0
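The riboswitch study ranks RBS nucleotides by solvent-accessible surface area (SASA). One common way to summarize such an analysis, shown here as a sketch and not as the authors' workflow, is to average per-residue SASA over all trajectory frames and sort the residues of a region by that mean. The input is assumed to be a frames-by-residues matrix, for example the array returned by MDTraj's `md.shrake_rupley(traj, mode="residue")`. The function name and toy values are hypothetical.

```python
# Minimal sketch: rank residues of a region (e.g. an RBS) by mean SASA over
# trajectory frames. Function name and toy data are hypothetical.
from statistics import mean
from typing import List, Sequence, Tuple


def rank_by_mean_sasa(
    sasa: Sequence[Sequence[float]],  # sasa[frame][column], e.g. in nm^2
    residue_ids: Sequence[int],       # residue number for each column
    region: Sequence[int],            # residue numbers to rank
) -> List[Tuple[int, float]]:
    """Return (residue_id, mean_sasa) pairs for `region`, most exposed first."""
    col = {rid: j for j, rid in enumerate(residue_ids)}  # residue number -> column
    means = {rid: mean(frame[col[rid]] for frame in sasa) for rid in region}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy values (hypothetical, not from the study): 2 frames x 3 residues.
    toy = [[0.5, 1.2, 0.3], [0.7, 1.0, 0.5]]
    print(rank_by_mean_sasa(toy, residue_ids=[1, 2, 3], region=[1, 2, 3]))
```

Averaging over frames smooths out transient fluctuations; a time series or per-frame maximum could be inspected instead when short-lived exposure events matter.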
Construction of a multi-tissue compound-target interaction network of Qingfei Paidu decoction in COVID-19 treatment based on deep learning and transcriptomic analysis.
IF 0.9 CAS Zone 4 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-08-01 Epub Date: 2024-07-20 DOI: 10.1142/S0219720024500161
Xia Li, Xuetong Zhao, Xinjian Yu, Jianping Zhao, Xiangdong Fang

The Qingfei Paidu decoction (QFPDD) is a widely acclaimed therapeutic formula employed nationwide in China for the clinical management of coronavirus disease 2019 (COVID-19). QFPDD exerts a synergistic therapeutic effect characterized by multi-component, multi-target, and multi-pathway action. However, the intricate interactions between its ingredients and their targets, and their systemic effects across multiple tissues, remain undetermined. To address this, we qualitatively characterized the chemical components of QFPDD. We then integrated multi-tissue transcriptomic analysis with GraphDTA, a deep learning model, to screen for potential compound-target interactions of QFPDD in multiple tissues. We predicted 13 key active compounds, 127 potential targets, and 27 pathways associated with QFPDD across six tissues. Notably, the oleanolic acid-AXL pair showed the highest predicted affinity in the heart, blood, and liver, and molecular docking and molecular dynamics simulation confirmed their strong binding affinity. This robust interaction between oleanolic acid and the AXL receptor suggests that AXL is a promising target for developing clinical intervention strategies. By constructing a multi-tissue compound-target interaction network, our study further elucidated the mechanisms through which QFPDD combats COVID-19 in multiple tissues. Our work also establishes a framework for future investigations into the systemic effects of other Traditional Chinese Medicine (TCM) formulas in disease treatment.

Journal of Bioinformatics and Computational Biology: 2450016.
Citations: 0
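The QFPDD study combines deep-learning affinity predictions (GraphDTA) with multi-tissue transcriptomics to build tissue-specific compound-target networks. A minimal sketch of that integration step follows, assuming affinity scores have already been predicted and each tissue is represented by the set of genes retained by its transcriptomic analysis: keep high-scoring pairs whose target is in a tissue's gene set, then report pairs that recur across tissues. The threshold, function names, and toy data are hypothetical; the abstract does not specify the authors' filtering criteria.

```python
# Minimal sketch of integrating predicted compound-target affinities with
# per-tissue gene sets. All names, thresholds, and data are hypothetical.
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (compound, target)


def tissue_networks(
    affinity: Dict[Pair, float],        # predicted affinity per pair (higher = stronger)
    tissue_genes: Dict[str, Set[str]],  # tissue -> genes retained by transcriptomics
    threshold: float,                   # minimum predicted affinity to keep an edge
) -> Dict[str, List[Pair]]:
    """Return, per tissue, the retained edges sorted by predicted affinity."""
    nets: Dict[str, List[Pair]] = {}
    for tissue, genes in tissue_genes.items():
        edges = [p for p, score in affinity.items()
                 if score >= threshold and p[1] in genes]
        nets[tissue] = sorted(edges, key=lambda p: affinity[p], reverse=True)
    return nets


def shared_pairs(nets: Dict[str, List[Pair]], min_tissues: int = 2) -> Dict[Pair, List[str]]:
    """Return pairs present in at least `min_tissues` tissue networks."""
    seen: Dict[Pair, List[str]] = defaultdict(list)
    for tissue, edges in nets.items():
        for p in edges:
            seen[p].append(tissue)
    return {p: sorted(ts) for p, ts in seen.items() if len(ts) >= min_tissues}


if __name__ == "__main__":
    # Toy inputs (hypothetical placeholders, not results from the study).
    affinity = {("cmpd_A", "G1"): 7.5, ("cmpd_A", "G2"): 5.0,
                ("cmpd_B", "G2"): 6.8, ("cmpd_B", "G3"): 8.1}
    tissue_genes = {"heart": {"G1", "G3"}, "liver": {"G1", "G2"}, "lung": {"G2"}}
    nets = tissue_networks(affinity, tissue_genes, threshold=6.0)
    print(shared_pairs(nets))
```

Keeping the per-tissue filter separate from the cross-tissue count makes it easy to swap in a different expression criterion (for example, differential expression instead of presence) without touching the aggregation logic.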