Pub Date: 2024-12-01; Epub Date: 2025-01-21; DOI: 10.1142/S0219720024500264
Chao Li, Xiaoran Huang, Xiao Luo, Xiaohui Lin
Gene regulatory networks (GRNs) describe the regulatory interactions among genes and provide a visual tool for explaining biological processes. However, identifying direct relations among genes from gene expression data that are high-dimensional and have few samples remains a critical challenge. In this paper, we propose a new GRN inference method based on a modified adaptive least absolute shrinkage and selection operator (MALasso). MALasso expands the number of samples based on distance correlation and defines a new weighting scheme for the adaptive lasso that removes false positive edges from the network during the iterative process. Simulated data and gene expression data from the DREAM challenge were used to validate the performance of MALasso. Comparisons among MALasso, the adaptive lasso, and six other state-of-the-art methods show that MALasso outperformed the competing methods in AUROCC and AUPRC in most cases and was better able to distinguish direct edges from indirect ones. Hence, by modifying the weighting scheme of the adaptive lasso, MALasso can detect linear and nonlinear relations, remove false positive edges, and identify direct relations among genes more accurately.
Title: Gene regulatory network inference based on modified adaptive lasso. Journal of Bioinformatics and Computational Biology, 2450026.
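The MALasso abstract credits distance correlation with helping the method capture nonlinear as well as linear relations. As a minimal illustration of that statistic only (not of MALasso's sample-expansion or reweighting steps, which the abstract does not specify), the sketch below computes the sample distance correlation (V-statistic form) in plain Python and contrasts it with Pearson correlation. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def _double_centered(v: Sequence[float]) -> List[List[float]]:
    """Double-centred matrix of pairwise distances |v_i - v_j|."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]  # row means (d is symmetric, so also column means)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation of two 1-D samples (0 = no detected dependence)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length >= 2")
    a, b = _double_centered(x), _double_centered(y)
    n = len(x)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for r in a for v in r) / n**2
    dvar_y = sum(v * v for r in b for v in r) / n**2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant sample carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary Pearson correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance")
    return sxy / math.sqrt(sxx * syy)
```

For `x = [-2, -1, 0, 1, 2]` and `y = [v * v for v in x]`, `pearson(x, y)` is exactly 0 while `distance_correlation(x, y)` is clearly positive, which is why a distance-based statistic can flag dependence that a linear measure misses. The O(n²) pairwise matrices are acceptable for the small sample sizes the abstract targets.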
Pub Date: 2024-12-01; Epub Date: 2024-11-15; DOI: 10.1142/S0219720024500252
Dehua Chen, Yongsheng Yang, Dongdong Shi, Zhenhua Zhang, Mei Wang, Qiao Pan, Jianwen Su, Zhen Wang
Research suggests that individuals exposed to prolonged stress may be at higher risk of developing psychological stress disorders. Currently, psychological stress is primarily evaluated by physicians using rating scales, which may be prone to subjective bias and are constrained by the limitations of the scales themselves. More objective, accurate, and efficient biomarkers for evaluating an individual's level of psychological stress are therefore needed. In this study, we used 4D data-independent acquisition (4D-DIA) proteomics for quantitative protein analysis and then employed a support vector machine (SVM) combined with the SHAP interpretation algorithm to identify potential biomarkers of psychological stress levels. The biomarkers were subsequently validated through machine learning classification and a substantial body of prior knowledge derived from a knowledge graph. We cross-validated the biomarkers using two batches of data; the combination of glyceraldehyde-3-phosphate dehydrogenase and fibronectin yielded an average area under the curve (AUC) of 92%, an average accuracy of 86%, an average F1 score of 79%, and an average sensitivity of 83%. This combination may therefore offer a potential approach for detecting stress levels in order to prevent psychological stress disorders.
Title: The use of 4D data-independent acquisition-based proteomic analysis and machine learning to reveal potential biomarkers for stress levels. Journal of Bioinformatics and Computational Biology, 2450025.
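The stress-biomarker abstract reports averaged AUC, accuracy, F1 score, and sensitivity. For readers reproducing such numbers, the sketch below gives the standard definitions in plain Python. It is not the authors' code, and `average_over_folds` is a hypothetical helper that assumes a simple unweighted mean across folds or batches, since the abstract does not state how the averages were formed.

```python
from typing import Dict, List, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall) and F1 for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_over_folds(fold_results: List[Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric across folds (hypothetical averaging rule)."""
    keys = fold_results[0].keys()
    return {k: sum(r[k] for r in fold_results) / len(fold_results) for k in keys}
```

In practice each fold would contribute one `binary_metrics(...)` dictionary extended with its `roc_auc(...)`, and `average_over_folds` would combine them into the kind of averaged figures the abstract reports.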
Pub Date: 2024-12-01; DOI: 10.1142/S0219720024990014
Title: Author index Volume 22 (2024). Journal of Bioinformatics and Computational Biology, 22(6), 2499001.
Pub Date: 2024-12-01; Epub Date: 2025-02-01; DOI: 10.1142/S0219720024500288
Weibin Ding, Shaohua Jiang, Ting Xu, Zhijian Lyu
The prediction of drug-target affinity (DTA) is crucial for efficiently identifying potential targets for drug repurposing, thereby reducing wasted resources. In this paper, we propose a novel graph-based deep learning model for DTA prediction that uses adaptive structure-aware pooling for graph processing. Our approach integrates a self-attention mechanism with an enhanced graph neural network to capture the importance of each node in the graph, improving graph feature extraction. Specifically, adjacent nodes in the 2D molecular graph are aggregated into clusters, and the cluster features are weighted by their attention scores to form the final molecular representation. In terms of architecture, we use both global and hierarchical pooling and evaluate the model on multiple benchmark datasets. On the KIBA dataset, our model achieved the lowest mean squared error (MSE), 0.126, a 0.5% reduction compared with the best-performing baseline method. Additionally, to assess the model's generalization ability, we conducted comparative experiments on regression and binary classification tasks. Our model outperformed previous models on both types of task.
Title: ASAP-DTA: Predicting drug-target binding affinity with adaptive structure aware networks. Journal of Bioinformatics and Computational Biology, 22(6), 2450028.
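To make the attention-weighted cluster pooling described in the ASAP-DTA abstract concrete, here is a simplified sketch under stated assumptions: clusters are supplied as lists of node indices (the real model forms them from the molecular graph), each cluster's feature is the mean of its member nodes, and a single hypothetical weight vector `w` stands in for the learned attention. It also includes the MSE used for the KIBA comparison.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def softmax(xs: Sequence[float]) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_features(node_feats: List[Vec], clusters: List[List[int]]) -> List[Vec]:
    """Mean of member-node features for each cluster (clusters given as node-index lists)."""
    dim = len(node_feats[0])
    return [[sum(node_feats[i][k] for i in members) / len(members) for k in range(dim)]
            for members in clusters]


def attention_readout(node_feats: List[Vec], clusters: List[List[int]],
                      w: Vec) -> Tuple[Vec, Vec]:
    """Score each cluster with the hypothetical vector w, softmax the scores, and
    return the attention-weighted sum of cluster features plus the weights."""
    feats = cluster_features(node_feats, clusters)
    scores = [sum(f * wk for f, wk in zip(feat, w)) for feat in feats]
    alphas = softmax(scores)
    dim = len(feats[0])
    pooled = [sum(a * feat[k] for a, feat in zip(alphas, feats)) for k in range(dim)]
    return pooled, alphas


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

With all-zero `w` every cluster gets equal weight, so the readout reduces to the mean of the cluster features; a trained scoring function instead lets informative substructures dominate the molecular representation.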
Pub Date: 2024-12-01; Epub Date: 2025-02-01; DOI: 10.1142/S0219720024500276
Li-Ping Wu, Li Yong, Xiang Cheng, Yang Zhou
Compound identification in small-molecule research relies on comparing experimental mass spectra against mass spectral databases. However, unequal data lengths often make retrieval inefficient and inaccurate, and the similarity calculation methods used by commercial software have limitations. To address these issues, two mass spectral data alignment methods, the "splicing-filling method" and the "matching-filling method", are proposed, together with an information entropy-based similarity measure for mass spectra. The alignment methods convert mass spectra of different lengths for unknown and known compounds into equal-length spectra, allowing more accurate calculation of spectral similarity. Information entropy is then used to quantify differences in intensity distributions within the aligned spectra, and these differences are used to compare the degree of similarity between spectra. Validation on example data shows that the two alignment methods effectively solve the problem of unequal spectral lengths in similarity calculation, and that the mass spectral entropy method gives reliable results suitable for mass spectral identification.
Title: Research on similarity retrieval method based on mass spectral entropy. Journal of Bioinformatics and Computational Biology, 22(6), 2450027.
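The mass spectral entropy abstract does not give the exact alignment or entropy formulas, so the following is an illustrative stand-in rather than the authors' methods. `align_zero_fill` is a hypothetical zero-fill alignment that rounds m/z values to a fixed number of decimals (an assumed binning rule) and places both spectra on the union of those bins, producing equal-length vectors. `entropy_similarity` then applies a Jensen-Shannon-divergence-based entropy similarity, scaled so that identical intensity distributions score 1 and non-overlapping spectra score 0.

```python
import math
from typing import Dict, List, Tuple

Spectrum = Dict[float, float]  # m/z -> intensity


def align_zero_fill(a: Spectrum, b: Spectrum, decimals: int = 2) -> Tuple[List[float], List[float]]:
    """Hypothetical alignment: union of rounded m/z bins, missing peaks filled with 0."""
    binned: List[Dict[float, float]] = []
    for spectrum in (a, b):
        bins: Dict[float, float] = {}
        for mz, intensity in spectrum.items():
            key = round(mz, decimals)
            bins[key] = bins.get(key, 0.0) + intensity  # merge peaks in the same bin
        binned.append(bins)
    axis = sorted(set(binned[0]) | set(binned[1]))
    return [binned[0].get(m, 0.0) for m in axis], [binned[1].get(m, 0.0) for m in axis]


def shannon_entropy(p: List[float]) -> float:
    """Shannon entropy (nats) of a probability vector; zero entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def _normalize(v: List[float]) -> List[float]:
    total = sum(v)
    if total <= 0:
        raise ValueError("spectrum has no positive intensity")
    return [x / total for x in v]


def entropy_similarity(a: Spectrum, b: Spectrum, decimals: int = 2) -> float:
    """1 - JSD/ln 2 on the aligned, intensity-normalized spectra (range 0..1)."""
    va, vb = align_zero_fill(a, b, decimals)
    pa, pb = _normalize(va), _normalize(vb)
    merged = [(x + y) / 2 for x, y in zip(pa, pb)]
    # Jensen-Shannon divergence in nats; it is bounded above by ln 2
    jsd = shannon_entropy(merged) - (shannon_entropy(pa) + shannon_entropy(pb)) / 2
    return 1.0 - jsd / math.log(2)
```

Because intensities are normalized before comparison, a spectrum and a uniformly scaled copy of it score 1. Real data would need a tolerance-based peak matching rule, which is presumably where the paper's "matching-filling" method differs from this simple binning.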
Pub Date: 2024-12-01; DOI: 10.1142/S021972002450029X
Mateusz Twardawa, Kaja Gutowska, Piotr Formanowicz
Background: Cardiovascular diseases have long been studied to identify their causal factors and counteract them effectively. Atherosclerosis, an inflammatory process of the blood vessel wall, is a common cardiovascular disease. Among the many well-known risk factors, hypercholesterolemia is undoubtedly a significant condition for atherosclerotic plaque formation and is linked to atherosclerosis on many levels, e.g. cell interactions, cytokine levels, diet, and lifestyle. Current studies suggest that controlling the balance between proinflammatory (M1) and anti-inflammatory (M2) macrophages may improve patient condition and reduce the necrotic core.
Methods: This study considered the effects of hypercholesterolemia on the population dynamics of macrophages (M0, M1, M2, foam cells) in atherosclerotic plaque. A mathematical model using a matrix approach to population dynamics was proposed and tested in various scenarios. To check model sensitivity and the variability associated with error propagation, an uncertainty analysis was performed using a Monte Carlo approach.
Results: Simulations of macrophage population dynamics enabled assessment of necrotic core development and plaque instability. Excess lipid levels emerged as the most critical factor for necrotic core development. However, plaque growth can be significantly slowed if macrophages and foam cells maintain proper lipid levels. This balance may be disrupted by proinflammatory lipids, which eventually increase plaque size, an effect also reflected in the M1/M2 dynamics.
Conclusion: Hypercholesterolemia accelerates atherosclerosis development, leading to earlier cardiovascular events. In silico results suggest that reducing lipid intake and the proportion of proinflammatory lipids is crucial to slowing plaque development and reducing rupture risk, which requires preserving the fragile M1/M2 balance. Targeting the inflammatory microenvironment and macrophage polarization represents a promising approach for atherosclerosis management.
Title: Exploring relationship between hypercholesterolemia and instability of atherosclerotic plaque - An approach based on a matrix population model. Journal of Bioinformatics and Computational Biology, 22(6), 2450029.
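A matrix population model advances a vector of state counts with a projection matrix, n(t+1) = A n(t). The sketch below shows that mechanism for the four macrophage states named in the atherosclerosis abstract, together with a Monte Carlo uncertainty analysis that perturbs every matrix entry with Gaussian relative noise. The state order, the transition rates in the `__main__` example, and the noise model are all hypothetical placeholders, not parameters from the paper.

```python
import random
import statistics
from typing import List, Tuple

Matrix = List[List[float]]
STATES = ("M0", "M1", "M2", "foam")  # hypothetical state order


def project(A: Matrix, n0: List[float], steps: int) -> List[float]:
    """Iterate n(t+1) = A n(t); A[i][j] is the per-step fraction of state j ending in state i."""
    n = list(n0)
    for _ in range(steps):
        n = [sum(A[i][j] * n[j] for j in range(len(n))) for i in range(len(A))]
    return n


def monte_carlo(A: Matrix, n0: List[float], steps: int, rel_sd: float = 0.1,
                draws: int = 500, seed: int = 0) -> List[List[float]]:
    """Re-project under randomly perturbed rates to see how uncertainty propagates."""
    rng = random.Random(seed)  # seeded for reproducible draws
    finals = []
    for _ in range(draws):
        # multiplicative Gaussian noise on every rate, clipped so rates stay non-negative
        perturbed = [[max(0.0, a * (1.0 + rng.gauss(0.0, rel_sd))) for a in row] for row in A]
        finals.append(project(perturbed, n0, steps))
    return finals


def summarize(finals: List[List[float]], state: str) -> Tuple[float, float, float]:
    """Mean and 5th/95th percentiles of one state's final count across draws."""
    k = STATES.index(state)
    values = [f[k] for f in finals]
    cuts = statistics.quantiles(values, n=20)  # cut points at 5%, 10%, ..., 95%
    return statistics.mean(values), cuts[0], cuts[-1]


if __name__ == "__main__":
    # Placeholder rates (columns = source state); NOT values from the paper.
    A = [
        [0.60, 0.00, 0.00, 0.00],  # into M0
        [0.20, 0.75, 0.05, 0.00],  # into M1
        [0.15, 0.05, 0.80, 0.00],  # into M2
        [0.00, 0.15, 0.05, 0.95],  # into foam cells
    ]
    finals = monte_carlo(A, [1000.0, 0.0, 0.0, 0.0], steps=30)
    print(summarize(finals, "foam"))
```

Scenario runs such as "high lipid load" would then correspond to different choices of A, for example larger M1-to-foam transition fractions, compared through their percentile bands.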
In drug discovery, accurate prediction of drug-target interactions is crucial for accelerating the development of new drugs. However, existing methods still face many challenges in modeling complex biomolecular interactions. To address this, we propose a new deep learning framework that combines structural information and sequence features of proteins to provide a comprehensive feature representation through bimodal fusion. The framework integrates a topology-adaptive graph convolutional network and a multi-head attention mechanism, and additionally introduces a self-masked attention mechanism so that each protein binding site can focus on its own features and its interaction with the ligand. Experimental results on multiple public datasets show that our method significantly outperforms traditional machine learning and graph neural network methods in predictive performance. In addition, our method can identify and explain key molecular interactions, providing new insights into the complex relationship between drugs and targets.
Title: Improving drug-target interaction prediction through dual-modality fusion with InteractNet. Journal of Bioinformatics and Computational Biology, 22(5), 2450024.
Authors: Baozhong Zhu, Runhua Zhang, Tengsheng Jiang, Zhiming Cui, Jing Chen, Hongjie Wu
Pub Date: 2024-10-01; DOI: 10.1142/S0219720024500240
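The InteractNet abstract does not define its self-masked attention precisely. One plausible reading, sketched below purely as an assumption, is a mask under which each binding-site token attends only to itself and to ligand tokens, so other sites cannot dilute its representation. The code uses single-head scaled dot-product attention for brevity (the abstract also mentions multi-head attention), and the token layout in the hypothetical `site_mask` helper is an assumption.

```python
import math
from typing import List

Vec = List[float]


def site_mask(n_sites: int, n_ligand: int) -> List[List[bool]]:
    """mask[i][j] is True if token i may attend to token j.

    Tokens 0..n_sites-1 are binding sites and the rest are ligand tokens (assumed layout).
    Each site sees only itself plus the ligand; ligand tokens see everything.
    """
    n = n_sites + n_ligand
    return [[True if i >= n_sites else (j == i or j >= n_sites) for j in range(n)]
            for i in range(n)]


def masked_attention(Q: List[Vec], K: List[Vec], V: List[Vec],
                     mask: List[List[bool]]) -> List[Vec]:
    """Single-head scaled dot-product attention; masked-out positions get zero weight."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        allowed = [j for j in range(len(K)) if mask[i][j]]  # every row allows at least itself
        scores = {j: sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d) for j in allowed}
        m = max(scores.values())  # subtract the max for numerical stability
        weights = {j: math.exp(s - m) for j, s in scores.items()}
        z = sum(weights.values())
        out.append([sum(weights[j] * V[j][c] for j in allowed) / z for c in range(len(V[0]))])
    return out
```

Restricting the mask rather than the scores keeps the softmax normalized over only the permitted tokens, which is the usual way masked attention is implemented.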
Pub Date: 2024-10-01; DOI: 10.1142/S0219720024500227
Yan Li, Boran Wang, Zengding Wu, Shiliang Ji, Shi Xu, Caiyi Fei
Background: Genetic mutations that inactivate or aberrantly activate essential proteins may alter or even disrupt cellular signaling pathways, culminating in precancerous lesions and cancer. Such mutations and dysfunctions can produce "novel proteins" that are not part of the conventional human proteome. Identifying these proteins has considerable potential for uncovering promising drug targets and designing innovative therapeutic models. Although diverse tools for detecting DNA or RNA variants have emerged with the widespread adoption of nucleotide sequencing technology, these methods primarily target point mutations and perform suboptimally on large-scale and combinatorial mutations. Moreover, their outputs are confined to the genome and transcriptome levels and do not provide the protein information resulting from genetic alterations.
Results: We present the Sequencing Analysis Kit (SAKit), a bioinformatics pipeline for hybrid sequencing analysis that integrates long-read and short-read RNA sequencing data. Owing to their coverage, long reads are used to detect large-scale variations such as gene fusions, exon skipping, intron retention, and aberrant expression in non-coding regions, while short reads validate these findings at breakpoints and splice junctions. Conversely, owing to their greater sequencing depth, short reads are used to identify small-scale variations, including single nucleotide variants, deletions, and insertions, with long reads providing additional validation. SAKit performs analyses using species-specific configuration files comprising genome references and annotation data, making it applicable to both human and mouse studies. Furthermore, SAKit applies a hierarchical filtering approach to eliminate low-confidence variants and uses open reading frame (ORF) analysis to translate identified variants into protein sequences.
Conclusion: SAKit is a robust and versatile bioinformatics tool for the comprehensive identification of both large-scale and small-scale variants from RNA-seq data, facilitating the discovery of novel proteins. By integrating analysis of long-read and short-read sequencing data, it offers a powerful solution for researchers in genomics and transcriptomics. SAKit is free and open-source, available through GitHub (https://github.com/therarna/SAKit) and as a Docker image (https://hub.docker.com/repository/docker/therarna). Implemented primarily in Python within a Snakemake framework, SAKit is designed for reproducibility, scalability, and ease of use.
Title: SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales. Journal of Bioinformatics and Computational Biology, 22(5), 2450022.
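SAKit's final step translates variant transcripts into protein sequences via ORF analysis. The sketch below is a generic illustration of that idea, not SAKit's implementation: the function name `longest_orf_protein` is hypothetical, and the code uses the standard genetic code, scans only the three forward frames, accepts RNA or DNA letters, and reports the longest ATG-initiated ORF that ends in a stop codon.

```python
# Standard genetic code, indexed by codon in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def longest_orf_protein(seq: str, min_length: int = 1) -> str:
    """Translate the longest forward-strand ORF (ATG ... stop) in seq.

    Returns "" if no complete ORF of at least min_length residues exists.
    ORFs that run off the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input
    best = ""
    for frame in range(3):
        protein = None  # None means "not currently inside an ORF"
        for pos in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # "X" for ambiguous bases such as N
            if protein is None:
                if aa == "M":
                    protein = "M"
            elif aa == "*":
                if len(protein) > len(best):
                    best = protein
                protein = None
            else:
                protein += aa
    return best if len(best) >= min_length else ""
```

For example, `longest_orf_protein("GGATGTTTGGGTAACC")` returns `"MFG"` (the ORF starts in the third frame). A production pipeline would also scan the reverse strand and compare the variant protein against the reference isoform.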
Pub Date: 2024-10-01; Epub Date: 2024-10-30; DOI: 10.1142/S0219720024500239
Rahim Berahmand, Masoumeh Emadpour, Mokhtar Jalali Javaran, Kaveh Haji-Allahverdipoor, Ali Akbarabadi
An efficient inducible transgene expression system is a valuable tool in recombinant protein production. The synthetic theophylline-responsive riboswitch (theo.RS) can be placed in the 5′ untranslated region (5′ UTR) of an mRNA to control translation of the downstream gene in chloroplasts in response to binding of its ligand, theophylline. One drawback limiting the efficiency of theo.RS is leakiness in the riboswitch structure, which allows undesired background translation when the switch is expected to be off. The purpose of this study was to identify the factors causing this leakage of theo.RS in the off state using molecular dynamics (MD) simulations. After the simulation system was properly equilibrated, a 40 ns simulation was conducted. Analysis of the solvent-accessible surface area of both ribosome-binding site (RBS) regions indicated that nucleotide 79 of theo.RS, a guanine, had the highest surface exposure to ribosome access. These results were corroborated by analysis of hydrogen bonding between the RBS regions and the rest of the RNA structure. Therefore, redesigning the RBS regions to avoid unmasked nucleotide(s) in the structure may make theo.RS tighter in the off state, resulting in efficient inhibition of translation.
{"title":"Molecular dynamics simulations of ribosome-binding sites in theophylline-responsive riboswitch associated with improving the gene expression regulation in chloroplasts.","authors":"Rahim Berahmand, Masoumeh Emadpour, Mokhtar Jalali Javaran, Kaveh Haji-Allahverdipoor, Ali Akbarabadi","doi":"10.1142/S0219720024500239","DOIUrl":"https://doi.org/10.1142/S0219720024500239","url":null,"abstract":"<p><p>The existence of an efficient inducible transgene expression system is a valuable tool in recombinant protein production. The synthetic theophylline-responsive riboswitch (theo.RS) can be replaced in the 5′ untranslated region of an mRNA and control the translation of downstream gene in chloroplasts in response to the binding with a ligand molecule, theophylline. One of the drawbacks associated with the efficiency of the theo.RS is the leak in the RS structure allowing undesired background translation when the switch is expected to be off. The purpose of this study was to detect the factors causing the leak of the theo.RS in the off mode, using molecular dynamics (MD) simulations the appropriate balancing of the simulation system, using the necessary commands, a 40 ns simulation was conducted. Analysis of the solvent-accessible surface area for both ribosome-binding site (RBS) regions indicated that nucleotide 79 of the theo.RS, a guanine, had the highest surface exposure to ribosome access. These results were verified with the study of hydrogen bonding of RBS regions with the RNA structure. 
Therefore, redesigning the RBS regions and avoiding the unmasked nucleotide(s) in the structure may improve the tightness of theo.RS in off mode resulting in the efficient inhibition of translation.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 5","pages":"2450023"},"PeriodicalIF":0.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142688505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
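The riboswitch abstract combines two trajectory analyses: per-nucleotide solvent exposure of the RBS regions and their hydrogen bonding with the rest of the RNA. A minimal sketch of how such results might be combined to flag "unmasked" RBS nucleotides is shown below. It assumes that per-frame SASA values and per-frame sets of hydrogen-bonded nucleotides have already been extracted from the trajectory by an MD analysis tool; the function names, the 0.5 occupancy cutoff, and the three-frame toy values are hypothetical and are not data from the study.

```python
from statistics import mean


def mean_sasa(per_frame_sasa):
    """Average each nucleotide's SASA (e.g. in square angstroms) over all frames.

    per_frame_sasa: list with one dict {nucleotide_index: sasa} per frame.
    """
    nucleotides = per_frame_sasa[0].keys()
    return {n: mean(frame[n] for frame in per_frame_sasa) for n in nucleotides}


def hbond_occupancy(per_frame_hbonded, nucleotides):
    """Fraction of frames in which each nucleotide forms at least one H-bond.

    per_frame_hbonded: list with one set of H-bonded nucleotide indices per frame.
    """
    n_frames = len(per_frame_hbonded)
    return {
        n: sum(n in frame for frame in per_frame_hbonded) / n_frames
        for n in nucleotides
    }


def unmasked_candidates(per_frame_sasa, per_frame_hbonded, rbs, max_occupancy=0.5):
    """Rank RBS nucleotides by mean SASA, keeping those that are rarely H-bonded.

    Returns (nucleotide, mean_sasa, hbond_occupancy) tuples, most exposed first.
    """
    sasa = mean_sasa(per_frame_sasa)
    occupancy = hbond_occupancy(per_frame_hbonded, rbs)
    ranked = sorted(rbs, key=lambda n: sasa[n], reverse=True)
    return [(n, sasa[n], occupancy[n]) for n in ranked if occupancy[n] < max_occupancy]


# Toy three-frame example (hypothetical values, not from the study).
frames_sasa = [
    {10: 20.0, 11: 35.0, 12: 80.0, 13: 15.0},
    {10: 22.0, 11: 30.0, 12: 90.0, 13: 17.0},
    {10: 18.0, 11: 40.0, 12: 70.0, 13: 16.0},
]
frames_hbonded = [{10, 11, 13}, {10, 13}, {10, 11, 13}]
print(unmasked_candidates(frames_sasa, frames_hbonded, rbs=[10, 11, 12, 13]))
# [(12, 80.0, 0.0)]
```

Requiring both high exposure and low hydrogen-bond occupancy mirrors the abstract's use of the hydrogen-bonding analysis as a check on the SASA result; in practice the cutoff would need to be chosen for the system at hand.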
The Qingfei Paidu decoction (QFPDD) is a widely acclaimed therapeutic formula employed nationwide in China for the clinical management of coronavirus disease 2019 (COVID-19). QFPDD exerts a synergistic therapeutic effect, characterized by its multi-component, multi-target, and multi-pathway action. However, the intricate interactions among the ingredients and targets within QFPDD, and their systemic effects in multiple tissues, remain undetermined. To address this, we qualitatively characterized the chemical components of QFPDD. We integrated multi-tissue transcriptomic analysis with GraphDTA, a deep learning model, to screen for potential compound-target interactions of QFPDD in multiple tissues. We predicted 13 key active compounds, 127 potential targets, and 27 pathways associated with QFPDD across six different tissues. Notably, the oleanolic acid-AXL pair exhibited the highest predicted affinity in the heart, blood, and liver. Molecular docking and molecular dynamics simulations confirmed the strong binding affinity of this pair. The robust interaction between oleanolic acid and the AXL receptor suggests that AXL is a promising target for developing clinical intervention strategies. Through the construction of a multi-tissue compound-target interaction network, our study further elucidated the mechanisms through which QFPDD effectively combats COVID-19 in multiple tissues. Our work also establishes a framework for future investigations into the systemic effects of other Traditional Chinese Medicine (TCM) formulas in disease treatment.
{"title":"Construction of a multi-tissue compound-target interaction network of Qingfei Paidu decoction in COVID-19 treatment based on deep learning and transcriptomic analysis.","authors":"Xia Li, Xuetong Zhao, Xinjian Yu, Jianping Zhao, Xiangdong Fang","doi":"10.1142/S0219720024500161","DOIUrl":"10.1142/S0219720024500161","url":null,"abstract":"<p><p>The Qingfei Paidu decoction (QFPDD) is a widely acclaimed therapeutic formula employed nationwide for the clinical management of coronavirus disease 2019 (COVID-19). QFPDD exerts a synergistic therapeutic effect, characterized by its multi-component, multi-target, and multi-pathway action. However, the intricate interactions among the ingredients and targets within QFPDD and their systematic effects in multiple tissues remain undetermined. To address this, we qualitatively characterized the chemical components of QFPDD. We integrated multi-tissue transcriptomic analysis with GraphDTA, a deep learning model, to screen for potential compound-target interactions of QFPDD in multiple tissues. We predicted 13 key active compounds, 127 potential targets and 27 pathways associated with QFPDD across six different tissues. Notably, oleanolic acid-AXL exhibited leading affinity in the heart, blood, and liver. Molecular docking and molecular dynamics simulation confirmed their strong binding affinity. The robust interaction between oleanolic acid and the AXL receptor suggests that AXL is a promising target for developing clinical intervention strategies. Through the construction of a multi-tissue compound-target interaction network, our study further elucidated the mechanisms through which QFPDD effectively combats COVID-19 in multiple tissues. 
Our work also establishes a framework for future investigations into the systemic effects of other Traditional Chinese Medicine (TCM) formulas in disease treatment.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2450016"},"PeriodicalIF":0.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141735373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
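The QFPDD abstract describes screening compound-target pairs with GraphDTA and assembling them into a multi-tissue interaction network. A minimal sketch of the assembly step is shown below, assuming affinity scores have already been predicted (by GraphDTA or another model) and that transcriptomic analysis has already produced a set of candidate target genes per tissue. The function names, the 7.0 score threshold, and the toy compounds, genes, and scores are hypothetical; this is not the authors' pipeline.

```python
from collections import defaultdict


def build_network(predicted, tissue_targets, threshold):
    """Return {tissue: {(compound, target), ...}} for high-scoring pairs.

    predicted: {(compound, target): predicted_affinity}
    tissue_targets: {tissue: set of candidate target genes in that tissue}
    A pair becomes an edge in a tissue when its score reaches `threshold`
    and its target is among that tissue's candidate genes.
    """
    network = defaultdict(set)
    for (compound, target), score in predicted.items():
        if score < threshold:
            continue
        for tissue, genes in tissue_targets.items():
            if target in genes:
                network[tissue].add((compound, target))
    return dict(network)


def shared_edges(network, min_tissues=2):
    """Return {(compound, target): sorted tissues} for edges in >= min_tissues tissues."""
    tissues_per_edge = defaultdict(set)
    for tissue, edges in network.items():
        for edge in edges:
            tissues_per_edge[edge].add(tissue)
    return {
        edge: sorted(tissues)
        for edge, tissues in tissues_per_edge.items()
        if len(tissues) >= min_tissues
    }


# Toy inputs (hypothetical compounds, genes and scores).
predicted = {
    ("cpd1", "GENE_A"): 7.8,
    ("cpd1", "GENE_B"): 5.1,
    ("cpd2", "GENE_A"): 6.9,
    ("cpd2", "GENE_C"): 7.2,
}
tissue_targets = {
    "heart": {"GENE_A", "GENE_B"},
    "liver": {"GENE_A", "GENE_C"},
    "lung": {"GENE_C"},
}
network = build_network(predicted, tissue_targets, threshold=7.0)
print(shared_edges(network))
# {('cpd1', 'GENE_A'): ['heart', 'liver'], ('cpd2', 'GENE_C'): ['liver', 'lung']}
```

Keeping edges per tissue makes it straightforward to pick out pairs that recur across tissues, in the way the abstract highlights oleanolic acid-AXL in the heart, blood, and liver.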