Upendra K Pradhan, Prabina K Meher, Sanchita Naha, Nitesh K Sharma, Aarushi Agarwal, Ajit Gupta, Rajender Parsad
DNA-binding proteins (DBPs) play critical roles in many biological processes, including gene expression, DNA replication, recombination and repair. Understanding the molecular mechanisms underlying these processes depends on the precise identification of DBPs. Several computational methods have recently been developed to identify DBPs. However, because these models are generic, they cannot identify species-specific DBPs with high accuracy, so a species-specific computational model is needed. In this paper, we introduce DBPMod, a computational method that uses machine learning to identify species-specific DBPs. Both shallow learning algorithms and deep learning models were used for prediction, and the shallow learning models achieved higher accuracy. In addition, evolutionary features outperformed sequence-derived features in terms of accuracy. The performance of DBPMod was assessed on five model organisms: Caenorhabditis elegans, Drosophila melanogaster, Escherichia coli, Homo sapiens and Mus musculus. Prediction accuracy was evaluated with five-fold cross-validation and independent test set analyses in terms of the area under the receiver operating characteristic curve (auROC) and the area under the precision-recall curve (auPRC), which were ~89-92% and ~89-95%, respectively. Comparative results demonstrate that DBPMod outperforms 12 current state-of-the-art computational approaches in identifying DBPs for all five model organisms. We further developed a DBPMod web server, publicly available at https://iasri-sg.icar.gov.in/dbpmod/, to make it easier for researchers to detect DBPs. DBPMod is expected to be an invaluable tool for discovering DBPs, supplementing current experimental and computational methods.
DBPMod: a supervised learning model for computational recognition of DNA-binding proteins in model organisms. Briefings in Functional Genomics, pp. 363-372, 2024-07-19. doi:10.1093/bfgp/elad039
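The evaluation protocol described above (five-fold cross-validation scored by auROC and auPRC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DBPMod's code: the stratified fold assignment, the average-precision estimate of auPRC and the toy `fit_mean_difference` scorer on a single synthetic feature are hypothetical stand-ins for the paper's actual classifiers and evolutionary features.

```python
import random


def auroc(labels, scores):
    """auROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(labels, scores):
    """auPRC estimated as average precision: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, keeping the class ratio roughly equal in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_mean_difference(x_train, y_train):
    """Hypothetical toy model for 1-D features: score grows as x moves toward the positive-class mean."""
    n_pos = sum(y_train)
    mu_pos = sum(x for x, y in zip(x_train, y_train) if y == 1) / n_pos
    mu_neg = sum(x for x, y in zip(x_train, y_train) if y == 0) / (len(y_train) - n_pos)
    return lambda x: (x - mu_neg) ** 2 - (x - mu_pos) ** 2


def cross_validate(x, y, fit, k=5, seed=0):
    """Return one (auROC, auPRC) pair per fold; fit(x_train, y_train) must return a scoring function."""
    results = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        score = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_test = [y[i] for i in test_idx]
        s_test = [score(x[i]) for i in test_idx]
        results.append((auroc(y_test, s_test), auprc(y_test, s_test)))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: 100 "DBPs" centred at +1 and 100 non-DBPs centred at -1.
    x = [rng.gauss(1.0, 1.0) for _ in range(100)] + [rng.gauss(-1.0, 1.0) for _ in range(100)]
    y = [1] * 100 + [0] * 100
    for fold, (roc, prc) in enumerate(cross_validate(x, y, fit_mean_difference), start=1):
        print(f"fold {fold}: auROC={roc:.3f} auPRC={prc:.3f}")
```

Stratifying the folds matters here because auPRC depends on the positive-class fraction, so each fold should mirror the overall class balance.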
Cell type identification is an important task in single-cell RNA-sequencing (scRNA-seq) data analysis. Many prediction methods have recently been proposed, but predictive accuracy on difficult cell type identification tasks is still low. In this work, we propose a novel Gaussian noise augmentation-based scRNA-seq contrastive learning method (GsRCL) to learn discriminative feature representations for cell type identification tasks. A large-scale computational evaluation suggests that GsRCL outperformed other state-of-the-art predictive methods on difficult cell type identification tasks, while the conventional contrastive learning method based on random gene masking augmentation also generally improved accuracy on easy cell type identification tasks.
Improving cell type identification with Gaussian noise-augmented single-cell RNA-seq contrastive learning. Ibrahim Alsaggaf, Daniel Buchan, Cen Wan. Briefings in Functional Genomics, pp. 441-451, 2024-07-19. doi:10.1093/bfgp/elad059
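To make the two augmentations in this abstract concrete, the sketch below generates views of a cell's expression profile either by additive Gaussian noise (the GsRCL idea) or by random gene masking (the conventional baseline), and scores a batch of paired views with a SimCLR-style NT-Xent contrastive loss. Assumptions: the noise scale `sigma`, masking rate `p` and temperature `tau` are hypothetical, and the loss is applied directly to the raw views (an identity encoder) purely for illustration; GsRCL trains a neural encoder whose details are not given here.

```python
import math
import random


def gaussian_noise_view(profile, sigma, rng):
    """Augmentation: add independent N(0, sigma^2) noise to every gene's expression value."""
    return [v + rng.gauss(0.0, sigma) for v in profile]


def random_mask_view(profile, p, rng):
    """Baseline augmentation: zero out each gene independently with probability p."""
    return [0.0 if rng.random() < p else v for v in profile]


def cosine(a, b):
    """Cosine similarity; defined as 0 when either vector is all zeros (e.g. fully masked)."""
    na = math.sqrt(sum(u * u for u in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(u * v for u, v in zip(a, b)) / (na * nb)


def nt_xent(view1, view2, tau=0.5):
    """NT-Xent loss: view1[i] and view2[i] form a positive pair; every other view in the batch is a negative."""
    z = view1 + view2
    n = len(view1)
    loss = 0.0
    for i in range(2 * n):
        partner = (i + n) % (2 * n)  # index of the other view of the same cell
        num = math.exp(cosine(z[i], z[partner]) / tau)
        den = sum(math.exp(cosine(z[i], z[k]) / tau) for k in range(2 * n) if k != i)
        loss -= math.log(num / den)
    return loss / (2 * n)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical batch of three cells, each with five genes (log-normalized values).
    batch = [[2.1, 0.0, 3.3, 1.2, 0.4], [0.0, 4.2, 0.1, 0.0, 2.2], [1.5, 1.4, 0.0, 3.9, 0.0]]
    noisy = nt_xent([gaussian_noise_view(c, 0.2, rng) for c in batch],
                    [gaussian_noise_view(c, 0.2, rng) for c in batch])
    masked = nt_xent([random_mask_view(c, 0.2, rng) for c in batch],
                     [random_mask_view(c, 0.2, rng) for c in batch])
    print(f"NT-Xent with Gaussian-noise views: {noisy:.3f}; with masked views: {masked:.3f}")
```

A plausible intuition for the design choice is that Gaussian noise perturbs every gene slightly while masking removes some genes entirely, which changes what the encoder learns to be invariant to.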
Breast cancer (BC) incidence and mortality rates are still increasing globally. BC, like other cancers, is known to have very high genetic heterogeneity and a high rate of genomic mutations. Traditional oncology approaches have not been able to provide a lasting solution, and targeted therapeutics have been instrumental in handling the complexity and resistance associated with BC. Meanwhile, advances in genomic technology have transformed our understanding of the genetic landscape of breast cancer, opening new avenues for improved anti-cancer therapeutics. Genomics is critical for developing tailored therapeutics and identifying the patients who would benefit most from these treatments. The next generation of breast cancer clinical trials has incorporated next-generation sequencing technologies, with observable benefits, and these innovations have led to the approval of better-targeted therapies for patients with breast cancer. Genomics plays several roles in clinical trials, including approved genomic tests, patient selection and prediction of therapeutic response. Multiple breast cancer clinical trials applying genomic technology have been completed, and others are ongoing. With increased effort and advanced genomic studies in this domain, precision medicine can be achieved in breast cancer therapy. Genomic studies help improve patient outcomes and advance oncology by providing a deeper understanding of the biology behind breast cancer. This article examines the present state of genomics in breast cancer clinical trials.
Genomics in Clinical trials for Breast Cancer. David Enoma. Briefings in Functional Genomics, pp. 325-334, 2024-07-19. doi:10.1093/bfgp/elad054
Protein methylation is a post-translational modification that is crucial for various cellular processes, including transcriptional activity and DNA repair. Correctly predicting protein methylation sites is fundamental for research and drug discovery. Experimental techniques such as methyl-specific antibodies, chromatin immunoprecipitation and mass spectrometry exist for identifying protein methylation sites, but they are time-consuming and costly. Predicting methylation sites in silico can help researchers identify candidate sites for future examination and make site-specific investigations and downstream characterization easier to carry out. In this research, we propose a novel deep learning-based predictor, named DeepPRMS, to identify protein methylation sites in primary sequences. DeepPRMS uses a gated recurrent unit (GRU) to extract sequential information and a convolutional neural network (CNN) to extract spatial information from the primary sequences. We combined the latent representations of the GRU and CNN models to capture the interaction between them. On the independent test data set, DeepPRMS obtained an accuracy of 85.32%, a specificity of 84.94%, a Matthews correlation coefficient of 0.71 and a sensitivity of 85.80%. These results indicate that DeepPRMS predicts protein methylation sites with high accuracy and outperforms state-of-the-art models. DeepPRMS is expected to effectively guide future experiments for identifying potential methylated protein sites. The web server is available at http://deepprms.nitsri.ac.in/.
DeepPRMS: advanced deep learning model to predict protein arginine methylation sites. Monika Khandelwal, Ranjeet Kumar Rout. Briefings in Functional Genomics, pp. 452-463, 2024-07-19. doi:10.1093/bfgp/elae001
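All four figures reported for DeepPRMS derive from the confusion matrix on the test set. The helper below shows the standard formulas; the counts in the usage example are hypothetical and are not the paper's data.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on methylated (positive) sites
    specificity = tn / (tn + fp)  # recall on non-methylated (negative) sites
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts for a balanced test set of 1,000 positive and 1,000 negative sites.
    m = binary_metrics(tp=858, tn=849, fp=151, fn=142)
    print({k: round(v, 4) for k, v in m.items()})
```

MCC uses all four cells of the confusion matrix, so unlike accuracy it remains informative when the positive and negative sets are unbalanced.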
Zhecheng Zhou, Zhenya Du, Xin Jiang, Linlin Zhuo, Yixin Xu, Xiangzheng Fu, Mingzhe Liu, Quan Zou
MicroRNAs (miRNAs) are found ubiquitously in biological cells and play a pivotal role in regulating the expression of numerous target genes. miRNA-centered therapies are emerging as a promising strategy for disease treatment, aiming to intervene in disease progression by modulating abnormal miRNA expression. Accurately predicting miRNA-drug resistance (MDR) is crucial for the success of miRNA therapies. Deep learning-based computational models have demonstrated exceptional performance in predicting potential MDRs. However, their effectiveness can be compromised by errors in the data acquisition process, which lead to inaccurate node representations. To address this challenge, we introduce the GAM-MDR model, which combines a graph autoencoder (GAE) with random path masking to predict potential MDRs precisely. The reliability and effectiveness of GAM-MDR are reflected mainly in two aspects. First, it efficiently extracts representations of the miRNA and drug nodes in the miRNA-drug network. Second, our random path masking strategy efficiently reconstructs critical paths in the network, thereby reducing the adverse impact of noisy data. To our knowledge, this is the first time a random path masking strategy has been integrated into a GAE to infer MDRs. Our method was validated multiple times on public datasets and yielded promising results. We are optimistic that our model can offer valuable insights for miRNA therapeutic strategies and deepen the understanding of the regulatory mechanisms of miRNAs. Our data and code are publicly available on GitHub: https://github.com/ZZCrazy00/GAM-MDR.
GAM-MDR: probing miRNA-drug resistance using a graph autoencoder based on random path masking. Briefings in Functional Genomics, pp. 475-483, 2024-07-19. doi:10.1093/bfgp/elae005
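One way to picture random path masking: short random walks are sampled on the miRNA-drug graph, every edge they traverse is hidden, and a graph autoencoder would then be trained on the remaining edges to reconstruct the hidden ones. The sketch below covers only the masking step and is a generic illustration under assumed parameters (`num_walks`, `walk_length`, uniform start nodes and uniform neighbour choice); it is not GAM-MDR's implementation, and the node names are hypothetical.

```python
import random


def random_path_mask(edges, num_walks, walk_length, seed=0):
    """Split undirected edges into (visible, masked) by hiding every edge traversed by random walks.

    A graph autoencoder would encode the visible graph and learn to reconstruct the masked edges.
    """
    rng = random.Random(seed)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    nodes = sorted(adj)
    masked = set()
    for _ in range(num_walks):
        cur = rng.choice(nodes)  # uniform random start node
        for _ in range(walk_length):
            nxt = rng.choice(adj[cur])  # uniform random neighbour
            masked.add(frozenset((cur, nxt)))  # undirected edge key
            cur = nxt
    visible = [e for e in edges if frozenset(e) not in masked]
    hidden = [e for e in edges if frozenset(e) in masked]
    return visible, hidden


if __name__ == "__main__":
    # Hypothetical bipartite miRNA-drug resistance associations.
    edges = [("miR-A", "drug1"), ("miR-A", "drug2"), ("miR-B", "drug2"),
             ("miR-B", "drug3"), ("miR-C", "drug1"), ("miR-C", "drug3")]
    visible, hidden = random_path_mask(edges, num_walks=2, walk_length=2, seed=7)
    print("train on:", visible)
    print("reconstruct:", hidden)
```

Masking whole paths rather than independent edges hides correlated structure, so the decoder cannot trivially recover a masked edge from its immediate neighbours.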
Cristina Moral-Turón, Gualberto Asencio-Cortés, Francesc Rodriguez-Diaz, Alejandro Rubio, Alberto G Navarro, Ana M Brokate-Llanos, Andrés Garzón, Manuel J Muñoz, Antonio J Pérez-Pulido
Large-scale gene expression analyses are widely used to find differentially expressed genes under specific conditions. The results of these experiments are often deposited in public databases, which are growing much as molecular sequence databases did in the past. This growth now allows new secondary computational tools to use such information to gain new knowledge. If several genes have similar expression profiles across heterogeneous transcriptomics experiments, they could be functionally related, and such associations are often useful for annotating uncharacterized genes. In addition, searching for genes with opposite expression profiles is useful for finding negative regulators and proposing inhibitory compounds in drug repurposing projects. Here we present a new web application, Automatic and Serial Analysis of CO-expression (ASACO), which can discover genes positively and negatively correlated with a given query gene, based on thousands of public transcriptomics experiments. We also present usage examples and compare them with previously established knowledge. The results suggest that ASACO is a useful tool for improving knowledge about genes associated with human diseases and noncoding genes. ASACO is available at http://www.bioinfocabd.upo.es/asaco/.
ASACO: Automatic and Serial Analysis of CO-expression to discover gene modifiers with potential use in drug repurposing. Briefings in Functional Genomics, pp. 484-494, 2024-07-19. doi:10.1093/bfgp/elae006
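The core idea of searching for positive and negative correlators of a query gene can be sketched with Pearson correlation over per-experiment values. Assumptions: each gene is represented by one value per experiment (for example a log2 fold change), Pearson correlation is the similarity measure, and the gene names and numbers are hypothetical; ASACO's own scoring across thousands of experiments may differ.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def correlators(profiles, query, top=3):
    """Rank genes by correlation with `query` across experiments.

    profiles maps gene name -> list of values, one per experiment (assumed aligned).
    Returns (positive, negative) lists of (gene, r), strongest first.
    """
    q = profiles[query]
    scored = [(g, pearson(q, v)) for g, v in profiles.items() if g != query]
    positive = sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])[:top]
    negative = sorted((s for s in scored if s[1] < 0), key=lambda s: s[1])[:top]
    return positive, negative


if __name__ == "__main__":
    # Hypothetical log2 fold changes of four genes in five experiments.
    profiles = {
        "QUERY": [1.2, -0.4, 2.0, 0.1, -1.5],
        "GENE_UP": [1.0, -0.2, 1.8, 0.3, -1.1],
        "GENE_DOWN": [-1.1, 0.5, -2.2, 0.0, 1.3],
        "GENE_FLAT": [0.2, 0.1, 0.2, 0.1, 0.2],
    }
    pos, neg = correlators(profiles, "QUERY")
    print("positive correlators:", pos)
    print("negative correlators (candidate negative regulators):", neg)
```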
Numerous methods have been developed to integrate spatial transcriptomics sequencing data with single-cell RNA sequencing (scRNA-seq) data. Their continuous development and improvement offer multiple options for integrating and analyzing scRNA-seq and spatial transcriptomics data for diverse research questions. However, each method has its own advantages, limitations and scope of application, so researchers need to select the method best suited to their research purposes. This review presents 19 integration methods drawn from a wide range of available approaches, classified into two main groups and described accordingly, as a comprehensive reference for selecting a suitable integration method for a specific research question. Understanding the principles of these methods lets us identify their similarities and differences, comprehend their applicability and potential complementarity, and lay the foundation for future method development. The article also emphasizes incorporating highly variable genes when annotating data from different technologies, in order to obtain biologically relevant information aligned with the intended purpose.
Integration tools for scRNA-seq data and spatial transcriptomics sequencing data. Chaorui Yan, Yanxu Zhu, Miao Chen, Kainan Yang, Feifei Cui, Quan Zou, Zilong Zhang. Briefings in Functional Genomics, pp. 295-302, 2024-07-19. doi:10.1093/bfgp/elae002
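Highly variable genes are commonly chosen by ranking genes on how much their normalized expression varies across cells. The sketch below uses plain sample variance after library-size normalization and `log1p`; this is a simplified assumption, since tools such as Seurat and Scanpy use binned dispersion or variance-stabilizing variants, and the review abstract does not say which selection rule each integration method uses. Gene names and counts are hypothetical.

```python
import math


def log_normalize(counts, target=1e4):
    """Scale each cell (row) to `target` total counts, then apply log1p. Cells must have nonzero totals."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c * target / total) for c in cell])
    return normalized


def highly_variable_genes(counts, genes, k):
    """Return the k genes whose log-normalized expression has the largest sample variance across cells."""
    x = log_normalize(counts)
    n = len(x)
    ranked = []
    for j, gene in enumerate(genes):
        column = [row[j] for row in x]
        mean = sum(column) / n
        ranked.append((sum((v - mean) ** 2 for v in column) / (n - 1), gene))
    ranked.sort(key=lambda t: -t[0])
    return [gene for _, gene in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical counts: 4 cells x 3 genes; GENE_C is switched on in only half of the cells.
    counts = [[50, 50, 0], [25, 25, 50], [50, 50, 0], [25, 25, 50]]
    print(highly_variable_genes(counts, ["GENE_A", "GENE_B", "GENE_C"], k=1))
```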
Type 1 diabetes (T1D) is an autoimmune disease caused by the destruction of insulin-producing pancreatic islet beta cells. Despite significant advances, the precise pathogenesis of the disease remains unknown. This work integrated expression quantitative trait locus (eQTL) data with genome-wide association study (GWAS) summary data for T1D and with single-cell transcriptome data to investigate the potential pathogenic mechanisms by which the CTSH gene contributes to T1D in the exocrine pancreas. Using the summary data-based Mendelian randomization (SMR) approach, we identified four potential causative genes associated with T1D: BTN3A2, PGAP3, SMARCE1 and CTSH. To further investigate these genes' roles in T1D development, we validated them using a scRNA-seq dataset from pancreatic tissues of T1D patients and healthy controls. The analysis showed significantly higher expression of CTSH in T1D acinar cells, whereas the other three genes showed no significant changes in the scRNA-seq data. Moreover, single-cell WGCNA revealed that the module containing CTSH had the strongest positive correlation with T1D. In addition, we found ligand-receptor interactions between acinar cells and other cell types, especially ductal cells. Finally, based on functional enrichment analysis, we hypothesize that CTSH in the exocrine pancreas enhances the antiviral response, leading to overexpression of pro-inflammatory cytokines and the development of an inflammatory microenvironment. This process promotes β-cell injury and ultimately the development of T1D. Our findings offer insights into the underlying pathogenic mechanisms of T1D.
Integrating multi-omics data to analyze the potential pathogenic mechanism of CTSH gene involved in type 1 diabetes in the exocrine pancreas. Zerun Song, Shuai Li, Zhenwei Shang, Wenhua Lv, Xiangshu Cheng, Xin Meng, Rui Chen, Shuhao Zhang, Ruijie Zhang. Briefings in Functional Genomics, pp. 406-417, 2024-07-19. doi:10.1093/bfgp/elad052
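The SMR step can be illustrated with the approximate test statistic of the SMR method (Zhu et al., 2016), which combines the GWAS and eQTL z-scores of a gene's top cis-eQTL SNP. This is a sketch, not the authors' pipeline: the effect sizes in the example are hypothetical, and it omits the HEIDI test that usually accompanies SMR to distinguish a shared causal variant from linkage.

```python
import math


def smr_test(b_gwas, se_gwas, b_eqtl, se_eqtl):
    """Approximate SMR test for one gene using its top cis-eQTL SNP.

    b_gwas, se_gwas: SNP effect on the trait (e.g. T1D) from GWAS summary data.
    b_eqtl, se_eqtl: SNP effect on gene expression from eQTL summary data.
    Returns (b_xy, T_smr, p), where b_xy estimates the effect of expression on the trait.
    """
    z_zy = b_gwas / se_gwas
    z_zx = b_eqtl / se_eqtl
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)  # approx. chi-square, 1 df
    p = math.erfc(math.sqrt(t_smr / 2))  # upper tail of chi-square(1)
    b_xy = b_gwas / b_eqtl  # ratio estimate of expression -> trait effect
    return b_xy, t_smr, p


if __name__ == "__main__":
    # Hypothetical summary statistics for a single SNP-gene pair.
    b_xy, t, p = smr_test(b_gwas=0.2, se_gwas=0.05, b_eqtl=0.8, se_eqtl=0.2)
    print(f"b_xy={b_xy:.3f}  T_SMR={t:.2f}  p={p:.2e}")
```

Because T_SMR is dominated by the smaller of the two squared z-scores, a gene passes only when its SNP is strongly associated with both expression and the trait.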
Xiao-Wei Liu, Han-Lin Li, Cai-Yi Ma, Tian-Yu Shi, Tian-Yu Wang, Dan Yan, Hua Tang, Hao Lin, Ke-Jun Deng
Gut microbes are a crucial factor in the pathogenesis of type 1 diabetes (T1D). However, it is still unclear which gut microbiota are the key factors affecting T1D and how they influence the development and progression of the disease. To fill these knowledge gaps, we constructed a model to find gut microbiota biomarkers in patients with T1D. We first identified microbial markers using linear discriminant analysis effect size (LEfSe) and random forest (RF) methods. Next, by constructing co-occurrence networks of gut microbes in T1D, we aimed to reveal gut microbial interactions as well as the major beneficial and pathogenic bacteria in healthy individuals and patients with type 1 diabetes. Finally, PICRUSt2 was used to predict Kyoto Encyclopedia of Genes and Genomes (KEGG) functional pathways and KEGG Orthology (KO) gene levels of the microbial markers to investigate their biological roles. Our study revealed 21 microbial genera that are important biomarkers for T1D, with AUC values of 0.962 on the discovery set and 0.745 on the validation set. Functional analysis showed that 10 of these genera were significantly positively associated with D-arginine and D-ornithine metabolism, spliceosome in transcription, steroid hormone biosynthesis and glycosaminoglycan degradation, and significantly negatively correlated with steroid biosynthesis, cyanoamino acid metabolism and drug metabolism. The other 11 genera displayed the inverse pattern. In summary, our research identified a comprehensive set of T1D gut biomarkers with universal applicability and revealed the biological consequences of alterations in gut microbiota and their interplay. These findings offer significant prospects for individualized management and treatment of T1D.
{"title":"Predicting the role of the human gut microbiome in type 1 diabetes using machine-learning methods.","authors":"Xiao-Wei Liu, Han-Lin Li, Cai-Yi Ma, Tian-Yu Shi, Tian-Yu Wang, Dan Yan, Hua Tang, Hao Lin, Ke-Jun Deng","doi":"10.1093/bfgp/elae004","DOIUrl":"10.1093/bfgp/elae004","url":null,"abstract":"<p><p>Gut microbes is a crucial factor in the pathogenesis of type 1 diabetes (T1D). However, it is still unclear which gut microbiota are the key factors affecting T1D and their influence on the development and progression of the disease. To fill these knowledge gaps, we constructed a model to find biomarker from gut microbiota in patients with T1D. We first identified microbial markers using Linear discriminant analysis Effect Size (LEfSe) and random forest (RF) methods. Furthermore, by constructing co-occurrence networks for gut microbes in T1D, we aimed to reveal all gut microbial interactions as well as major beneficial and pathogenic bacteria in healthy populations and type 1 diabetic patients. Finally, PICRUST2 was used to predict Kyoto Encyclopedia of Genes and Genomes (KEGG) functional pathways and KO gene levels of microbial markers to investigate the biological role. Our study revealed that 21 identified microbial genera are important biomarker for T1D. Their AUC values are 0.962 and 0.745 on discovery set and validation set. Functional analysis showed that 10 microbial genera were significantly positively associated with D-arginine and D-ornithine metabolism, spliceosome in transcription, steroid hormone biosynthesis and glycosaminoglycan degradation. These genera were significantly negatively correlated with steroid biosynthesis, cyanoamino acid metabolism and drug metabolism. The other 11 genera displayed an inverse correlation. In summary, our research identified a comprehensive set of T1D gut biomarkers with universal applicability and have revealed the biological consequences of alterations in gut microbiota and their interplay. 
These findings offer significant prospects for individualized management and treatment of T1D.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"464-474"},"PeriodicalIF":2.5,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139906960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
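Co-occurrence networks like the ones described in the T1D abstract are commonly built by correlating genus abundances across samples and keeping strongly correlated pairs as edges. The abstract does not state which correlation measure or cutoff the authors used, so this minimal sketch assumes Spearman's rank correlation and a hypothetical |rho| >= 0.6 threshold; the genus names and abundance values are toy data, not results from the study.

```python
from itertools import combinations


def rank(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def cooccurrence_edges(abundance, threshold=0.6):
    """Return (genus_a, genus_b, rho) for every pair with |rho| >= threshold.

    abundance maps each genus to its abundances across the same ordered samples.
    The 0.6 default is a hypothetical cutoff, not one taken from the study.
    """
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


# Toy abundance table: hypothetical genera measured in five samples.
abundance = {
    "GenusA": [1, 2, 3, 4, 5],
    "GenusB": [2, 4, 6, 8, 10],
    "GenusC": [5, 4, 3, 2, 1],
    "GenusD": [3, 1, 4, 5, 2],
}
for edge in cooccurrence_edges(abundance):
    print(edge)
# Prints the A-B (1.0), A-C (-1.0) and B-C (-1.0) edges; GenusD is dropped (|rho| = 0.2).
```

Positive edges can be read as co-occurrence and negative edges as co-exclusion. A real analysis would usually also correct for multiple testing and account for the compositional nature of relative-abundance data, which simple rank correlation ignores.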
Huixiang Peng, Jing Xu, Kangchen Liu, Fang Liu, Aidi Zhang, Xiujun Zhang
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge. In this work, we propose EIEPCF (eliminating indirect effects produced by confounding factors), a novel approach for inferring functional GRNs that eliminates indirect effects caused by confounding factors. The method removes the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. Validation on simulation studies, the gold-standard networks provided by the DREAM3 Challenge and the real gene networks of Escherichia coli demonstrates that EIEPCF achieves significantly higher accuracy than other popular computational methods for inferring GRNs. As a case study, we used EIEPCF to reconstruct a cold-resistance-specific GRN from cold-resistance gene expression data in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
{"title":"EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors.","authors":"Huixiang Peng, Jing Xu, Kangchen Liu, Fang Liu, Aidi Zhang, Xiujun Zhang","doi":"10.1093/bfgp/elad040","DOIUrl":"10.1093/bfgp/elad040","url":null,"abstract":"<p><p>Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors), which eliminates indirect effects caused by confounding factors. This method eliminates the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. The validation results of the EIEPCF method on simulation studies, the gold-standard networks provided by the DREAM3 Challenge and the real gene networks of Escherichia coli demonstrate that it achieves significantly higher accuracy compared to other popular computational methods for inferring GRNs. As a case study, we utilized the EIEPCF method to reconstruct the cold-resistant specific GRN from gene expression data of cold-resistant in Arabidopsis thaliana. 
The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"373-383"},"PeriodicalIF":2.5,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10112267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
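The residual-based idea in the EIEPCF abstract resembles a classic partial correlation: regress the regulator and the target on a shared confounder, then measure how similar the leftover variation is. The abstract does not say how EIEPCF selects confounders or which similarity measure it applies to the residuals, so this sketch assumes a single confounder, ordinary least-squares regression and Pearson correlation. It illustrates the general technique, not the authors' implementation, and the expression values are toy numbers.

```python
def mean(v):
    return sum(v) / len(v)


def residuals(y, z):
    """Residuals of y after least-squares regression on confounder z (with intercept)."""
    my, mz = mean(y), mean(z)
    szz = sum((b - mz) ** 2 for b in z)
    slope = sum((a - my) * (b - mz) for a, b in zip(y, z)) / szz if szz else 0.0
    intercept = my - slope * mz
    return [a - (intercept + slope * b) for a, b in zip(y, z)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def residual_similarity(regulator, target, confounder):
    """Correlation of regulator and target after removing one shared confounder (assumed setup)."""
    return pearson(residuals(regulator, confounder), residuals(target, confounder))


# Toy profiles over six samples: z drives both genes, which have no direct link.
z = [1, 2, 3, 4, 5, 6]
regulator = [2, 1, 2, 5, 5, 6]  # z plus one noise pattern
target = [3, 3, 7, 7, 8, 14]    # 2*z plus an unrelated noise pattern

print(round(pearson(regulator, target), 3))                  # 0.834: looks like an edge
print(round(residual_similarity(regulator, target, z), 3))   # 0.0: indirect effect removed
```

Because the shared driver z is regressed out, the apparent regulator-target association disappears, which is the kind of indirect edge the abstract describes eliminating. With several confounders, the single-variable regression would be replaced by a multiple regression on all of them.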