Tyler Kolisnik, Faeze Keshavarz-Rahaghi, Rachel V Purcell, Adam N H Smith, Olin K Silander
Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn "RandomForestClassifier" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.
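The rank-based permutation idea described above can be illustrated with a minimal Python sketch. It is not the pyRforest implementation. The importance function used here (absolute difference in class means) is only a stand-in for Random Forest feature importances, and the function names, the number of permutations, and the add-one P-value estimator are assumptions made for illustration. The importance observed at each rank is compared with the importances found at the same rank after the class labels are shuffled.

```python
import random

def mean_diff_importance(X, y):
    """Stand-in importance: |mean(class 1) - mean(class 0)| per feature.
    In practice this would be replaced by Random Forest importances."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores

def rank_permutation_pvalues(X, y, importance=mean_diff_importance,
                             n_perm=200, seed=0):
    """Return {feature_index: p_value}. Each observed importance is compared
    with the null importances found at the same rank under label shuffling."""
    rng = random.Random(seed)
    observed = importance(X, y)
    # Feature indices ordered from most to least important.
    order = sorted(range(len(observed)), key=lambda j: -observed[j])
    exceed = [0] * len(order)
    labels = list(y)
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any feature-label association
        null_sorted = sorted(importance(X, labels), reverse=True)
        for rank, j in enumerate(order):
            if null_sorted[rank] >= observed[j]:
                exceed[rank] += 1
    # Add-one estimator keeps every p-value strictly above zero.
    return {j: (exceed[rank] + 1) / (n_perm + 1)
            for rank, j in enumerate(order)}
```

Passing a different `importance` callable (for example, one that fits a Random Forest and returns its importances) leaves the permutation logic unchanged, which is why the importance measure is kept as a parameter in this sketch.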
{"title":"pyRforest: a comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R.","authors":"Tyler Kolisnik, Faeze Keshavarz-Rahaghi, Rachel V Purcell, Adam N H Smith, Olin K Silander","doi":"10.1093/bfgp/elae038","DOIUrl":"10.1093/bfgp/elae038","url":null,"abstract":"<p><p>Random Forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly when features influence the target in interactive, nonlinear, or nonadditive ways. Currently, some of the most efficient Random Forest methods in terms of computational speed are implemented in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here, we present an R package, pyRforest, which integrates Python scikit-learn \"RandomForestClassifier\" algorithms into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize P-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of Random Forest models for genomic data analysis by merging the strengths of Python with R. 
pyRforest can be downloaded at: https://www.github.com/tkolisnik/pyRforest with an associated vignette at https://github.com/tkolisnik/pyRforest/blob/main/vignettes/pyRforest-vignette.pdf.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735746/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142382536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ericka M Hernandez-Benitez, Esperanza Martínez-Romero, José Luis Aguirre-Noyola, Jose Arcadio Farias-Rico, Daniela Ledezma-Tejeida
Growth of the common bean plant Phaseolus vulgaris is tightly linked to its symbiotic relationship with diverse rhizobial species, particularly Rhizobium phaseoli, an alphaproteobacterium that forms root nodules and provides high levels of nitrogen to the plant. Molecular cross-talk is known to occur through plant-derived metabolites, but only flavonoids have been identified as nodulation signals, which act through the activation of the NodD Transcription Factor (TF). The identification of signals that mediate nodulation via TFs can aid in the rational design of biofertilizers that promote plant-bacteria symbiosis. Here, we identified 57 TFs in the R. phaseoli genome through sequence conservation with Escherichia coli, and predicted a transcriptional regulatory network comprising 16 TFs and 1,371 target genes. We identified the regulatory interactions relevant to nodulation via transcriptome analysis, and hypothesize that PuuR is a TF involved in nodulation, potentially acting via its known binding metabolite putrescine. Sequence and structural evidence predict a model in which putrescine acts as a signaling metabolite in nodulation via the TF PuuR and the regulation of the nodI gene.
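Inferring a regulatory network through sequence conservation can be sketched as a transfer of known interactions across orthologs. The example below is a simplified illustration, not the authors' pipeline: the ortholog map, the `Rp_` gene identifiers, and the rule that both the TF and its target must have an ortholog are assumptions.

```python
def transfer_network(reference_edges, orthologs):
    """Map reference (TF, target) pairs onto a query genome.

    reference_edges: iterable of (tf, target) pairs known in the reference.
    orthologs: dict mapping reference gene -> query gene.
    An interaction is kept only if both genes have an ortholog.
    """
    predicted = set()
    for tf, target in reference_edges:
        if tf in orthologs and target in orthologs:
            predicted.add((orthologs[tf], orthologs[target]))
    return predicted

# Toy input: the query-genome identifiers are invented for illustration.
ecoli_edges = [("puuR", "puuA"), ("puuR", "puuD"), ("lacI", "lacZ")]
ortholog_map = {"puuR": "Rp_puuR", "puuA": "Rp_puuA"}

network = transfer_network(ecoli_edges, ortholog_map)
tfs = {tf for tf, _ in network}
print(sorted(network), len(tfs))
```

Only the interaction whose TF and target both map to the query genome survives, which is one reason a network predicted this way can contain fewer TFs than were identified by sequence conservation alone.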
{"title":"Computational inference of Rhizobium phaseoli transcriptional regulatory network predicts Transcription Factors involved in nodulation.","authors":"Ericka M Hernandez-Benitez, Esperanza Martínez-Romero, José Luis Aguirre-Noyola, Jose Arcadio Farias-Rico, Daniela Ledezma-Tejeida","doi":"10.1093/bfgp/elaf020","DOIUrl":"10.1093/bfgp/elaf020","url":null,"abstract":"<p><p>Growth of the common bean plant Phaseolus vulgaris is tightly linked to its symbiotic relationship with diverse rhizobial species, particularly Rhizobium phaseoli, an alphaproteobacterium that forms root nodules and provides high levels of nitrogen to the plant. Molecular cross-talk is known to happen through plant-derived metabolites, but only flavonoids have been identified as nodulation signals, which act through the activation of the NodD Transcription Factor (TF). The identification of signals that mediate nodulation via TFs can aid in the rational design of biofertilizers that promote plant-bacteria symbiosis. Here, we identified 57 TFs in the R. phaseoli genome through sequence conservation from Escherichia coli, and predicted a transcriptional regulatory network comprising 16 TFs, and 1,371 target genes. We identified the regulatory interactions relevant to nodulation via transcriptome analysis, and hypothesize that PuuR is a TF involved in nodulation, potentially acting via its known binding metabolite putrescine. 
Sequence and structural evidence predict a model where putrescine acts as a signaling metabolite in nodulation via the TF PuuR, and the regulation of the nodI gene.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":"24 ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12645834/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145607605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Viral threats raise concerns about sustained epidemic transmission, making the development of vaccines particularly important. In the prolonged and costly process of vaccine development, the most important initial step is to identify protective immunogens. Machine learning (ML) approaches are productive in analyzing big data such as microbial proteomes, and can remarkably reduce the cost of experimental work in developing novel vaccine candidates. We intensively evaluated the B cell epitope immunogenicity prediction power of eight commonly used ML methods by random sampling cross-validation on a large dataset of known viral immunogens and non-immunogens that we manually curated from the public domain. Three of them (Extreme Gradient Boosting, K Nearest Neighbours, and Random Forest) showed the strongest predictive power. We then proposed a novel soft-voting based ensemble approach (VirusImmu), which demonstrated a powerful and stable capability for viral immunogenicity prediction across the test set and external test set irrespective of protein sequence length. VirusImmu was successfully applied to facilitate the identification of linear B cell epitopes against African Swine Fever Virus, as confirmed by indirect ELISA in vitro. In short, VirusImmu exhibited tremendous potential in predicting the immunogenicity of viral protein segments. It is freely accessible at https://github.com/zhangjbig/VirusImmu.
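Soft voting, the ensemble strategy named above, averages the class probabilities produced by the base models rather than counting their votes. The following Python sketch shows the general technique only. It is not taken from the VirusImmu code, and the function name, the optional weights, and the example probabilities are assumptions.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities from several models for one sample.

    probabilities: list of per-model lists, e.g. [[p_neg, p_pos], ...].
    weights: optional per-model weights; equal weighting if omitted.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged

# Hypothetical outputs of three base models for one protein segment,
# given as [P(non-immunogen), P(immunogen)].
model_outputs = [[0.40, 0.60], [0.55, 0.45], [0.20, 0.80]]
label, probs = soft_vote(model_outputs)
```

Soft and hard voting can disagree: with outputs `[[0.45, 0.55], [0.45, 0.55], [0.9, 0.1]]`, two of three models favour class 1, but the averaged probability favours class 0 because the third model is much more confident.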
{"title":"VirusImmu: a novel ensemble machine learning approach for viral immunogenicity prediction.","authors":"Jing Li, Zhongpeng Zhao, ChengZheng Tai, Ting Sun, Lingyun Tan, Xinyu Li, Wei He, HongJun Li, Jing Zhang","doi":"10.1093/bfgp/elaf008","DOIUrl":"https://doi.org/10.1093/bfgp/elaf008","url":null,"abstract":"<p><p>The viruses threats provoke concerns regarding their sustained epidemic transmission, making the development of vaccines particularly important. In the prolonged and costly process of vaccine development, the most important initial step is to identify protective immunogens. Machine learning (ML) approaches are productive in analyzing big data such as microbial proteomes, and can remarkably reduce the cost of experimental work in developing novel vaccine candidates. We intensively evaluated the B cell epitope immunogenicity prediction power of eight commonly-used ML methods by random sampling cross validation on a large dataset consisting of known viral immunogens and non-immunogens we manually curated from the public domain. Extreme Gradient Boosting, K Nearest Neighbours, and Random Forest) showed the strongest predictive power. We then proposed a novel soft-voting based ensemble approach (VirusImmu), which demonstrated a powerful and stable capability for viral immunogenicity prediction across the test set and external test set irrespective of protein sequence length. VirusImmu was successfully applied to facilitate identifying linear B cell epitopes against African Swine Fever Virus as confirmed by indirect ELISA in vitro. In short, VirusImmu exhibited tremendous potentials in predicting immunogenicity of viral protein segments. 
It is freely accessible at https://github.com/zhangjbig/VirusImmu.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":"24 ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12051847/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144053127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When the traditional random forest (RF) algorithm is used to select features in biostatistical data, large amounts of noisy data and numerous parameters can distort the importance of the selected features, making feature selection difficult to control. It is therefore a challenge for the traditional RF algorithm to preserve the accuracy of its results in the presence of noisy data. Generally, directly removing noisy data can result in significant bias in the results. In this study, we develop a new algorithm, standardized-threshold and loops-based random forest (STLBRF), and apply it to gene expression data for feature gene selection. This algorithm, based on the traditional RF algorithm, combines backward elimination and K-fold cross-validation to construct a cyclic system and set a standardized threshold, the error increment. The algorithm overcomes the shortcomings of existing gene selection methods. We compare ridge regression, lasso regression, elastic net regression, the traditional RF algorithm, and our improved RF algorithm on three real gene expression datasets and conduct a quantitative analysis. To ensure the reliability of the results, we validate the effectiveness of the genes selected by these methods using the Random Forest classifier. The results indicate that, compared to other methods, the STLBRF algorithm achieves not only higher effectiveness in feature gene selection but also better control over the number of selected genes. Our method offers reliable technical support for feature expression analysis and research on biomarker selection.
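A loop that combines backward elimination with a cross-validated error threshold can be sketched as follows. This is an illustration of the general idea rather than the STLBRF algorithm itself: the stopping rule, the default `max_increment`, and the two caller-supplied functions (standing in for K-fold cross-validated error and RF feature importance) are assumptions.

```python
def backward_eliminate(features, cv_error, importance, max_increment=0.01):
    """Drop the least important feature until the cross-validated error
    rises by more than max_increment over the best error seen so far.

    cv_error(features) -> float: K-fold cross-validated error.
    importance(features) -> dict mapping feature -> importance score.
    """
    current = list(features)
    best_error = cv_error(current)
    while len(current) > 1:
        scores = importance(current)
        weakest = min(current, key=lambda f: scores[f])
        candidate = [f for f in current if f != weakest]
        error = cv_error(candidate)
        if error - best_error > max_increment:
            break  # removing this feature costs too much accuracy
        current = candidate
        best_error = min(best_error, error)
    return current

# Toy stand-ins: two informative genes and three noise genes.
informative = {"g1", "g2"}

def toy_error(feats):
    missing = len(informative - set(feats))
    noise = len(set(feats) - informative)
    return 0.05 + 0.2 * missing + 0.005 * noise

def toy_importance(feats):
    return {f: (1.0 if f in informative else 0.1) for f in feats}

selected = backward_eliminate(["g1", "g2", "n1", "n2", "n3"],
                              toy_error, toy_importance)
```

In this toy run the three noise genes are removed one by one because each removal lowers the error, and the loop stops when dropping an informative gene would raise the error by more than the threshold.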
{"title":"STLBRF: an improved random forest algorithm based on standardized-threshold for feature screening of gene expression data.","authors":"Huini Feng, Ying Ju, Xiaofeng Yin, Wenshi Qiu, Xu Zhang","doi":"10.1093/bfgp/elae048","DOIUrl":"10.1093/bfgp/elae048","url":null,"abstract":"<p><p>When the traditional random forest (RF) algorithm is used to select feature elements in biostatistical data, a large amount of noise data and parameters can affect the importance of the selected feature elements, making the control of feature selection difficult. Therefore, it is a challenge for the traditional RF algorithm to preserve the accuracy of algorithm results in the presence of noise data. Generally, directly removing noise data can result in significant bias in the results. In this study, we develop a new algorithm, standardized threshold, and loops based random forest (STLBRF), and apply it to the field of gene expression data for feature gene selection. This algorithm, based on the traditional RF algorithm, combines backward elimination and K-fold cross-validation to construct a cyclic system and set a standardized threshold: error increment. The algorithm overcomes the shortcomings of existing gene selection methods. We compare ridge regression, lasso regression, elastic net regression, the traditional RF algorithm, and our improved RF algorithm using three real gene expression datasets and conducting a quantitative analysis. To ensure the reliability of the results, we validate the effectiveness of the genes selected by these methods using the Random Forest classifier. The results indicate that, compared to other methods, the STLBRF algorithm achieves not only higher effectiveness in feature gene selection but also better control over the number of selected genes. 
Our method offers reliable technical support for feature expression analysis and research on biomarker selection.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735748/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142907874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yan Xia, An Xiong, Zilong Zhang, Quan Zou, Feifei Cui
Deep learning models have made significant progress in the biomedical field, particularly in the prediction of drug-drug interactions (DDIs). DDIs are pharmacodynamic reactions between two or more drugs in the body, which may lead to adverse effects and are of great significance for drug development and clinical research. However, predicting DDIs through traditional clinical trials and experiments is both costly and time-consuming. When utilizing advanced Artificial Intelligence (AI) and deep learning techniques, both developers and users face multiple challenges, including the problem of acquiring and encoding data, as well as the difficulty of designing computational methods. In this paper, we review a variety of DDI prediction methods, including similarity-based, network-based, and integration-based approaches, to provide an up-to-date and easy-to-understand guide for researchers in different fields. Additionally, we provide an in-depth analysis of widely used molecular representations and a systematic exposition of the theoretical framework of models used to extract features from graph data.
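As a concrete illustration of the similarity-based family mentioned above, the Python sketch below scores a candidate drug pair by its resemblance to known interacting pairs, using Tanimoto similarity on molecular fingerprints represented as sets of "on" bits. The scoring rule, the function names, and the toy fingerprints are assumptions chosen for brevity and are not drawn from any specific method in the review.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def ddi_score(drug_x, drug_y, fingerprints, known_pairs):
    """Score a candidate pair by its most similar known interacting pair."""
    best = 0.0
    for p, q in known_pairs:
        for u, v in ((p, q), (q, p)):  # interacting pairs are unordered
            s = min(tanimoto(fingerprints[drug_x], fingerprints[u]),
                    tanimoto(fingerprints[drug_y], fingerprints[v]))
            best = max(best, s)
    return best

# Toy fingerprints; the drug names and bit positions are invented.
fingerprints = {
    "A": {1, 2, 3, 4},
    "B": {5, 6, 7},
    "C": {1, 2, 3, 5},
    "D": {5, 6, 8},
}
known_pairs = [("A", "B")]
score = ddi_score("C", "D", fingerprints, known_pairs)
```

Taking the minimum of the two similarities means a candidate pair scores highly only when both of its drugs resemble the corresponding drugs of a known interacting pair.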
{"title":"A comprehensive review of deep learning-based approaches for drug-drug interaction prediction.","authors":"Yan Xia, An Xiong, Zilong Zhang, Quan Zou, Feifei Cui","doi":"10.1093/bfgp/elae052","DOIUrl":"10.1093/bfgp/elae052","url":null,"abstract":"<p><p>Deep learning models have made significant progress in the biomedical field, particularly in the prediction of drug-drug interactions (DDIs). DDIs are pharmacodynamic reactions between two or more drugs in the body, which may lead to adverse effects and are of great significance for drug development and clinical research. However, predicting DDI through traditional clinical trials and experiments is not only costly but also time-consuming. When utilizing advanced Artificial Intelligence (AI) and deep learning techniques, both developers and users face multiple challenges, including the problem of acquiring and encoding data, as well as the difficulty of designing computational methods. In this paper, we review a variety of DDI prediction methods, including similarity-based, network-based, and integration-based approaches, to provide an up-to-date and easy-to-understand guide for researchers in different fields. 
Additionally, we provide an in-depth analysis of widely used molecular representations and a systematic exposition of the theoretical framework of models used to extract features from graph data.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":"24 ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11847217/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143484633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiangfei Huang, Zilu Yu, Juan Tian, Tao Chen, Aiping Wei, Chao Mei, Shibiao Chen, Yong Li
Organ fibrosis, a common consequence of chronic tissue injury, presents a significant health challenge. Recent research has revealed the regulatory role of N6-methyladenosine (m6A) RNA modification in fibrosis of various organs, including the lung, liver, kidney, and heart. In this comprehensive review, we summarize the latest findings on the mechanisms and functions of m6A modification in organ fibrosis. By highlighting the potential of m6A modification as a therapeutic target, our goal is to encourage further research in this emerging field and support advancements in the clinical treatment of organ fibrosis.
{"title":"m6A RNA modification pathway: orchestrating fibrotic mechanisms across multiple organs.","authors":"Xiangfei Huang, Zilu Yu, Juan Tian, Tao Chen, Aiping Wei, Chao Mei, Shibiao Chen, Yong Li","doi":"10.1093/bfgp/elae051","DOIUrl":"10.1093/bfgp/elae051","url":null,"abstract":"<p><p>Organ fibrosis, a common consequence of chronic tissue injury, presents a significant health challenge. Recent research has revealed the regulatory role of N6-methyladenosine (m6A) RNA modification in fibrosis of various organs, including the lung, liver, kidney, and heart. In this comprehensive review, we summarize the latest findings on the mechanisms and functions of m6A modification in organ fibrosis. By highlighting the potential of m6A modification as a therapeutic target, our goal is to encourage further research in this emerging field and support advancements in the clinical treatment of organ fibrosis.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735750/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142933353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The evolutionarily conserved Integrator complex, which is composed of over 10 subunits, orchestrates diverse RNA-processing events such as 3'-end maturation of small nuclear RNAs (snRNAs), transcription termination of RNA Polymerase II, and DNA damage response signaling pathways. However, the functional roles of individual Integrator complex subunits in lung adenocarcinoma (LUAD) remain poorly characterized. This study aimed to systematically investigate the potential oncogenic functions and prognostic value of these subunits in LUAD. Expression of the Integrator complex subunits was profiled using transcriptomic data from The Cancer Genome Atlas (TCGA) database. Survival analyses (including Kaplan-Meier and Cox regression models) were performed to evaluate the correlations between subunit expression levels and patient survival outcomes, namely overall survival (OS) and disease-free survival (DFS). Co-expression network analysis was conducted to annotate the potential biological functions of key subunits, and functional validation was performed using CCK-8 assays and flow cytometry to assess the impact of INTS7 depletion on cell proliferation and cell cycle progression in LUAD cell lines. Integrator complex subunits were significantly overexpressed in LUAD tissues compared with normal lung parenchyma. Among these subunits, INTS7 expression was most strongly associated with shortened OS and DFS, indicating a pivotal role in LUAD pathogenesis. Bioinformatics analyses revealed that INTS7 is involved in regulating critical biological processes including cell cycle progression, transcriptional regulation, and RNA metabolism, and loss-of-function experiments demonstrated that genetic silencing of INTS7 significantly inhibited cell proliferation and induced cell cycle arrest in LUAD cells. Ultimately, this study provides the first evidence that INTS7, a core component of the Integrator complex, serves as a functional and prognostic regulator in LUAD, highlighting its potential as a therapeutic target for this malignancy.
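The Kaplan-Meier analysis mentioned above estimates survival as a product over event times of (1 - deaths / number at risk). A minimal Python sketch of the estimator is given below for illustration. The function name and the toy follow-up data are assumptions, and the study's actual analysis would have used dedicated survival software on TCGA data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up time per patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients leave the risk set
    return curve

# Toy cohort: five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

To compare groups, as in a high versus low expression split, the same function would be applied to each group separately and the resulting curves compared, typically with a log-rank test.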
{"title":"INTS7 modulates cell proliferation and apoptosis via promoting cell cycle progression in lung adenocarcinoma.","authors":"Yaming Liu, Tengfei Huang, Dehua Zeng, Meiqing Zhang, Duohuan Lian, Shunkai Zhou, Mengmeng Chen, Zhiyong Zeng, Huizhong Li","doi":"10.1093/bfgp/elaf014","DOIUrl":"https://doi.org/10.1093/bfgp/elaf014","url":null,"abstract":"<p><p>The evolutionarily conserved Integrator complex, which is composed of over 10 subunits, orchestrates diverse RNA-processing events such as 3'-end maturation of small nuclear RNAs (snRNAs), transcription termination of RNA Polymerase II, and DNA damage response signaling pathways; however, the functional roles of individual Integrator complex subunits in lung adenocarcinoma (LUAD) remain poorly characterized, and this study aimed to systematically investigate the potential oncogenic functions and prognostic values of these subunits in LUAD. To achieve this goal, the expression profiles of Integrator complex subunits were profiled using transcriptomic data from the The Cancer Genome Atlas (TCGA) database, survival analyses (including Kaplan-Meier and Cox regression models) were performed to evaluate the correlations between subunit expression levels and patient survival outcomes (overall survival (OS) and disease-free survival (DFS)), co-expression network analysis was conducted to annotate the potential biological functions of key subunits, and functional validation was performed using CCK-8 assays and flow cytometry to assess the impact of INTS7 depletion on cell proliferation and cycle progression in LUAD cell lines. 
The findings of this study showed that Integrator complex subunits were significantly overexpressed in LUAD tissues compared to normal lung parenchyma; among these subunits, INTS7 expression was most strongly associated with shortened OS and DFS, indicating its pivotal role in LUAD pathogenesis, while bioinformatics analyses revealed that INTS7 is involved in regulating critical biological processes including cell cycle progression, transcriptional regulation, and RNA metabolism, and loss-of-function experiments demonstrated that genetic silencing of INTS7 significantly inhibited cell proliferation and induced cell cycle arrest in LUAD cells. Ultimately, this study provides the first evidence that INTS7, a core component of the Integrator complex, serves as a functional and prognostic regulator in LUAD, highlighting its potential as a therapeutic target for this malignancy.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":"24 ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393144/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144979564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hwisoo Choi, Hyeonkyu Kim, Hoebin Chung, Dong-Sung Lee, Junil Kim
Recent advancements in single-cell technologies, including single-cell RNA sequencing (scRNA-seq) and Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), have greatly improved our insight into the epigenomic landscapes across various biological contexts and diseases. This paper reviews key computational tools and machine learning approaches that integrate scRNA-seq and scATAC-seq data to facilitate the alignment of transcriptomic data with chromatin accessibility profiles. Applying these integrated single-cell technologies in neurodegenerative diseases, such as Alzheimer's disease and Parkinson's disease, reveals how changes in chromatin accessibility and gene expression can illuminate pathogenic mechanisms and identify potential therapeutic targets. Despite facing challenges like data sparsity and computational demands, ongoing enhancements in scATAC-seq and scRNA-seq technologies, along with better analytical methods, continue to expand their applications. These advancements promise to revolutionize our approach to medical research and clinical diagnostics, offering a comprehensive view of cellular function and disease pathology.
{"title":"Application of computational algorithms for single-cell RNA-seq and ATAC-seq in neurodegenerative diseases.","authors":"Hwisoo Choi, Hyeonkyu Kim, Hoebin Chung, Dong-Sung Lee, Junil Kim","doi":"10.1093/bfgp/elae044","DOIUrl":"10.1093/bfgp/elae044","url":null,"abstract":"<p><p>Recent advancements in single-cell technologies, including single-cell RNA sequencing (scRNA-seq) and Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), have greatly improved our insight into the epigenomic landscapes across various biological contexts and diseases. This paper reviews key computational tools and machine learning approaches that integrate scRNA-seq and scATAC-seq data to facilitate the alignment of transcriptomic data with chromatin accessibility profiles. Applying these integrated single-cell technologies in neurodegenerative diseases, such as Alzheimer's disease and Parkinson's disease, reveals how changes in chromatin accessibility and gene expression can illuminate pathogenic mechanisms and identify potential therapeutic targets. Despite facing challenges like data sparsity and computational demands, ongoing enhancements in scATAC-seq and scRNA-seq technologies, along with better analytical methods, continue to expand their applications. 
These advancements promise to revolutionize our approach to medical research and clinical diagnostics, offering a comprehensive view of cellular function and disease pathology.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735751/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142584958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: A dynamic model of gene activation in response to hypoxia accounting for both HIF-1 and HIF-2.","authors":"","doi":"10.1093/bfgp/elaf028","DOIUrl":"10.1093/bfgp/elaf028","url":null,"abstract":"","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":"24 ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12723799/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145812409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-omics characterization of individual cells offers remarkable potential for analyzing the dynamics and relationships of gene regulatory states across millions of cells. How to integrate multimodal data remains an open problem: existing integration methods struggle with accuracy and with retaining modality-specific biological variation. In this paper, we present scHyper (scalable, interpretable machine learning for single cell integration), a low-code and data-efficient deep transfer model designed for integrating paired and unpaired single-cell multimodal data. We benchmark scHyper on datasets spanning different combinations of modalities. scHyper learns a low-dimensional representation and aligns the covariance matrices of the measured modalities, achieving high accuracy with low memory use and computational time, even on large-scale atlas-level datasets and across different cell lines, and shedding light on regulatory relationships between different types of omics. Altogether, we show that scHyper is a versatile and robust tool for cell-type label transfer and integration of multimodal single-cell datasets.
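Aligning the covariance matrices of two modalities can be expressed as minimising a distance between those matrices. The sketch below uses the squared Frobenius distance scaled by 1/(4d^2), as in the CORAL loss from the domain-adaptation literature. Whether scHyper uses exactly this objective is not stated in the abstract, so the loss form, the function names, and the toy embeddings are assumptions.

```python
def covariance(rows):
    """Sample covariance matrix of rows (cells x latent dimensions)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]

def covariance_alignment_loss(z_a, z_b):
    """Squared Frobenius distance between the two covariance matrices,
    scaled by 1 / (4 d^2) as in the CORAL loss."""
    c_a, c_b = covariance(z_a), covariance(z_b)
    d = len(c_a)
    sq = sum((c_a[i][j] - c_b[i][j]) ** 2 for i in range(d) for j in range(d))
    return sq / (4 * d * d)

# Toy 2-dimensional embeddings for two modalities (values invented).
rna_embedding = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
atac_embedding = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
loss = covariance_alignment_loss(rna_embedding, atac_embedding)
```

The loss is zero when the two embeddings share second-order statistics, regardless of any shift in their means, and the two groups need not contain the same cells, so this kind of term does not require paired measurements.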
{"title":"Integration of single cell multiomics data by deep transfer hypergraph neural network.","authors":"Yulong Kan, Zhongxiao Zhang, Yingjie Wang, Yunjing Qi, Haoxin Chang, Weihao Wang, Zheng Zhang, Quanhong Liu, Xiaoran Shi","doi":"10.1093/bfgp/elaf009","DOIUrl":"10.1093/bfgp/elaf009","url":null,"abstract":"<p><p>Multi-omics characterization of individual cells offers remarkable potential for analyzing the dynamics and relationships of gene regulatory states across millions of cells. How to integrate multimodal data is an open problem, existing integration methods struggle with accuracy and modality-specific biological variation retention. In this paper, we present scHyper (scalable, interpretable machine learning for single cell integration), a low-code and data-efficient deep transfer model designed for integrating paired and unpaired single-cell multimodal data. We benchmark scHyper against datasets from different multimodal data. ScHyper learns a low-dimensional representation and aligns the covariance matrices of the measured modalities, achieving high accuracy even with large scale atlas-level datasets with low memory and computational time across different cell lines, shedding light on regulatory relationships between different types of omics. 
Altogether, we show that scHyper is a versatile and robust tool for cell-type label transfer and integration from multimodal single-cell datasets.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":"24 ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12397973/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144979613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}