Background: Comorbidities and genetic correlations between gastrointestinal tract diseases and psychiatric disorders have been widely reported, but the underlying intrinsic link between Alzheimer's disease (AD) and inflammatory bowel disease (IBD) is not adequately understood.
Methods: To identify the pathogenic cell types of AD and IBD and to explore their shared genetic architecture, we developed the Pathogenic Cell types and shared Genetic Loci (PCGL) framework and applied it to AD, IBD, and the two IBD subtypes, ulcerative colitis (UC) and Crohn's disease (CD).
Results: We found that monocytes and CD8 T cells were the enriched pathogenic cell types of AD and IBDs, respectively. Using the PCGL framework, we detected a significant global genetic correlation between AD and each of IBD, UC, and CD. In particular, local genetic correlations between AD and IBD showed strong signals on chr6. Bidirectional two-sample Mendelian randomization (MR) analyses also validated these correlations. Cross-trait meta-analysis identified two key genetic loci, rs660895 (on chr6) and rs917117 (on chr7), which have not been previously reported. The two loci map to the genes HLA-DRB1 and JAZF1, respectively. MAGMA genome-wide gene-based analysis identified six overlapping genes, including HLA-DRB1. SMR analyses further validated the six shared genes in specific tissues and monocytes, and pathway enrichment analysis revealed that the shared genes were enriched in several pathways related to natural killer cell-mediated cytotoxicity and chemokine signaling.
Conclusions: PCGL not only revealed significant genetic correlations between AD and IBDs but also identified enriched pathogenic cell types and new shared loci and genes. We highlight the mediating role of HLA-DRB1 in the comorbidity mechanisms.
{"title":"Identification of pathogenic cell types and shared genetic loci and genes for Alzheimer's disease and inflammatory bowel disease.","authors":"Jingjing Zhang, Yuqing Yan, Liqin Han, Rui Qiao, Xiaohui Niu, Peiluan Li","doi":"10.1093/bfgp/elaf013","DOIUrl":"10.1093/bfgp/elaf013","url":null,"abstract":"<p><strong>Background: </strong>Comorbidities and genetic correlations between gastrointestinal tract diseases and psychiatric disorders have been widely reported, but the underlying intrinsic link between Alzheimer's disease (AD) and inflammatory bowel disease (IBD) is not adequately understood.</p><p><strong>Methods: </strong>To identify pathogenic cell types of AD and IBD and explore their shared genetic architecture, we developed Pathogenic Cell types and shared Genetic Loci (PCGL) framework, which studied AD and IBD and its two subtypes of ulcerative colitis (UC) and Crohn's disease (CD).</p><p><strong>Results: </strong>We found that monocytes and CD8 T cells were the enriched pathogenic cell types of AD and IBDs, respectively. By PCGL framework, there was a significant global genetic correlation between AD and each of IBD, UC, and CD. Especially, local genetic correlations between AD and IBD showed strong signals in chr6. Bidirectional two-sample MR Analyses also validated these. Cross-trait meta-analysis identified two key genetic loci rs660895 (on chr6) and rs917117 (on chr7), which have not been previously reported. Two loci are located on the genes HLA-DRB1 and JAZF1, respectively. MAGMA genome-wide gene-based analysis identified six overlapping genes including HLA-DRB1. Subsequently, for one thing, SMR analyses further validated six shared genes in specific tissues and monocytes. 
For another, pathway enrichment analysis revealed shared genes were enriched in several natural killer cell mediated cytotoxicity and chemokine signaling pathways.</p><p><strong>Conclusions: </strong>PCGL not only revealed the significant genetic correlations underlying AD and IBDs but also identified enriched pathogenic cell types and new shared loci and genes. We highlighted the mediation of HLA-DRB1 effects in the comorbidity mechanisms.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":"24 ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12415860/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145014481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sequence compressors for novel species frequently face challenges when processing large-scale raw, FASTA, or multi-FASTA structured data. For years, molecular sequence databases have favored the widely used general-purpose compressors Gzip and Zstd. Because these encoders do not exploit sequence-specific characteristics, their performance is subpar, and their use depends on time-consuming parameter adjustments. To address these limitations, we propose a reference-free, lossless sequence compressor called GraSS (Grammatical, Statistical, and Substitution Rule-Based). GraSS compresses sequences more effectively by taking advantage of certain characteristics seen in DNA and RNA sequences. It supports various formats, including raw, FASTA, and multi-FASTA, commonly found in GenBank DNA and RNA files. We evaluate GraSS's performance using ten benchmark DNA sequences with a reduced number of repeats, two highly repetitive RNA sequences, and fifteen raw DNA sequences. Test results indicate that the weighted average compression ratios (WACR) for DNA and RNA sequences are 4.5 and 19.6, respectively. Additionally, the total compression time (TCT) for the entire DNA sequence corpus is 246.8 seconds (s). These results demonstrate that the proposed compression method performs better than several advanced algorithms specifically designed to handle various levels of sequence redundancy. The decompression times, memory usage, and CPU usage are also very competitive. Contact: anirban@klyuniv.ac.in.
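To make the compression-ratio figures concrete, the following is a minimal Python sketch of how a per-file compression ratio and a size-weighted average could be computed, together with a Gzip baseline. It is not the GraSS algorithm. The abstract does not define WACR, so weighting each file's ratio by its share of the original corpus size is an assumption, and the function names (`compression_ratio`, `weighted_average_compression_ratio`, `pack_2bit`, `gzip_ratio`) are hypothetical. The 2-bit packing only illustrates one sequence-specific property, the four-letter DNA alphabet, that a general-purpose encoder does not assume.

```python
import gzip


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Ratio of original to compressed size; larger means better compression."""
    return original_size / compressed_size


def weighted_average_compression_ratio(sizes):
    """Average the per-file ratios, weighting each file by its original size.

    `sizes` is a list of (original_bytes, compressed_bytes) pairs.
    ASSUMPTION: this weighting scheme is illustrative; the abstract does not
    state how WACR is defined.
    """
    total_original = sum(original for original, _ in sizes)
    return sum(
        (original / total_original) * compression_ratio(original, compressed)
        for original, compressed in sizes
    )


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (illustration only, not GraSS)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | code[base]
        # Left-align a final chunk that holds fewer than four bases.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def gzip_ratio(seq: str) -> float:
    """Compression ratio achieved by general-purpose Gzip on a raw sequence."""
    raw = seq.encode("ascii")
    return compression_ratio(len(raw), len(gzip.compress(raw)))


if __name__ == "__main__":
    sequence = "ACGTTGCA" * 500  # toy input, not a benchmark sequence
    print("gzip ratio:", round(gzip_ratio(sequence), 2))
    print("2-bit ratio:", compression_ratio(len(sequence), len(pack_2bit(sequence))))
```

Plain 2-bit packing always yields a ratio of about 4 on pure A/C/G/T input, which gives a rough sense of scale for the reported DNA figure; how GraSS itself combines grammatical, statistical, and substitution rules is not described in the abstract and is not modeled here.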
{"title":"A lossless reference-free sequence compression algorithm leveraging grammatical, statistical, and substitution rules.","authors":"Subhankar Roy, Dilip Kumar Maity, Anirban Mukhopadhyay","doi":"10.1093/bfgp/elae050","DOIUrl":"10.1093/bfgp/elae050","url":null,"abstract":"<p><p>Deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sequence compressors for novel species frequently face challenges when processing wide-scale raw, FASTA, or multi-FASTA structured data. For years, molecular sequence databases have favored the widely used general-purpose Gzip and Zstd compressors. The absence of sequence-specific characteristics in these encoders results in subpar performance, and their use depends on time-consuming parameter adjustments. To address these limitations, in this article, we propose a reference-free, lossless sequence compressor called GraSS (Grammatical, Statistical, and Substitution Rule-Based). GraSS compresses sequences more effectively by taking advantage of certain characteristics seen in DNA and RNA sequences. It supports various formats, including raw, FASTA, and multi-FASTA, commonly found in GenBank DNA and RNA files. We evaluate GraSS's performance using ten benchmark DNA sequences with reduced number of repeats, two highly repetitive RNA sequences, and fifteen raw DNA sequences. Test results indicate that the weighted average compression ratios (WACR) for DNA and RNA sequences are 4.5 and 19.6, respectively. Additionally, the entire DNA sequence corpus has a total compression time (TCT) of 246.8 seconds (s). These results demonstrate that the proposed compression method performs better than several advanced algorithms specifically designed to handle various levels of sequence redundancy. The decompression times, memory usage, and CPU usage are also very competitive. 
Contact: anirban@klyuniv.ac.in.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735755/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142959201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transcription factor (TF) chromatin immunoprecipitation followed by sequencing (ChIP-seq) is essential for identifying genome-wide TF-binding sites (TFBSs), and the collected datasets offer a variety of opportunities for downstream analyses, such as inferring gene regulatory networks and predicting the effects of single-nucleotide polymorphisms (SNPs) on TFBSs. Although TF ChIP-seq data continue to accumulate in public databases, comprehensive coverage of biologically relevant TF-sample pairs (i.e. combinations of a targeted TF and a cell type) remains elusive. This is because ChIP-seq requires TF-specific antibodies and large cell numbers, which limits the feasible TF-cell type combinations. Moreover, a TF can be measured by ChIP-seq only when it is expressed in the target cell type. Thus, defining the full space of biologically relevant TF-sample pairs, both measured and unmeasured, is essential to assess and improve dataset comprehensiveness. Here, we investigated publicly available human TF ChIP-seq datasets and introduced the concept of unmeasured TF-sample pairs, defined as biologically relevant TF-sample combinations for which ChIP-seq experiments have not yet been performed. Notably, many TFs expressed in specific cell types remain unmeasured by ChIP-seq, which affects the coverage of regulatory regions revealed by TF ChIP-seq and genome-wide association study-SNP analyses. Furthermore, we propose practical strategies to efficiently supplement currently unmeasured data and discuss how these approaches can significantly enhance data-driven research. The database of unmeasured human TF-sample pairs is publicly accessible at https://moccs-db.shinyapps.io/Unmeasured_shiny_v1/, facilitating the systematic expansion of TF ChIP-seq datasets and thereby enhancing our comprehension of gene regulatory mechanisms.
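The definition of an unmeasured TF-sample pair can be read as a set difference: the pairs that are biologically relevant minus the pairs that already have ChIP-seq data. The Python sketch below illustrates that reading under stated assumptions. Treating a pair as relevant when the TF's expression reaches a threshold (`min_tpm`, here 1.0) is an assumption, the function names are hypothetical, and the expression values and measured pairs are invented toy data rather than results from the study.

```python
def relevant_pairs(expression, min_tpm=1.0):
    """Return (TF, sample) pairs in which the TF is considered expressed.

    ASSUMPTION: a simple expression cut-off stands in for "biologically
    relevant"; the abstract does not give the criterion actually used.
    """
    return {pair for pair, tpm in expression.items() if tpm >= min_tpm}


def unmeasured_pairs(expression, measured, min_tpm=1.0):
    """Relevant pairs for which no ChIP-seq experiment is recorded."""
    return relevant_pairs(expression, min_tpm) - set(measured)


def coverage(expression, measured, min_tpm=1.0):
    """Fraction of relevant pairs that already have ChIP-seq data."""
    relevant = relevant_pairs(expression, min_tpm)
    if not relevant:
        return 0.0
    return len(relevant & set(measured)) / len(relevant)


# Toy data, invented for illustration only.
toy_expression = {
    ("GATA1", "K562"): 120.0,
    ("GATA1", "HepG2"): 0.2,
    ("HNF4A", "HepG2"): 85.0,
    ("HNF4A", "K562"): 0.0,
    ("CTCF", "K562"): 40.0,
    ("CTCF", "HepG2"): 38.0,
}
toy_measured = {("GATA1", "K562"), ("CTCF", "K562"), ("CTCF", "HepG2")}

if __name__ == "__main__":
    print(sorted(unmeasured_pairs(toy_expression, toy_measured)))
    # → [('HNF4A', 'HepG2')]
    print(coverage(toy_expression, toy_measured))
    # → 0.75
```

Framing the problem this way makes the comprehensiveness of a dataset collection a single number (the coverage), and the unmeasured set doubles as a candidate list for prioritizing new experiments.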
{"title":"Unmeasured human transcription factor ChIP-seq data shape functional genomics and demand strategic prioritization.","authors":"Saeko Tahara, Haruka Ozaki","doi":"10.1093/bfgp/elaf016","DOIUrl":"10.1093/bfgp/elaf016","url":null,"abstract":"<p><p>Transcription factor (TF) chromatin immunoprecipitation followed by sequencing (ChIP-seq) is essential for identifying genome-wide TF-binding sites (TFBSs), and the collected datasets offer a variety of opportunities for downstream analyses such as inference of gene regulatory network and prediction for effects of single-nucleotide polymorphisms (SNPs) on TFBSs. Although TF ChIP-seq data continue to accumulate in public databases, comprehensive coverage of biologically relevant TF-sample pairs (i.e. combination of targeted TF and cell type) remains elusive. This is due to the need for TF-specific antibodies and large cell numbers, limiting feasible TF-cell type combinations. Moreover, ChIP-seq is measurable when the TF is expressed in the target cell type. Thus, defining the full space of biologically relevant TF-sample pairs-including both measured and unmeasured-is essential to assess and improve dataset comprehensiveness. Here, we investigated publicly available human TF ChIP-seq datasets and introduced the concept of unmeasured TF-sample pairs, defined as biologically relevant TF-sample combinations for which ChIP-seq experiments have not yet been performed. Notably, many expressed TFs in specific cell types remain unmeasured by ChIP-seq, affecting the coverage of regulatory regions revealed by TF ChIP-seq and genome-wide association study-SNP analyses. Furthermore, we propose practical strategies to efficiently supplement currently unmeasured data and discuss how these approaches can significantly enhance data-driven research. 
The database of unmeasured human TF-sample pairs is publicly accessible at https://moccs-db.shinyapps.io/Unmeasured_shiny_v1/, facilitating the systematic expansion of TF ChIP-seq datasets and thereby enhancing our comprehension of gene regulatory mechanisms.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":"24 ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12479113/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145193796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Keyu Wan, Tiantian Nie, Wenhao Ouyang, Yunjing Xiong, Jing Bian, Ying Huang, Li Ling, Zhenjun Huang, Xianhua Zhu
RNA modifications include not only methylation, such as m6A, but also acetylation. Together they form a complex system of "writers," "readers," and "erasers" that plays crucial roles in growth, genetics, and disease. N4-acetylcytidine (ac4C) is an ancient and highly conserved RNA modification that plays a profound role in the pathogenesis of a wide range of diseases. This review provides insights into the functional impact of ac4C modifications in disease, into the biological functions of post-transcriptional RNA modifications, and into their potential roles in disease mechanisms, offering new perspectives and strategies for disease treatment.
{"title":"Exploring the impact of N4-acetylcytidine modification in RNA on non-neoplastic disease: unveiling its role in pathogenesis and therapeutic opportunities.","authors":"Keyu Wan, Tiantian Nie, Wenhao Ouyang, Yunjing Xiong, Jing Bian, Ying Huang, Li Ling, Zhenjun Huang, Xianhua Zhu","doi":"10.1093/bfgp/elae020","DOIUrl":"10.1093/bfgp/elae020","url":null,"abstract":"<p><p>RNA modifications include not only methylation modifications, such as m6A, but also acetylation modifications, which constitute a complex interaction involving \"writers,\" \"readers,\" and \"erasers\" that play crucial roles in growth, genetics, and disease. N4-acetylcytidine (ac4C) is an ancient and highly conserved RNA modification that plays a profound role in the pathogenesis of a wide range of diseases. This review provides insights into the functional impact of ac4C modifications in disease and introduces new perspectives for disease treatment. These studies provide important insights into the biological functions of post-transcriptional RNA modifications and their potential roles in disease mechanisms, offering new perspectives and strategies for disease treatment.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735739/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141263641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: Emerging challenge: dynamic solution structures of nucleic acids.","authors":"","doi":"10.1093/bfgp/elae053","DOIUrl":"10.1093/bfgp/elae053","url":null,"abstract":"","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":"24 ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742188/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143016869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In current bioinformatics research, spatial transcriptomics (ST), a rapidly evolving technology, is receiving increasingly widespread attention. Spatial domains are regions in which gene expression and histology are spatially consistent, and detecting them helps researchers better understand the organization and functional distribution of tissues. Spatial domain recognition is a fundamental step in ST data interpretation and also a major challenge in ST analysis. Developing more accurate, efficient, and general spatial domain recognition methods has therefore become an important and urgent research direction. This article reviews the current status and progress of spatial domain recognition research, explores the advantages and limitations of existing methods, and provides suggestions and directions for future tool development.
{"title":"A comprehensive review of approaches for spatial domain recognition of spatial transcriptomes.","authors":"Ziyi Wang, Aoyun Geng, Hao Duan, Feifei Cui, Quan Zou, Zilong Zhang","doi":"10.1093/bfgp/elae040","DOIUrl":"10.1093/bfgp/elae040","url":null,"abstract":"<p><p>In current bioinformatics research, spatial transcriptomics (ST) as a rapidly evolving technology is gradually receiving widespread attention from researchers. Spatial domains are regions where gene expression and histology are consistent in space, and detecting spatial domains can better understand the organization and functional distribution of tissues. Spatial domain recognition is a fundamental step in the process of ST data interpretation, which is also a major challenge in ST analysis. Therefore, developing more accurate, efficient, and general spatial domain recognition methods has become an important and urgent research direction. This article aims to review the current status and progress of spatial domain recognition research, explore the advantages and limitations of existing methods, and provide suggestions and directions for future tool development.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"702-712"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142481471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acute myeloid leukemia (AML) is one of the leading leukemic malignancies in adults. The heterogeneity of the disease makes diagnosis and treatment extremely difficult. With the advent of next-generation sequencing (NGS) technologies, researchers have focused on molecular-level exploration to identify biomarkers and drug targets, with the aim of developing novel therapies that improve the prognosis and survival outcomes of AML patients. However, the huge amount of data from NGS platforms requires a comprehensive AML platform to streamline literature mining efforts and save time. To facilitate this, we developed AMLdb, an interactive multi-omics platform that allows users to query, visualize, retrieve, and analyse AML-related multi-omics data. AMLdb contains 86 datasets for gene expression profiles, 15 datasets for methylation profiles, CRISPR-Cas9 knockout screens of 26 AML cell lines, sensitivity of 26 AML cell lines to 288 drugs, mutations in 41 unique genes in 23 AML cell lines, and information on 41 experimentally validated biomarkers. In this study, we report five genes, i.e. CBFB, ENO1, IMPDH2, SEPHS2, and MYH9, identified via our analysis using AMLdb. ENO1 is a uniquely identified gene that requires further investigation as a novel potential target, while the other reported genes have previously been confirmed as targets through experimental studies. We believe that these findings show AMLdb can be an invaluable resource for accelerating the development of effective therapies for AML and for assisting the research community in advancing its understanding of AML pathogenesis. AMLdb is freely available at https://project.iith.ac.in/cgntlab/amldb.
{"title":"AMLdb: a comprehensive multi-omics platform to identify biomarkers and drug targets for acute myeloid leukemia.","authors":"Keerthana Vinod Kumar, Ambuj Kumar, Kavita Kundal, Avik Sengupta, Kunjulakshmi R, Subashani Singh, Bhanu Teja Korra, Simran Sharma, Vandana Suresh, Mayilaadumveettil Nishana, Rahul Kumar","doi":"10.1093/bfgp/elae024","DOIUrl":"10.1093/bfgp/elae024","url":null,"abstract":"<p><p>Acute myeloid leukemia (AML) is one of the leading leukemic malignancies in adults. The heterogeneity of the disease makes the diagnosis and treatment extremely difficult. With the advent of next-generation sequencing (NGS) technologies, exploration at the molecular level for the identification of biomarkers and drug targets has been the focus for the researchers to come up with novel therapies for better prognosis and survival outcomes of AML patients. However, the huge amount of data from NGS platforms requires a comprehensive AML platform to streamline literature mining efforts and save time. To facilitate this, we developed AMLdb, an interactive multi-omics platform that allows users to query, visualize, retrieve, and analyse AML related multi-omics data. AMLdb contains 86 datasets for gene expression profiles, 15 datasets for methylation profiles, CRISPR-Cas9 knockout screens of 26 AML cell lines, sensitivity of 26 AML cell lines to 288 drugs, mutations in 41 unique genes in 23 AML cell lines, and information on 41 experimentally validated biomarkers. In this study, we have reported five genes, i.e. CBFB, ENO1, IMPDH2, SEPHS2, and MYH9 identified via our analysis using AMLdb. ENO1 is uniquely identified gene which requires further investigation as a novel potential target while other reported genes have been previously confirmed as targets through experimental studies. 
Top of form we believe that these findings utilizing AMLdb can make it an invaluable resource to accelerate the development of effective therapies for AML and assisting the research community in advancing their understanding of AML pathogenesis. AMLdb is freely available at https://project.iith.ac.in/cgntlab/amldb.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"798-805"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141307484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kristina Santucci, Yuning Cheng, Si-Mei Xu, Michael Janitz
Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models compared with earlier and more common methods such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled a far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation of machine learning and deep learning algorithms into bioinformatic software, and advances in those algorithms, have significantly improved the reliability of long-read sequencing transcriptomic studies. However, there is a lack of consensus on which bioinformatic tools and pipelines produce the most precise and consistent results. Thus, this review discusses and compares the performance of available methods for novel isoform discovery with long-read sequencing technologies, presenting 25 tools. Furthermore, this review demonstrates the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.
{"title":"Enhancing novel isoform discovery: leveraging nanopore long-read sequencing and machine learning approaches.","authors":"Kristina Santucci, Yuning Cheng, Si-Mei Xu, Michael Janitz","doi":"10.1093/bfgp/elae031","DOIUrl":"10.1093/bfgp/elae031","url":null,"abstract":"<p><p>Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models in comparison to more common and earlier methods, such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled a far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation and advancements of machine learning and deep learning algorithms in bioinformatic software have significantly improved the reliability of long-read sequencing transcriptomic studies. However, there is a lack of consensus on what bioinformatic tools and pipelines produce the most precise and consistent results. Thus, this review aims to discuss and compare the performance of available methods for novel isoform discovery with long-read sequencing technologies, with 25 tools being presented. 
Furthermore, this review intends to demonstrate the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"683-694"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142001414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ferroptosis, a commonly observed type of programmed cell death caused by abnormal metabolic and biochemical mechanisms, is frequently triggered by cellular stress. The occurrence of ferroptosis is predominantly linked to pathophysiological conditions due to the substantial impact of various metabolic pathways, including fatty acid metabolism and iron regulation, on cellular reactions to lipid peroxidation and ferroptosis. This mode of cell death serves as a fundamental factor in the development of numerous diseases, thereby presenting a range of therapeutic targets. Single-cell sequencing technology provides insights into the cellular and molecular characteristics of individual cells, as opposed to bulk sequencing, which provides data in a more generalized manner. Single-cell sequencing has found extensive application in the field of cancer research. This paper reviews the progress made in ferroptosis-associated cancer research using single-cell sequencing, including ferroptosis-associated pathways, immune checkpoints, biomarkers, and the identification of cell clusters associated with ferroptosis in tumors. In general, the utilization of single-cell sequencing technology has the potential to contribute significantly to the investigation of the mechanistic regulatory pathways linked to ferroptosis. Moreover, it can shed light on the intricate connection between ferroptosis and cancer. This technology holds great promise in advancing tumor-wide diagnosis, targeted therapy, and prognosis prediction.
{"title":"Advances in integrating single-cell sequencing data to unravel the mechanism of ferroptosis in cancer.","authors":"Zhaolan Du, Yi Shi, Jianjun Tan","doi":"10.1093/bfgp/elae025","DOIUrl":"10.1093/bfgp/elae025","url":null,"abstract":"<p><p>Ferroptosis, a commonly observed type of programmed cell death caused by abnormal metabolic and biochemical mechanisms, is frequently triggered by cellular stress. The occurrence of ferroptosis is predominantly linked to pathophysiological conditions due to the substantial impact of various metabolic pathways, including fatty acid metabolism and iron regulation, on cellular reactions to lipid peroxidation and ferroptosis. This mode of cell death serves as a fundamental factor in the development of numerous diseases, thereby presenting a range of therapeutic targets. Single-cell sequencing technology provides insights into the cellular and molecular characteristics of individual cells, as opposed to bulk sequencing, which provides data in a more generalized manner. Single-cell sequencing has found extensive application in the field of cancer research. This paper reviews the progress made in ferroptosis-associated cancer research using single-cell sequencing, including ferroptosis-associated pathways, immune checkpoints, biomarkers, and the identification of cell clusters associated with ferroptosis in tumors. In general, the utilization of single-cell sequencing technology has the potential to contribute significantly to the investigation of the mechanistic regulatory pathways linked to ferroptosis. Moreover, it can shed light on the intricate connection between ferroptosis and cancer. 
This technology holds great promise in advancing tumor-wide diagnosis, targeted therapy, and prognosis prediction.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"713-725"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141319002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is projected that 10 million deaths could be attributed to drug-resistant bacterial infections in 2050. Identifying new-generation antibiotics is an effective way to address this concern. Antimicrobial peptides (AMPs), a class of innate immune effectors, have received significant attention for their capacity to eliminate drug-resistant pathogens, including viruses, bacteria, and fungi. Recent years have witnessed widespread application of computational methods, especially machine learning (ML) and deep learning (DL), to AMP discovery. However, existing methods use only features such as the compositional, physicochemical, and structural properties of peptides, which cannot fully capture the sequence information in AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature called proportionalized split amino acid composition (PSAAC) in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures residue patterns such as sorting signals at both the N-terminal and the C-terminal, while also retaining the sequence order information from the middle peptide fragments. Benchmarking tests on different balanced and imbalanced datasets demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred and AMPScanner V2, in terms of accuracy, Matthews correlation coefficient (MCC), G-measure, and F1-score. In addition, by leveraging an ensemble RP architecture, SAMP scales to large-scale AMP identification with further performance improvement compared to models without RP. To facilitate the use of SAMP, we have developed a Python package that is freely available at https://github.com/wan-mlab/SAMP.
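The idea behind a split amino acid composition feature can be sketched in a few lines of Python: cut the peptide into an N-terminal part, a middle part, and a C-terminal part whose lengths are proportions of the peptide length, then compute the amino acid composition of each part separately. This is an illustrative sketch only and not the SAMP package's implementation. The 25% terminal fractions, the three-way split, and the function names (`composition`, `split_composition`) are assumptions, since the abstract does not specify how PSAAC is parameterized.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical


def composition(fragment: str):
    """Fraction of each standard amino acid in a fragment (20 values)."""
    if not fragment:
        return [0.0] * len(AMINO_ACIDS)
    return [fragment.count(aa) / len(fragment) for aa in AMINO_ACIDS]


def split_composition(peptide: str, n_frac: float = 0.25, c_frac: float = 0.25):
    """Concatenate compositions of the N-terminal, middle, and C-terminal parts.

    ASSUMPTION: the terminal lengths are fixed fractions of the peptide length
    (at least one residue each). For very short peptides the terminal parts
    may overlap and the middle part may be empty.
    """
    n_len = max(1, round(len(peptide) * n_frac))
    c_len = max(1, round(len(peptide) * c_frac))
    n_term = peptide[:n_len]
    c_term = peptide[len(peptide) - c_len:]
    middle = peptide[n_len:len(peptide) - c_len]
    return composition(n_term) + composition(middle) + composition(c_term)


if __name__ == "__main__":
    features = split_composition("KKLLAAGG")  # toy peptide, not a real AMP
    print(len(features))
    # → 60
```

Because each part gets its own 20-value block, a residue enriched only at one terminus remains visible in the feature vector, whereas a single whole-peptide composition would average it away. Scaling the split with peptide length, rather than using a fixed residue count, is one plausible reading of "proportionalized".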
{"title":"SAMP: Identifying antimicrobial peptides by an ensemble learning model based on proportionalized split amino acid composition.","authors":"Junxi Feng, Mengtao Sun, Cong Liu, Weiwei Zhang, Changmou Xu, Jieqiong Wang, Guangshun Wang, Shibiao Wan","doi":"10.1093/bfgp/elae046","DOIUrl":"10.1093/bfgp/elae046","url":null,"abstract":"<p><p>It is projected that 10 million deaths could be attributed to drug-resistant bacteria infections in 2050. To address this concern, identifying new-generation antibiotics is an effective way. Antimicrobial peptides (AMPs), a class of innate immune effectors, have received significant attention for their capacity to eliminate drug-resistant pathogens, including viruses, bacteria, and fungi. Recent years have witnessed widespread applications of computational methods especially machine learning (ML) and deep learning (DL) for discovering AMPs. However, existing methods only use features including compositional, physiochemical, and structural properties of peptides, which cannot fully capture sequence information from AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature called proportionalized split amino acid composition (PSAAC) in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures the residue patterns like sorting signals at both the N-terminal and the C-terminal, while also retaining the sequence order information from the middle peptide fragments. Benchmarking tests on different balanced and imbalanced datasets demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred and AMPScanner V2, in terms of accuracy, Matthews correlation coefficient (MCC), G-measure, and F1-score. 
In addition, by leveraging an ensemble RP architecture, SAMP is scalable to processing large-scale AMP identification with further performance improvement, compared to those models without RP. To facilitate the use of SAMP, we have developed a Python package that is freely available at https://github.com/wan-mlab/SAMP.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":" ","pages":"879-890"},"PeriodicalIF":2.5,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631067/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142689781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}