Fragment-based molecular generation has emerged as a promising paradigm in structure-based drug design (SBDD), yielding compounds with desirable properties such as chemical validity, synthetic feasibility, and pharmacological relevance. However, existing approaches often struggle to generate molecules that both conform to 3D structural constraints and remain chemically plausible. This is largely because prior works often treat scaffolds and R-groups indiscriminately, overlooking their distinct semantic roles. Specifically, the scaffold serves as the rigid structural backbone that determines the global geometric topology and binding pose, whereas R-groups act as functional substituents responsible for fine-tuning local physicochemical interactions. We therefore propose fragment-based dual conditional diffusion (FDC-Diff), a novel dual conditional diffusion framework that integrates chemical priors and structural cues for fragment-based molecular generation. Unlike traditional de novo methods that generate atoms sequentially, FDC-Diff decomposes molecule generation into two semantically complementary stages. In the first stage, given the protein pocket and an initial fragment, a spatially constrained scaffold is constructed to capture the global molecular topology. In the second stage, R-groups are elaborated onto the obtained scaffold to capture local semantics and further refine molecular properties. To ensure synthetic accessibility, the initial fragments and the scaffold-modification hierarchy are derived from curated reaction rules, and a physical-chemistry-inspired refinement step is applied to optimize final conformations. Experimental results on multiple SBDD benchmarks demonstrate that FDC-Diff achieves state-of-the-art performance across comprehensive evaluations. Furthermore, the model excels at producing chemically valid, spatially compatible, and pharmacologically relevant molecules, suggesting its potential as a feasible tool for fragment-based drug design.
{"title":"An effective fragment-based dual conditional diffusion framework for molecular generation.","authors":"Haotian Chen, Yiting Shen, Jichun Li, Weizhong Zhao","doi":"10.1093/bib/bbaf727","DOIUrl":"10.1093/bib/bbaf727","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1"},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814976/pdf/","platform":"Semanticscholar","paperid":"146002891","RegionNum":2,"RegionCategory":"Biology","PMCID":"OA","Total":0}
Ali Mohammad Nesari, Habib MotieGhader, Saeid Ghorbian
Single-cell RNA sequencing (scRNA-seq) has transformed the resolution of cellular heterogeneity, offering insights into dynamic biological processes from tumor evolution to immune regulation. However, its clinical translation is limited by challenges such as data sparsity, batch effects (differences caused by technical variation rather than biology), and the absence of standardized benchmarks for core pipelines like Seurat and Scanpy. This review outlines emerging computational strategies that address these limitations: (A) robust preprocessing, including SCTransform for correcting zero-inflation (an excess of zero counts in gene-expression data) and Harmony for batch integration, the latter achieving 30% faster alignment than BBKNN in cohorts exceeding 100,000 cells; (B) transformer-based annotation tools such as scGPT and CellTypist, which reach >95% accuracy in immune profiling using models pretrained on 33 million cells; and (C) multimodal integration with spatial transcriptomics (e.g., 10x Visium, cell2location v2), which delineates microenvironmental niches and rare CX3CR1+ T-cell subsets in disease contexts like glioblastoma and severe COVID-19. We further assess how scANVI bridges scRNA-seq and ATAC-seq to uncover epigenetic mechanisms underlying therapy resistance, and how spatial methods elucidate tumor-immune crosstalk at subcellular resolution. Despite these advances, ethical risks remain, particularly around re-identification of rare patient-derived clones such as pre-metastatic cells. To promote clinical adoption, we propose a roadmap that prioritizes benchmarked workflows (e.g., the scverse ecosystem), privacy-aware data sharing via federated learning, and causal AI approaches to disentangle biological signal from technical artifact. By synthesizing computational innovations with translational case studies, this review equips researchers to navigate both the analytical and ethical complexities of scRNA-seq in pursuit of actionable diagnostics.
{"title":"Advances and challenges in single-cell RNA sequencing data analysis: a comprehensive review.","authors":"Ali Mohammad Nesari, Habib MotieGhader, Saeid Ghorbian","doi":"10.1093/bib/bbaf723","DOIUrl":"10.1093/bib/bbaf723","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1"},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12860385/pdf/","platform":"Semanticscholar","paperid":"146096646","RegionNum":2,"RegionCategory":"Biology","PMCID":"OA","Total":0}
Lena Maria Hackl, Fabian Neuhaus, Sabine Ameling, Uwe Völker, Jan Baumbach, Olga Tsoy
Alternative splicing is a crucial mechanism of gene regulation that enables condition- and tissue-specific expression of gene isoforms. Its dysregulation plays a role in various diseases such as cancer, neurological disorders, and metabolic conditions. Despite its importance, accurate detection of alternative splicing events remains challenging. Comprehensive detection of alternative splicing events typically requires deep sequencing with over 100 million reads; however, much of the publicly accessible RNA sequencing data has lower sequencing depth. Recent advances, particularly deep learning models working with genomic sequences, offer new avenues for predicting alternative splicing without relying on high-depth sequencing data. Our study addresses the question: Can we use the vast repository of publicly available RNA sequencing data for comprehensive alternative splicing detection, despite its low sequencing depth? Our results demonstrate the potential of sequence-based deep learning tools such as AlphaGenome, SpliceAI, and DeepSplice for initial hypothesis development and as additional filters in standard RNA sequencing pipelines, especially when sequencing depth is limited. Nonetheless, validation at higher sequencing depths remains essential for confirming splice events. Overall, our findings underscore the need for integrative methods combining genomic sequence data and RNA sequencing data for the prediction of tissue- and condition-specific alternative splicing in resource-limited settings.
{"title":"Detection of alternative splicing: deep sequencing or deep learning?","authors":"Lena Maria Hackl, Fabian Neuhaus, Sabine Ameling, Uwe Völker, Jan Baumbach, Olga Tsoy","doi":"10.1093/bib/bbaf705","DOIUrl":"10.1093/bib/bbaf705","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1"},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12790623/pdf/","platform":"Semanticscholar","paperid":"145948453","RegionNum":2,"RegionCategory":"Biology","PMCID":"OA","Total":0}
Qisheng Pan, Stephanie Portelli, Thanh Binh Nguyen, David B Ascher
Drug resistance caused by mutations is a significant global health concern. One way to better understand this phenomenon is by studying changes in protein-ligand binding affinity upon mutation. While recent advances in protein modelling, such as AlphaFold2 and AlphaFold3, have transformed structural assessments, their utility in predicting mutation-induced binding affinity changes remains underexplored. We evaluated various mutation-based methods and scoring functions using computer-generated protein-ligand complexes. Compared to a baseline using experimental structures, we observed a performance drop ranging from 5% to over 30% across different computational models. Specifically, using experimental receptors with docked ligands resulted in a ~5% drop, similar to that observed with AlphaFold3 models (~5%), despite the latter offering lower ligand root mean square deviation. However, using AlphaFold2 receptors with docking led to a greater performance loss (10%-20%), comparable to that of homology models with high sequence identity. Homology models based on low-identity templates showed a decline of over 30%. These performance differences were most pronounced for interface mutations and low-molecular-weight ligands. While AlphaFold models offer accurate protein and interaction predictions, they lack mutation-specific information, such as dynamic changes, highlighting the need for complementary mutation-aware methods for reliable analysis. Our findings provide insights into interpreting mutation effects on ligand binding using predicted structures and can guide more robust assessments of drug resistance mechanisms in silico.
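The ligand root mean square deviation mentioned in this abstract can be illustrated with a short sketch. It assumes that the predicted and reference ligand poses are already in the same receptor frame and list their heavy atoms in the same order; the function name `ligand_rmsd` and the toy coordinates are hypothetical and are not taken from the study.

```python
import math

def ligand_rmsd(predicted, reference):
    """Root mean square deviation between two ligand poses.

    Both arguments are lists of (x, y, z) tuples in angstroms.
    Assumption: atoms are in matching order and no superposition is
    applied, so the value reflects the pose error in the receptor frame.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared_error = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        squared_error += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(squared_error / len(predicted))

# Toy poses: the "predicted" ligand is the reference shifted by 1 angstrom along x.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted_pose = [(x + 1.0, y, z) for x, y, z in reference_pose]
shift_rmsd = ligand_rmsd(predicted_pose, reference_pose)
```

A symmetry-aware atom matching step would be needed for ligands with equivalent atoms; that is deliberately left out of this sketch.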
{"title":"Systematic evaluation of computational tools to predict the effects of mutations on protein-ligand binding affinity in the absence of experimental structures.","authors":"Qisheng Pan, Stephanie Portelli, Thanh Binh Nguyen, David B Ascher","doi":"10.1093/bib/bbag035","DOIUrl":"10.1093/bib/bbag035","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1"},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12874888/pdf/","platform":"Semanticscholar","paperid":"146123814","RegionNum":2,"RegionCategory":"Biology","PMCID":"OA","Total":0}
Antimicrobial resistance poses a significant challenge to conventional antibiotics, underscoring the urgent need for alternative therapeutic strategies. Antimicrobial peptides (AMPs) have emerged as promising candidates due to their broad-spectrum antibacterial activity and distinct mechanisms of action. This study presents ANIA, a deep learning framework developed to predict the minimum inhibitory concentration (MIC) values of AMPs against three clinically significant bacteria: Staphylococcus aureus, Escherichia coli, and Pseudomonas aeruginosa. ANIA leverages Chaos Game Representation (CGR) to transform AMP sequences into frequency-based image features, which are subsequently processed through a hybrid architecture comprising stacked Inception modules, a Transformer encoder, and a regression head. This integrative architecture enables ANIA to capture both local motif-based features and global contextual patterns embedded within AMP sequences. In benchmarking experiments, ANIA outperformed existing tools, including ESKAPEE-Pred, AMPActiPred, and esAMPMIC, with higher correlation coefficients and lower prediction errors across all bacterial targets; the most pronounced improvement was observed for P. aeruginosa, a pathogen renowned for its multidrug resistance. Specifically, ANIA achieved Pearson correlation coefficients (PCCs) of 0.75-0.79 and mean squared errors (MSEs) of 0.23-0.26 across all species. Furthermore, motif-based interpretability analyses combining Grad-CAM visualizations, correlation heatmaps, motif frequency distributions, and hydrophobicity profiling revealed biologically meaningful subregions within the CGR matrix that are plausibly associated with antimicrobial efficacy. In conclusion, this study develops ANIA as a robust predictive tool for MIC estimation, offering valuable insights into the design of effective antimicrobial agents and contributing to the fight against antimicrobial resistance.
A user-friendly web server for ANIA is available at https://biomics.lab.nycu.edu.tw/ANIA/.
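To make the idea of turning a peptide sequence into a frequency-based image more concrete, the sketch below implements one polygon-based variant of Chaos Game Representation. The abstract does not state how ANIA places amino acids or bins the points, so the 20-vertex layout, the contraction ratio of 0.5, the 8 x 8 grid, and the names `cgr_points` and `frequency_matrix` are all assumptions made for illustration only.

```python
import math

# Assumption: the 20 standard amino acids sit on the vertices of a regular
# 20-gon inscribed in the unit circle, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}

def cgr_points(sequence, ratio=0.5):
    """Walk from the origin, moving a fraction `ratio` of the way toward the
    vertex of each residue in turn; return the visited points."""
    x, y = 0.0, 0.0
    points = []
    for residue in sequence:
        vx, vy = VERTICES[residue]  # raises KeyError for non-standard residues
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points

def frequency_matrix(points, size=8):
    """Count how many points fall into each cell of a size x size grid
    covering the square [-1, 1] x [-1, 1]."""
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int((x + 1) / 2 * size), size - 1)
        row = min(int((y + 1) / 2 * size), size - 1)
        grid[row][col] += 1
    return grid

# Example peptide (magainin 2), used here purely as an input string.
peptide = "GIGKFLHSAKKFGKAFVGEIMNS"
image = frequency_matrix(cgr_points(peptide))
```

The resulting integer matrix is the kind of image-like input that a convolutional module could consume; the grid counts always sum to the sequence length.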
{"title":"ANIA: an inception-attention network for predicting minimum inhibitory concentration of antimicrobial peptides.","authors":"Yen-Peng Chiu, Lantian Yao, Yun Tang, Chia-Ru Chung, Yuxuan Pang, Ying-Chih Chiang, Tzong-Yi Lee","doi":"10.1093/bib/bbag023","DOIUrl":"https://doi.org/10.1093/bib/bbag023","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1"},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"146149120","RegionNum":2,"RegionCategory":"Biology","Total":0}
Siyong Yu, Jia Wen, Yang Gao, Zhaoqing Yang, Xu Wang, Yan Lu, Jiayou Chu, Dilinuer Maimaitiyiming, Shuhua Xu
The Yugur and Uyghur people of northwestern China share documented Early Medieval origins, yet the evolutionary processes that shaped their present-day genomes remain unresolved. Here, we generate high-coverage whole-genome sequences for the Yugur and compare them with Uyghur genomes to reconstruct their demographic histories, ancestry profiles, and adaptive trajectories. Both groups derive from mixtures of East Eurasian ancestry (EEA) and West Eurasian ancestry (WEA) but in sharply contrasting proportions: the Yugur retain predominantly EEA (~90%), whereas the Uyghur harbor a near-equal balance. Modeling reveals distinct episodes of admixture in Gansu and Xinjiang, with identity-by-descent patterns indicating persistent but substantially reduced genetic continuity (FST = 0.021). Strikingly, despite their EEA-rich background, the Yugur show WEA-shifted allele frequencies at craniofacial loci, including EDAR and LIMS1, suggesting subtle trait convergence. Signals of recent positive selection further differentiate the two populations: the Yugur display strong selection on the FADS locus linked to lipid metabolism, whereas both groups exhibit selection at PPARA but with greater intensity in the Uyghur, consistent with their higher WEA proportion. Functional enrichment analyses highlight overlapping immune and metabolic pathways, consistent with shared biological patterns shaped by demographic history and long-term residence in northwestern China. Together, these findings show how divergent admixture proportions and region-specific natural selection have produced distinct genomic architectures in two historically related populations along the Silk Road.
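For readers unfamiliar with the FST statistic quoted in this abstract, the sketch below computes a simple two-population FST from allele frequencies at biallelic loci. It uses the textbook heterozygosity form, (H_T - H_S) / H_T, with equal population weights and a ratio-of-sums across loci; the abstract does not say which estimator the authors used, and the function name and toy frequencies are hypothetical.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Wright-style FST between two populations.

    freqs_a and freqs_b are lists of allele frequencies (one per biallelic
    locus, same order). Assumption: equal weighting of both populations and
    no sample-size correction, so this is an illustration rather than an
    unbiased estimator.
    """
    between = 0.0
    total = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        # Mean expected heterozygosity within the two populations.
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        # Expected heterozygosity of the pooled population.
        p_mean = (p_a + p_b) / 2
        h_total = 2 * p_mean * (1 - p_mean)
        between += h_total - h_within
        total += h_total
    return between / total if total > 0 else 0.0

# Toy frequencies for three loci in two hypothetical populations.
population_a = [0.10, 0.50, 0.80]
population_b = [0.15, 0.45, 0.70]
toy_fst = fst_two_populations(population_a, population_b)
```

Identical frequency vectors give 0 and fixed opposite alleles give 1, which brackets the small values typical of closely related human populations.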
{"title":"Divergent Eurasian ancestry and local adaptation shape the genetic landscapes of the Yugur and Uyghur.","authors":"Siyong Yu, Jia Wen, Yang Gao, Zhaoqing Yang, Xu Wang, Yan Lu, Jiayou Chu, Dilinuer Maimaitiyiming, Shuhua Xu","doi":"10.1093/bib/bbag041","DOIUrl":"https://doi.org/10.1093/bib/bbag041","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1"},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"146149167","RegionNum":2,"RegionCategory":"Biology","Total":0}
Yu Zhang, Ming Li, David M Haas, C Noel Bairey Merz, Tsegaselassie Workalemahu, Kelli Ryckman, Janet M Catov, Lisa D Levine, Alexa Freedman, George R Saade, Jiaqi Hu, Hongyu Zhao, Xihao Li, Nianjun Liu, Qi Yan
Mendelian randomization (MR) has become an important technique for establishing causal relationships between risk factors and health outcomes. By using genetic variants as instrumental variables, it can mitigate bias due to confounding and reverse causation in observational studies. Current MR analyses have predominantly used common genetic variants as instruments, which represent only part of the genetic architecture of complex traits. Rare variants, which can have larger effect sizes and provide unique biological insights, have been understudied due to statistical and methodological challenges. We introduce MR-common and annotation-informed rare variants (MR-CARV), a novel framework integrating common and rare genetic variants in two-sample MR. This method leverages comprehensive genetic data made available by high-throughput sequencing technologies and large-scale consortia. Rare variants are aggregated into functional categories, such as gene-coding, gene-noncoding, and nongene regions, using variant annotations and biological impact as weights. The effects of rare variant sets are then estimated with STAARpipeline and combined with the estimated effects of common variants using existing MR methods. Simulation studies demonstrate that MR-CARV maintains robust type I error control and achieves higher statistical power, with up to a 66.3% relative increase compared with existing methods based only on common variants. Consistent with these findings, application to real data on high-density lipoprotein cholesterol (HDL-C) and preeclampsia showed that MR-CARV [inverse variance weighted (IVW)] yielded a more precise and statistically significant effect estimate (-0.020, SE = 0.0102, P = .0470) than IVW using only common variants (-0.023, SE = 0.0123, P = .0659).
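The inverse variance weighted (IVW) estimator named in this abstract can be sketched in a few lines. The example below is the standard fixed-effect IVW formula applied to per-instrument summary statistics; treating an aggregated rare-variant set as one extra instrument is an assumption made here for illustration, not a description of how MR-CARV is implemented. All numbers are toy values and the function name is hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from two-sample summary statistics.

    Each list has one entry per instrument: the instrument-exposure effect,
    the instrument-outcome effect, and the standard error of the latter.
    Returns (estimate, standard error, two-sided normal p-value).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        total_weight += bx * bx / (se * se)
        weighted_sum += bx * by / (se * se)
    estimate = weighted_sum / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    z_score = estimate / std_error
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return estimate, std_error, p_value

# Toy data: two common-variant instruments ...
common_bx, common_by, common_se = [0.10, 0.20], [-0.02, -0.04], [0.01, 0.01]
common_only = ivw_estimate(common_bx, common_by, common_se)

# ... plus one hypothetical aggregated rare-variant set as a third instrument.
combined = ivw_estimate(common_bx + [0.30], common_by + [-0.06], common_se + [0.01])
```

Because every added instrument contributes a positive weight, the combined standard error can only shrink under this model, which is the intuition behind the gain in precision reported when rare variants are included.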
{"title":"A novel two-sample Mendelian randomization framework integrating common and rare variants: application to assess the effect of HDL-C on preeclampsia risk.","authors":"Yu Zhang, Ming Li, David M Haas, C Noel Bairey Merz, Tsegaselassie Workalemahu, Kelli Ryckman, Janet M Catov, Lisa D Levine, Alexa Freedman, George R Saade, Jiaqi Hu, Hongyu Zhao, Xihao Li, Nianjun Liu, Qi Yan","doi":"10.1093/bib/bbaf649","DOIUrl":"10.1093/bib/bbaf649","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1"},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777983/pdf/","platform":"Semanticscholar","paperid":"145917110","RegionNum":2,"RegionCategory":"Biology","PMCID":"OA","Total":0}
Wenli Zhai, Lingyun Sun, Wenwei Fang, Yidan Dong, Chunxiao Cheng, Yuanjiao Liu, Yuan Zhou, Jiadong Ji, Lang Wu, An Pan, Eric R Gamazon, Xiong-Fei Pan, Dan Zhou
Genetics-informed proteome-wide association studies (PWASs) provide an effective way to uncover proteomic mechanisms underlying complex diseases. PWAS relies on an ancestry-matched reference panel to model the impact of genetically determined protein expression on phenotype. However, reference panels from underrepresented populations remain relatively limited. We developed a multi-ancestry framework to enhance protein prediction in these populations by integrating diverse information-sharing strategies into a Multi-Ancestry Best-performing Model (MABM). MABM improved prediction performance in both cross-validation and an external dataset. Leveraging Biobank Japan, we identified three times as many significant PWAS associations using MABM as using the Lasso model. Notably, 47.5% of the MABM-specific associations were reproduced in independent East Asian datasets with concordant effect sizes. Furthermore, MABM improved gene/protein prioritization for functional validation in complex traits by confirming well-established associations and uncovering novel trait-related candidates. The benefits of MABM were further validated in additional ancestries and demonstrated in brain tissue-based PWAS, underscoring its broad applicability. Our findings close critical gaps in multi-omics research among underrepresented populations and facilitate trait-relevant protein discovery in these populations.
{"title":"Cross-ancestry information transfer framework improves protein abundance prediction and protein-trait association identification.","authors":"Wenli Zhai, Lingyun Sun, Wenwei Fang, Yidan Dong, Chunxiao Cheng, Yuanjiao Liu, Yuan Zhou, Jiadong Ji, Lang Wu, An Pan, Eric R Gamazon, Xiong-Fei Pan, Dan Zhou","doi":"10.1093/bib/bbaf707","DOIUrl":"10.1093/bib/bbaf707","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777707/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145917075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ruhai Chen, Jiekai Chen, Lingling Shi, Jiangping He
Chromatin topological structure is critical for gene regulation. Hi-C-based experiments have significantly advanced our understanding of chromatin organization. Numerous computational tools have been developed to identify various structural levels of chromatin, ranging from compartments to loops. However, there remains a lack of specialized tools for identifying non-homologous inter-chromatin contacts (NHCCs), which play important roles in chromosome territories. In this study, we present iceDP, a tool that leverages the Density Peaks clustering algorithm to identify local high-density regions within inter-chromatin. These regions undergo two subsequent filtering steps to eliminate obvious false positives. When applied to three Hi-C datasets, iceDP accurately identified known NHCCs, including olfactory receptor genes in mature olfactory sensory neurons and Polycomb repressive complex-regulated developmental genes in mouse embryonic stem cells (mESCs). Notably, iceDP also uncovered previously unreported transcriptionally active NHCCs. Compared with diffHiC and FitHiC, iceDP exhibited superior performance with the highest positive rate. Moreover, iceDP is compatible with a wide range of chromatin conformation capture techniques, including in-situ Hi-C, Micro-C, HiChIP, and BL-HiC, demonstrating its versatility and utility.
{"title":"iceDP: identifying inter-chromatin engagement via density peaks clustering algorithm.","authors":"Ruhai Chen, Jiekai Chen, Lingling Shi, Jiangping He","doi":"10.1093/bib/bbaf704","DOIUrl":"10.1093/bib/bbaf704","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777978/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145917093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
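The density-peaks idea named in the iceDP abstract can be sketched on toy data as follows. This is an illustration of the generic algorithm only, under stated assumptions: the function name `density_peaks`, the cutoff `d_c`, and the thresholds `rho_min` and `delta_min` are hypothetical, the points are plain 2D coordinates rather than Hi-C contact bins, and the two filtering steps mentioned in the abstract are not reproduced.

```python
import math


def density_peaks(points, d_c, rho_min, delta_min):
    """Return (rho, delta, peaks) for a list of coordinate tuples.

    rho[i]   : number of other points closer than d_c (local density)
    delta[i] : distance to the nearest point ranked as denser than i
    peaks    : indices with both high density and high separation
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    def ranks_higher(j, i):
        # Ties in density are broken by index so exactly one point is top-ranked.
        return rho[j] > rho[i] or (rho[j] == rho[i] and j < i)

    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if j != i and ranks_higher(j, i)]
        # The top-ranked point has no denser neighbour; use its largest distance.
        delta.append(min(higher) if higher else max(dist[i]))

    peaks = [i for i in range(n) if rho[i] >= rho_min and delta[i] >= delta_min]
    return rho, delta, peaks


# Two hypothetical, well-separated toy clusters; indices 4 and 9 are their centres.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
              (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
toy_rho, toy_delta, toy_peaks = density_peaks(toy_points, d_c=0.9,
                                              rho_min=3, delta_min=5.0)
```

Requiring both a high `rho` and a high `delta` is what separates a cluster centre from an ordinary dense point: a dense point that sits next to an even denser one gets a small `delta` and is not reported as a peak.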
Jiayi Li, Shenglun Chen, Zhixing Wu, Haozhe Wang, Rong Xia, Jia Meng, Yuxin Zhang
Circular RNA (circRNA) represents a critical class of regulatory RNAs with distinctive structural and functional features. The functions of circRNAs are modulated by various RNA modifications. Here, we present CircRM, a nanopore direct RNA sequencing-based computational method for profiling RNA modifications in circRNAs at single-base and single-molecule resolution. By integrating circRNA detection, read-level modification detection, and quantitative assessment of methylation rates, CircRM identified 427 high-confidence circRNAs and enabled systematic characterization of three major modifications: m5C (AUC = 0.855), m6A (AUC = 0.817), and m1A (AUC = 0.769). It revealed modification patterns distinct from those of linear RNAs, highlighting RNA-type-specific regulation. We also identified key features of circRNA-specific modifications, such as enrichment near the back-splice junctions. Cross-cell line analyses further demonstrated both conserved and cell-type-specific modification patterns. Together, these findings reveal, at the computational level, a unique epitranscriptomic landscape associated with circRNAs and establish CircRM as a powerful tool for advancing the study of RNA modifications in circular RNA biology. CircRM is freely accessible at: https://github.com/jiayiAnnie17/CircRM.
{"title":"CircRM: profiling circular RNA modifications from nanopore direct RNA sequencing.","authors":"Jiayi Li, Shenglun Chen, Zhixing Wu, Haozhe Wang, Rong Xia, Jia Meng, Yuxin Zhang","doi":"10.1093/bib/bbaf726","DOIUrl":"10.1093/bib/bbaf726","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12798809/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145965377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}