Spatially resolved transcriptomics (SRT) measures the transcriptomes of cells within intact biological tissues, providing unprecedented opportunities to investigate tissue micro-environments, where spatial domains are modeled as clusters of spatially neighboring cells. Current methods for identifying spatial domains from SRT data rely mainly on the expression profiles and spatial coordinates of cells and ignore the intercellular interactions among them, resulting in high sensitivity and low accuracy. To bridge these gaps, we introduce a novel framework, called SiDMGF (Signal-based Domain identification with Multi-Graph Fusion), that integrates gene set-derived signaling graphs and spatial graphs to jointly model biological context, spatial information, and gene expression in the cell embedding, thereby dramatically improving the accuracy and robustness of spatial domain identification. Experimental results demonstrate that SiDMGF consistently outperforms state-of-the-art methods across multiple benchmark datasets and achieves superior domain identification performance on diverse spatial sequencing platforms. Furthermore, we demonstrate that SiDMGF can also be effectively applied to cancer-related tissue samples, accurately delineating micro-environment heterogeneity within tumor slices.
{"title":"Signal-based spatial domain identification of spatially resolved transcriptomics with multigraph fusion.","authors":"Yaxiong Ma, Yu Wang, Xiaoke Ma","doi":"10.1093/bib/bbag052","DOIUrl":"10.1093/bib/bbag052","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12893220/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146164232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
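The SiDMGF abstract above describes fusing a spatial graph with a signaling graph. The sketch below only illustrates that general idea and is not the authors' implementation: it assumes a symmetric k-nearest-neighbour spatial graph and a plain weighted average as the fusion rule, and the function names, the toy coordinates, and the signaling adjacency are all hypothetical.

```python
import math


def knn_graph(coords, k):
    """Symmetric k-nearest-neighbour adjacency (0/1 matrix) from spot coordinates."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank every other spot by Euclidean distance to spot i.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        for j in order[:k]:
            adj[i][j] = adj[j][i] = 1.0  # symmetrise the edge
    return adj


def fuse_graphs(graphs, weights):
    """Weighted element-wise average of equally sized adjacency matrices.

    Assumption: a simple average stands in for whatever fusion SiDMGF uses.
    """
    total = sum(weights)
    n = len(graphs[0])
    return [
        [sum(w * g[i][j] for g, w in zip(graphs, weights)) / total for j in range(n)]
        for i in range(n)
    ]


# Toy data: two tight pairs of spots far apart from each other.
coords = [(0, 0), (0, 1), (5, 5), (5, 6)]
spatial = knn_graph(coords, k=1)
# Hypothetical signaling graph: spots 1 and 2 share pathway activity.
signaling = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
fused = fuse_graphs([spatial, signaling], weights=[0.5, 0.5])
```

In the fused matrix, spots 1 and 2 become connected even though they are not spatial neighbours, which is the kind of non-spatial context a signaling graph is meant to contribute.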
Zhiyi Chen, Yongxin Hao, Yuhong Su, Hans Ågren, Mingan Chen, Zhehuan Fan, Duanhua Cao, Jiacheng Xiong, Wei Zhang, Jin Liu, Xutong Li, Mingyue Zheng, Xi Cheng, Dingyan Wang, Dan Teng
G protein-coupled receptors (GPCRs) represent the largest membrane protein family and remain central targets in drug discovery. Ligand efficacy reflects the ability to modulate receptor conformational states and extends beyond binding affinity to underpin functional selectivity. However, most computational approaches still emphasize affinity prediction, with limited capacity to capture the conformational dynamics driving efficacy. Here, we introduce Dynamic-GLEP, a structure- and mechanism-aware framework that integrates molecular dynamics (MD)-derived conformational ensembles with transfer learning on equivariant graph neural networks. By constructing multi-conformation receptor-ligand complexes and fine-tuning the EquiScore model, Dynamic-GLEP identifies conformation-dependent interaction features to distinguish agonists from nonagonists. Applied to the 5-HT1A receptor, the framework achieved an area under the curve (AUC) of 0.74 in cross-validation and 0.71 on an external Food and Drug Administration (FDA)-related dataset. Comparative analyses showed that Holo-based models are advantageous for scaffold optimization, whereas Apo-derived ensembles provided greater adaptability to chemically diverse ligands. Furthermore, extension to the adenosine A2A receptor yielded high performance (AUC > 0.85), underscoring the method's robustness and transferability under data-scarce conditions. Collectively, these results highlight Dynamic-GLEP as a reliable and interpretable platform for ligand efficacy prediction in Class A GPCRs, with broad potential to support virtual screening, candidate prioritization, and mechanism-driven drug design.
{"title":"Dynamic-GLEP: a dynamics-informed deep learning framework for ligand efficacy prediction in representative Class A GPCRs.","authors":"Zhiyi Chen, Yongxin Hao, Yuhong Su, Hans Ågren, Mingan Chen, Zhehuan Fan, Duanhua Cao, Jiacheng Xiong, Wei Zhang, Jin Liu, Xutong Li, Mingyue Zheng, Xi Cheng, Dingyan Wang, Dan Teng","doi":"10.1093/bib/bbag049","DOIUrl":"10.1093/bib/bbag049","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12900074/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146177725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
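The Dynamic-GLEP abstract above reports its agonist versus nonagonist results as AUC values. As a reminder of what that number measures, here is a minimal sketch that computes the ROC AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores are made-up toy values, not data from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = agonist, 0 = nonagonist (hypothetical values).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

The pairwise form is quadratic in the number of examples; it is used here because it makes the definition explicit, not because it is the efficient way to compute the metric.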
{"title":"Corrections to the following abstracts.","authors":"","doi":"10.1093/bib/bbag080","DOIUrl":"10.1093/bib/bbag080","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12888816/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146156209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate prediction of ligand-induced activity for G-protein-coupled receptors (GPCRs) is a cornerstone of drug discovery, yet it is challenged by the need to model allosteric communication, the long-range signaling that links ligand binding to distal conformational changes. Prevailing sequence-based models often fail to capture these three-dimensional dynamics, a limitation frequently masked by averaged performance on simpler Class A targets. To address this, we introduce GPCRact, a novel framework that models the biophysical principles of allosteric modulation in GPCR activation. It first constructs a high-resolution, three-dimensional structure-aware graph from the heavy-atom coordinates of functionally critical residues at binding and allosteric sites. A dual attention architecture then captures the activation process: cross-attention encodes the initial ligand-protein interaction at the binding site, whereas self-attention learns the subsequent intra-protein signal propagation. This hierarchical architecture is built upon an E(n)-Equivariant Graph Neural Network (EGNN) to explicitly model the conformational consequences of ligand binding, and is further refined with a tailored loss function and inference logic to mitigate error propagation. Underpinned by GPCRactDB, a comprehensive database we constructed for this study, GPCRact not only achieves state-of-the-art performance but also demonstrates robustly superior accuracy on a curated benchmark of allosterically complex receptors where existing models systematically underperform. Crucially, analysis of the learned attention weights confirms that the model identifies biologically validated allosteric pathways, offering a significant step toward resolving the black-box nature of previous methods. Thus, GPCRact provides a more accurate, interpretable, and mechanistically grounded solution to a long-standing challenge, paving the way for effective structure-guided drug discovery.
{"title":"GPCRact: a hierarchical framework for predicting ligand-induced GPCR activity via allosteric communication modeling.","authors":"Hyojin Son, Gwan-Su Yi","doi":"10.1093/bib/bbaf719","DOIUrl":"10.1093/bib/bbaf719","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805254/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145970617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
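The GPCRact abstract above distinguishes cross-attention (ligand to protein) from self-attention (within the protein). The sketch below shows only the generic scaled dot-product attention operation applied in those two ways. It omits learned projections, the EGNN backbone, and everything else specific to GPCRact, and the feature vectors are hypothetical toy values.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, attention weights)."""
    d = len(keys[0])
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]
    width = len(values[0])
    outputs = [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(width)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical features: 2 ligand atoms and 3 binding-site residues.
ligand = [[1.0, 0.0], [0.0, 1.0]]
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Cross-attention: residue queries attend over ligand keys and values.
cross_out, cross_w = attention(residues, ligand, ligand)
# Self-attention: the updated residue features attend to each other.
self_out, self_w = attention(cross_out, cross_out, cross_out)
```

Inspecting `cross_w` and `self_w` is the toy analogue of the attention-weight analysis the abstract mentions: each row shows how strongly one residue draws on each ligand atom or on each other residue.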
Bosheng Song, Jiayi Zhang, Ying Liu, Yuansheng Liu, Jing Jiang, Sisi Yuan, Xia Zhen, Yiping Liu
Molecular representation learning (MRL) is a foundation for leveraging computational methods in drug discovery, enabling the transformation of molecular structures and properties into numerical vectors. These vectors serve as input for machine learning models and facilitate the prediction and analysis of molecular attributes, functions, and reactions. The advent of foundation models has introduced both new opportunities and challenges to MRL. These models offer improved generalizability and transferability when data are scarce. Through pretraining and fine-tuning, foundation models can be adapted to various domains. Their robust encoding and generative abilities also allow the transformation of molecular data into more expressive forms. This paper provides a detailed review of current mainstream molecular descriptors and datasets, focusing primarily on the representation of small molecules while excluding larger molecules such as proteins and peptides. It classifies foundation models into two primary categories based on the form of input: unimodal and multimodal models. For each category, representative models are identified and their advantages and disadvantages are evaluated. Moreover, we systematically summarize four core pretraining strategies for MRL foundation models, analyzing their task designs, applicable scenarios, and impacts on downstream performance. In addition, the application of molecular representation foundation models in drug discovery and development is discussed, together with the current status of model interpretability. The paper concludes with insights into the future directions of MRL foundation models.
{"title":"A systematic review of molecular representation learning foundation models.","authors":"Bosheng Song, Jiayi Zhang, Ying Liu, Yuansheng Liu, Jing Jiang, Sisi Yuan, Xia Zhen, Yiping Liu","doi":"10.1093/bib/bbaf703","DOIUrl":"10.1093/bib/bbaf703","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12784970/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145932191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenjie Feng, Xiaowen Sun, Xintao Song, Lei Bao, Weikang Gong, Renmin Han
Understanding RNA conformational dynamics is essential for understanding its roles in complex biological processes. While computational methods have revolutionized the prediction of static 3D RNA structures, predicting local flexibility directly from structure remains a significant challenge. We developed DeepRMSF, a deep learning-based method that leverages atomic-level descriptions of RNA to predict vibrational flexibility given a tertiary structure. Trained on MD-derived root-mean-square fluctuations (RMSF), DeepRMSF was benchmarked on 371 nonredundant RNAs, with 311 RNAs used for five-fold cross-validation (PCC = 0.7219-0.7464) and 60 RNAs as an independent test set (PCC = 0.734), ensuring minimal sequence/structural similarity between sets. DeepRMSF predicts the local flexibility of medium-sized RNAs (~75 nucleotides) in ~8.2 s, achieving a >3000-fold speed-up over MD simulations while maintaining strong extrapolative accuracy. Rather than replacing MD, DeepRMSF offers a scalable and practical alternative for transcriptome-scale screening of RNA flexibility, facilitating studies on RNA structure-dynamics-function relationships and supporting computational modeling in RNA biology.
{"title":"DeepRMSF: a deep learning-based automated approach for predicting atomic-level flexibility in RNA structure.","authors":"Chenjie Feng, Xiaowen Sun, Xintao Song, Lei Bao, Weikang Gong, Renmin Han","doi":"10.1093/bib/bbaf720","DOIUrl":"10.1093/bib/bbaf720","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12798811/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145965339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
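The DeepRMSF abstract above uses MD-derived RMSF as the training target and the Pearson correlation coefficient (PCC) as the evaluation metric. The sketch below shows the standard definitions of both quantities on a tiny made-up trajectory. It assumes the frames are already aligned to a common reference, and the "predicted" values are hypothetical numbers, not output from DeepRMSF.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF; trajectory[t][i] is the (x, y, z) of atom i in frame t.

    Assumes the frames were superimposed beforehand.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][c] for frame in trajectory) / n_frames for c in range(3)]
        # Mean squared deviation from that average position.
        msd = sum(
            sum((frame[i][c] - mean[c]) ** 2 for c in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Two frames, three atoms: one rigid, two increasingly mobile.
traj = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
]
reference = rmsf(traj)         # per-atom fluctuations from the toy "simulation"
predicted = [0.2, 0.9, 2.1]    # hypothetical model output
pcc = pearson(reference, predicted)
```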
Protein-carbohydrate interactions play an important role in many biological processes and functions, such as inflammation, signal transduction, and cell adhesion. In this paper, we build a deep learning model to predict non-covalent protein-carbohydrate binding sites. Because experimental approaches for identifying these sites are expensive, computational tools are necessary. We explored several sequence-based and structural features and also leveraged protein language model embeddings. We analyzed different architectures and selected the most suitable deep learning architecture for our final prediction model, DeepCPBSite. DeepCPBSite is an ensemble that combines three separate models built on the ResNet+FNN architecture, each using a different imbalance-handling approach (random undersampling, weighted oversampling, and class-weighted loss). We built separate datasets from three sources: RCSB, UniProt, and CASP. We also compared the structural features extracted from structures predicted by AlphaFold and ESMFold in the context of our prediction tasks. We employed three different feature selection techniques and performed a SHAP (SHapley Additive exPlanations) analysis of the structural features after categorizing the proteins by organism. DeepCPBSite achieved 78.7% balanced accuracy and 59.6% sensitivity on the TS53 set, outperforming the second-best competitor, DeepGlycanSite, by 1.16% and 2.94%, respectively. Additionally, its F1, MCC, and AUPR scores outperformed other state-of-the-art methods, with improvements ranging from 3.77% to 47.6%, 3.84% to 32.7%, and 8.18% to 60.21%, respectively.
{"title":"Predicting protein-carbohydrate binding sites: a deep learning approach integrating protein language model embeddings and structural features.","authors":"Md Muhaiminul Islam Nafi, M Saifur Rahman","doi":"10.1093/bib/bbag008","DOIUrl":"10.1093/bib/bbag008","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12853128/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146084329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
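The DeepCPBSite abstract above reports balanced accuracy, sensitivity, F1, and MCC, metrics chosen because binding-site residues are a small minority class. The sketch below computes them from a binary confusion matrix using their standard definitions. The labels are toy values, and the helper names are illustrative rather than taken from the paper's code.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binding_site_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    precision = safe_div(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sensitivity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Toy residue labels: 1 = binding site, 0 = non-binding (imbalanced on purpose).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
scores = binding_site_metrics(y_true, y_pred)
```

On this toy input plain accuracy would be 0.75, while balanced accuracy is about 0.67, which illustrates why imbalance-aware metrics are preferred for this task.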
Guoqing Hu, Mengmeng Sang, Hao Wang, Jia Ge, Lin Xu, Stephen S-T Yau
Transcription factors (TFs) orchestrate cellular programs by activating or repressing gene expression in response to diverse stimuli. Although advances in experimental and computational biology have expanded our understanding of TFs, existing prediction methods still struggle to accurately capture TF-target regulatory relationships and determine their directionality (activation versus inhibition). Here, we propose ACNVE-K, an integrative framework combining k-mer decomposition with asymmetric covariance natural vector encoding to convert amino acid sequences into multidimensional feature vectors. Using eXtreme Gradient Boosting (XGBoost), Gradient Boosting (GB), and Random Forest (RF) algorithms, we constructed five predictive models for TF identification, target gene inference, and regulatory direction classification. Benchmarking analyses demonstrated that XGBoost achieved the highest predictive performance across human and mouse genomes, particularly with updated genome annotations. The 5-mer configuration provided an optimal balance between feature richness and computational efficiency. Collectively, ACNVE-K offers a robust and interpretable framework for decoding transcriptional regulation, facilitating advances in precision medicine, regulatory genomics, and machine-learning-based gene network reconstruction.
{"title":"Exploring potential transcription factors and their regulatory relationships based on asymmetric covariance natural vector encoding method and machine learning algorithms.","authors":"Guoqing Hu, Mengmeng Sang, Hao Wang, Jia Ge, Lin Xu, Stephen S-T Yau","doi":"10.1093/bib/bbag044","DOIUrl":"10.1093/bib/bbag044","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885101/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146149137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
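The ACNVE-K abstract above starts from a k-mer decomposition of amino acid sequences. The sketch below shows only that first step: counting overlapping k-mers and recording the mean start position of each. It does not reproduce the asymmetric covariance natural vector encoding itself, whose definition is not given in the abstract. The peptide string and function names are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Counts of all overlapping k-mers in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_mean_positions(sequence, k):
    """Mean 0-based start position of every k-mer that occurs in the sequence."""
    positions = {}
    for i in range(len(sequence) - k + 1):
        positions.setdefault(sequence[i:i + k], []).append(i)
    return {kmer: sum(p) / len(p) for kmer, p in positions.items()}


peptide = "MKKLMKK"  # made-up sequence for illustration
counts = kmer_counts(peptide, k=2)
means = kmer_mean_positions(peptide, k=2)
```

The number of possible k-mers over 20 amino acids grows as 20 to the power k, which is one way to read the abstract's remark that the 5-mer setting balances feature richness against computational cost.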
Fanyi Huang, Mi Zhou, Yanjia Chen, Sha Hua, Yanxin Han, Yingze Fan, Qingchuan Li, Zhuoyan Sun, Ke Yang, Qiang Zhao, Wei Jin
Hypertrophic cardiomyopathy (HCM) is a condition in which approximately 65% of patients exhibit myocardial fibrosis, indicated by late gadolinium enhancement, with the severity and extent of fibrosis being positively correlated with the risk of sudden cardiac death. While fibroblast activation in HCM has been noted in previous studies, the underlying regulatory mechanisms have not been thoroughly explored. In this study, we analyzed the latest single-nucleus RNA sequencing (snRNA-seq) datasets related to HCM caused by the two most common mutations. We also examined the largest existing snRNA-seq and spatial transcriptomics datasets of HCM for external validation. Additionally, we conducted preliminary histopathological and molecular biology experiments to validate our findings and explore potential mechanisms. Our analysis revealed a phenotypic transformation of macrophages in both cases of HCM. These pro-inflammatory macrophages, driven by the high expression of ENPP2, mediated intercellular interactions that influenced fibroblast activation. The resulting increase in lysophosphatidic acid appeared to act as a plausible intermediary. Activated fibroblasts secreted substantial amounts of COL14A1, which is a critical component of myocardial fibrosis. These findings were consistent across different genetic backgrounds, suggesting that they apply to most HCM cases. Our study provides valuable insights into the mechanisms underlying myocardial fibrosis in HCM, highlighting the role of macrophage transformation and fibroblast activation. These findings may support the identification of novel diagnostic or prognostic biomarkers and the development of targeted therapies with clinical translational potential.
{"title":"Cross-dataset transcriptomic analyses identify a conserved ENPP2+ macrophage-fibroblast activation axis in hypertrophic cardiomyopathy.","authors":"Fanyi Huang, Mi Zhou, Yanjia Chen, Sha Hua, Yanxin Han, Yingze Fan, Qingchuan Li, Zhuoyan Sun, Ke Yang, Qiang Zhao, Wei Jin","doi":"10.1093/bib/bbag036","DOIUrl":"10.1093/bib/bbag036","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12874883/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146123805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Epistasis detection is hindered by multiple challenges, including the proliferation of analytic tools and the diverse methodological choices made in Genome-Wide Association Interaction Studies (GWAIS). These factors often produce inconsistent and only partially overlapping results, with individual methods emphasizing distinct aspects of epistasis. Although comparative evaluations of GWAIS approaches exist, they generally do not identify the factors responsible for methodological discrepancies or assess their implications for biomedical research. Consequently, it remains unclear which features of GWAIS strategies contribute most to these differences and which methods are most appropriate for revealing specific genetic architectures. Here, we present a workflow designed to characterize heterogeneity in GWAIS results and to systematically derive practical recommendations. First, we assess non-replicability by comparing single nucleotide polymorphism (SNP) pair rankings and Statistical Epistasis Networks (SENs), graphs in which nodes represent genetic loci and edges denote epistatic interactions, to identify clusters of protocols with similar outcomes. SENs provide a structured framework for visualizing and comparing variation in epistasis detection, enabling prioritization of interactions recurrently identified across methods. Second, we propose strategies to reduce heterogeneity and enhance robustness, with particular emphasis on interpretability. Notably, we demonstrate that differences among SENs can be informative rather than disadvantageous, as they yield complementary perspectives on disease genetics. Finally, we highlight the benefits of informed SEN aggregation, showing how this approach can strengthen the utility of GWAIS for elucidating biological mechanisms relevant to disease prevention, diagnosis, and management.
{"title":"Turning heterogeneity of statistical epistasis networks to an advantage.","authors":"Diane Duroux, Federico Melograna, Héctor Climente-González, Bowen Fan, Andrew Walakira, Edoardo Efrem Gervasoni, Zuqi Li, Damian Roqueiro, Fabio Stella, Kristel Van Steen","doi":"10.1093/bib/bbaf699","DOIUrl":"10.1093/bib/bbaf699","url":null,"PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814973/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146003062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
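The epistasis abstract above describes comparing Statistical Epistasis Networks (SENs) across protocols and prioritizing interactions that recur across methods. A minimal sketch of that idea follows. It is not the authors' workflow: the abstract does not say which similarity measure or aggregation rule they use, so Jaccard overlap of edge sets and a simple support threshold are assumptions made here for illustration, and the protocol names and SNP identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def edge_set(pairs):
    """Turn SNP pairs into undirected edges (order within a pair is ignored)."""
    return {frozenset(pair) for pair in pairs}


def jaccard(a, b):
    """Jaccard similarity of two edge sets; two empty networks count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(sens):
    """Similarity for every pair of protocols, keyed by sorted protocol names."""
    return {
        (p, q): jaccard(sens[p], sens[q])
        for p, q in combinations(sorted(sens), 2)
    }


def recurrent_edges(sens, min_support=2):
    """Keep edges reported by at least `min_support` protocols (assumed rule)."""
    counts = Counter(edge for edges in sens.values() for edge in edges)
    return {edge: n for edge, n in counts.items() if n >= min_support}


# Hypothetical SENs from three hypothetical GWAIS protocols.
sens = {
    "protocol_A": edge_set([("rs1", "rs2"), ("rs2", "rs3"), ("rs4", "rs5")]),
    "protocol_B": edge_set([("rs2", "rs1"), ("rs2", "rs3")]),
    "protocol_C": edge_set([("rs4", "rs5"), ("rs6", "rs7")]),
}

similarity = pairwise_jaccard(sens)
consensus = recurrent_edges(sens, min_support=2)

for pair, score in similarity.items():
    print(pair, round(score, 3))
for edge, support in consensus.items():
    print(sorted(edge), support)
```

The pairwise similarities could be passed to any clustering routine to group protocols with similar outcomes, and the support counts give one simple way to rank interactions by how often they recur. Edges found by only one protocol are dropped here only because of the chosen threshold; the abstract argues that such differences can themselves be informative.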