Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btag024
Soo Bin Kwon, Jason Ernst
Motivation: Identifying pairwise associations between genomic loci is an important challenge for which large and diverse collections of epigenomic and transcription factor (TF) binding data can potentially be informative.
Results: We developed Learning Evidence of Pairwise Association from Epigenomic and TF binding data (LEPAE). LEPAE uses neural networks to quantify evidence of association for pairs of genomic windows from large-scale epigenomic and TF binding data along with distance information. We applied LEPAE using thousands of human datasets. We show using additional data that LEPAE captures biologically meaningful pairwise relationships between genomic loci, and we expect LEPAE scores to be a resource.
Availability and implementation: The LEPAE scores and the software are available at https://github.com/ernstlab/LEPAE.
Title: Learning a pairwise epigenomic and transcription factor binding association score across the human genome.
Journal: Bioinformatics (Oxford, England). PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12910503/pdf/
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btaf679
Nguyen Khoa Tran, My Ky Huynh, Alexander D Kotman, Martin Jürgens, Thomas Kurz, Sascha Dietrich, Gunnar W Klau, Nan Qin
Motivation: Live-cell imaging-based drug screening increases the likelihood of identifying effective and safe drugs by providing dynamic, high-content, and physiologically relevant data. As a result, it improves the success rate of drug development and facilitates the translation of benchside discoveries to bedside applications. Despite these advantages, no comprehensive metrics currently exist to evaluate dose-time-dependent drug responses. To address this gap, we established a systematic framework to assess drug effects across a range of concentrations and exposure durations simultaneously. This metric enables more accurate evaluation of drug responses measured by live-cell imaging.
Results: We employed treatment concentrations ranging from 0 to 10 μM and performed live-cell imaging-based measurements over a 120-h incubation period. To analyze the experimental data, we developed VUScope, a new mathematical model combining the 4-parameter logistic curve and a logistic function to characterize dose-time-dependent responses. This enabled us to calculate the Growth Rate Inhibition Volume Under the dose-time-response Surface (GRIVUS), which serves as a critical metric for assessing dynamic drug responses. Furthermore, our mathematical model allowed us to predict long-term treatment responses based on short-term drug responses. We validated the predictive capabilities of our model using independent datasets and observed that VUScope enhances prediction accuracy and offers deeper insights into drug effects than previously possible. By integrating VUScope into high-throughput drug screening platforms, we can further improve the efficacy of drug development and treatment selection.
Availability and implementation: We have made VUScope more accessible to users conducting pharmacological studies by uploading a detailed description, example datasets, and the source code to vuscope.albi.hhu.de, https://github.com/AlBi-HHU/VUScope, and https://doi.org/10.5281/zenodo.17610533.
Title: VUScope: a mathematical model for evaluating image-based drug response measurements and predicting long-term incubation outcomes.
Journal: Bioinformatics (Oxford, England). PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12904834/pdf/
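The VUScope Results above name two ingredients: a 4-parameter logistic curve and a volume under a dose-time-response surface. The Python sketch below illustrates both in generic form; it is not the VUScope model. The function names, the time dependence in `toy_surface`, and all parameter values are assumptions made for illustration, and the integration runs over a linear dose axis.

```python
import numpy as np


def four_pl(dose, bottom, top, ec50, hill):
    """Four-parameter logistic curve: equals `top` at dose 0, tends to `bottom` at high dose."""
    dose = np.asarray(dose, dtype=float)
    return bottom + (top - bottom) / (1.0 + (dose / ec50) ** hill)


def trapezoid(values, grid, axis):
    """Trapezoidal rule along one axis of `values`, sampled at the points in `grid`."""
    values = np.moveaxis(np.asarray(values, dtype=float), axis, -1)
    widths = np.diff(np.asarray(grid, dtype=float))
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * widths, axis=-1)


def volume_under_surface(surface, doses, times):
    """Integrate surface[i, j] (dose i, time j) over time first, then over dose."""
    over_time = trapezoid(surface, times, axis=1)
    return float(trapezoid(over_time, doses, axis=0))


def toy_surface(doses, times, ec50=1.0, hill=1.5, t_half=48.0, rate=0.1):
    """Hypothetical surface (an assumption): the 4PL floor deepens with time via a logistic term."""
    onset = 1.0 / (1.0 + np.exp(-rate * (times - t_half)))  # rises from ~0 to ~1 over time
    bottom = 1.0 - onset
    return four_pl(doses[:, None], bottom[None, :], 1.0, ec50, hill)


doses = np.linspace(0.0, 10.0, 21)   # concentrations in μM, matching the 0 to 10 μM range
times = np.linspace(0.0, 120.0, 25)  # hours, matching the 120-h incubation period
surface = toy_surface(doses, times)
volume = volume_under_surface(surface, doses, times)
```

A smaller `volume` here means a stronger response accumulated across all doses and time points; how GRIVUS is actually normalized is described in the paper and its repository, not in this sketch.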
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btag027
Jakob Agamia, Martin Zacharias
Motivation: The rational design of chemical compounds that bind to a desired protein target molecule is a major goal of drug discovery. Most current approaches, including molecular docking, fragment-based buildup, and machine learning-based generative drug design, employ a rigid protein target structure.
Results: Based on recent progress in predicting protein structures and complexes with chemical compounds, we have designed an approach, AI-MCLig, to optimize a chemical compound bound to a fully flexible and conformationally adaptable protein binding region. During a Monte Carlo (MC)-type simulation that randomly changes a chemical compound, the target protein-compound complex is completely rebuilt at every MC step using the Chai-1 protein structure prediction program. Besides compound flexibility, this allows the protein to adapt to the chemically changing compound. MC protocols based on atom-/bond-type changes or on combining larger chemical fragments have been tested. Simulations on four test targets yielded potential ligands with very good binding scores, comparable to those of experimentally known binders, under several different scoring schemes. The MC-based compound design approach is complementary to existing approaches and could aid the rapid design of putative binders, including induced fit of the protein target.
Availability and implementation: Datasets, examples, and source code are available on our public GitHub repository https://github.com/JakobAgamia/AI-MCLig and on Zenodo at https://doi.org/10.5281/zenodo.17800140.
Title: De novo protein-ligand design including protein flexibility and conformational adaptation.
Journal: Bioinformatics (Oxford, England). PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12944823/pdf/
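The AI-MCLig Results above describe an MC loop in which a compound is changed at random and re-scored at every step. The Python sketch below shows the general shape of such a loop with a Metropolis acceptance rule. The acceptance rule itself is an assumption (the abstract does not state which criterion is used), and the string "compounds", `mutate_one`, and `toy_score` are hypothetical stand-ins: no structure prediction or real binding score is involved.

```python
import math
import random

ALPHABET = "CNO"          # hypothetical atom types for the toy compound
HIDDEN_TARGET = "CNOCN"   # hypothetical optimum used only by the toy score


def metropolis_accept(old_score, new_score, temperature, rng):
    """Always accept improvements (lower is better); accept worse moves with Boltzmann probability."""
    if new_score <= old_score:
        return True
    return rng.random() < math.exp(-(new_score - old_score) / temperature)


def mutate_one(compound, rng):
    """Toy move: replace one randomly chosen position with a random atom type."""
    i = rng.randrange(len(compound))
    return compound[:i] + rng.choice(ALPHABET) + compound[i + 1:]


def toy_score(compound):
    """Toy score: number of positions that differ from the hidden target."""
    return sum(a != b for a, b in zip(compound, HIDDEN_TARGET))


def mc_optimize(start, propose, score, steps, temperature, seed=0):
    """Run the MC loop and return the best compound seen together with its score."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_score = score(candidate)  # in a real setting, the expensive rebuild-and-score step
        if metropolis_accept(current_score, candidate_score, temperature, rng):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


best, best_score = mc_optimize("CCCCC", mutate_one, toy_score, steps=200, temperature=0.5)
```

Tracking the best state separately from the current state matters because a Metropolis walk can leave a good solution after accepting a worse move.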
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btaf667
Erick Armingol, Reid O Larsen, Lia Gale, Martin Cequeira, Hratch M Baghdassarian, Nathan E Lewis
Summary: Cell-cell communication dynamically changes across time while involving diverse cell populations and ligand types such as proteins and metabolites. Single-cell transcriptomics enables its inference, but existing tools typically analyze ligand types separately and overlook their coordinated activity. Here, we present Tensor-cell2cell v2, a computational tool that can jointly analyze protein- and metabolite-mediated communication over time using coupled tensor component analysis, while preserving each modality of inferred communication scores independently, as well as their data structures and distributions. Applied to brain organoid development, Tensor-cell2cell v2 uncovers dynamic, coordinated communication programs involving key proteins and metabolites across relevant cell types and specific time points.
Availability and implementation: Tensor-cell2cell v2 and its new coupled tensor component analysis are implemented in Python and available as part of the cell2cell framework at https://github.com/earmingol/cell2cell. This Python library is available on PyPI. Code for the analyses in this manuscript can be found in a Code Ocean capsule at https://doi.org/10.24433/CO.0061424.v3, where the analyses can also be run and reproduced online. Tutorials can be found at https://cell2cell.readthedocs.io.
Title: Tensor-cell2cell v2 unravels coordinated dynamics of protein- and metabolite-mediated cell-cell communication.
Journal: Bioinformatics (Oxford, England). PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12937581/pdf/
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btag031
Favour James, Dexter Pratt, Christopher Churas, Augustin Luna
Motivation: Knowledge graphs (KGs) are powerful tools for structuring and analyzing biological information due to their ability to represent data and improve queries across heterogeneous datasets. However, constructing KGs from unstructured literature remains challenging due to the cost and expertise required for manual curation. Prior works have explored text-mining techniques to automate this process, but have limitations that impact their ability to capture complex relationships fully. Traditional text-mining methods struggle with understanding context across sentences. Additionally, these methods lack expert-level background knowledge, making it difficult to infer relationships that require awareness of concepts indirectly described in the text. Large Language Models (LLMs) present an opportunity to overcome these challenges. LLMs are trained on diverse literature, equipping them with contextual knowledge that enables more accurate information extraction.
Results: We present textToKnowledgeGraph, an artificial intelligence tool using LLMs to extract interactions from individual publications directly in Biological Expression Language (BEL). BEL was chosen for its compact, detailed representation of biological relationships, enabling structured, computationally accessible encoding. This work makes several contributions: (i) the open-source Python textToKnowledgeGraph package (pypi.org/project/texttoknowledgegraph) for BEL extraction from scientific articles, usable from the command line and within other projects; (ii) an interactive application within Cytoscape Web to simplify extraction and exploration; and (iii) a dataset of extractions that have been both computationally and manually reviewed to support future fine-tuning efforts.
Availability and implementation: https://github.com/ndexbio/llm-text-to-knowledge-graph.
Title: textToKnowledgeGraph: generation of molecular interaction knowledge graphs using large language models for exploration in Cytoscape.
Journal: Bioinformatics (Oxford, England). PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12916169/pdf/
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btag049
Chaojie Wang, Xin Yu
Motivation: Capturing spatial structure is fundamental to the analysis of spatial transcriptomics data. However, most existing methods focus on clustering within individual tissue slices and often ignore the high inter-slice similarity inherent in multi-slice datasets.
Results: To address this limitation, we propose STransfer, a novel transfer learning framework that combines graph convolutional networks (GCNs) with positive pointwise mutual information (PPMI) to model both local and global spatial dependencies. An attention-based module is introduced to fuse features from multiple graphs into unified node representations, facilitating the learning of low-dimensional embeddings that jointly encode gene expression and spatial context. By transferring knowledge from labeled slices to adjacent unlabeled ones, STransfer significantly enhances clustering accuracy while reducing manual annotation costs. Extensive experiments demonstrate that STransfer consistently outperforms state-of-the-art methods in both spatial modeling and cross-slice transfer performance.
Availability and implementation: The code for STransfer has been uploaded to GitHub: https://github.com/Saki-JSU/Publications/tree/main/STransfer.
Title: STransfer: a transfer learning-enhanced graph convolutional network for clustering spatial transcriptomics data.
Journal: Bioinformatics (Oxford, England). PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12900540/pdf/
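The STransfer Results above rely on positive pointwise mutual information (PPMI). As a reminder of what that quantity is, the Python sketch below computes a PPMI matrix from a matrix of co-occurrence counts using the standard definition. How STransfer obtains its counts and feeds PPMI into the GCN is not stated in the abstract, so the function name `ppmi` and the example counts are illustrative assumptions only.

```python
import numpy as np


def ppmi(counts):
    """PPMI[i, j] = max(0, log(p(i, j) / (p(i) * p(j)))), with p estimated from co-occurrence counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected if rows and columns were independent
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat them as no association
    return np.maximum(pmi, 0.0)


# Two nodes that only ever co-occur with themselves: positive PMI on the diagonal, zero elsewhere.
example = ppmi([[2, 0], [0, 2]])
```

Clipping at zero discards negative associations, which keeps the matrix sparse and non-negative and is the usual reason PPMI is preferred over raw PMI as a graph weight.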
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btaf682
Yan Liu, Sheng Guan, He Yan, Long-Chen Shen, Yiheng Zhu, Ji-Peng Qiang, Guo Wei
Motivation: Accurate cell type annotation is essential in scATAC-seq analysis, as it underpins the characterization of cellular heterogeneity, the identification of regulatory elements, and downstream biological discovery. However, current annotation methods still face major challenges. First, although some approaches attempt to integrate genomic sequence information, they typically rely on shallow sequence representations and thus fail to capture the long-range dependencies and regulatory signals encoded in DNA. Second, substantial batch effects introduced by different platforms, sequencing batches, or tissue sources remain insufficiently addressed. Existing models often lack robust distribution alignment and domain generalization capabilities, leading to confounding non-biological variation and reduced annotation accuracy across datasets.
Results: To overcome these limitations, we propose seqAlignATAC, a two-stage intra-modality annotation framework that integrates sequence-derived embeddings with domain adaptation. In the first stage, we employ a large-scale pretrained nucleotide language model to extract low-dimensional, biologically informative representations from the genomic sequences of chromatin-accessible peaks. In the second stage, these embeddings are fed into a supervised neural network equipped with an adaptive alignment module to mitigate batch effects and harmonize feature distributions between labeled reference and unlabeled target datasets. Extensive experiments across multiple settings demonstrate that seqAlignATAC achieves competitive accuracy and robustness, effectively leveraging genome-level information while alleviating batch-induced distributional discrepancies.
Availability and implementation: The source code of seqAlignATAC is available at: https://github.com/BioCS-Lab/seqAlignATAC.
Title: Genome- and peak-informed two-stage framework for scATAC-seq cell type identification.
Journal: Bioinformatics (Oxford, England). PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12930843/pdf/
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btaf672
Yajing Hao, Tal Kafri, Fei Zou
Motivation: High-dimensional sequencing data, such as RNA-Seq for gene expression and ATAC-Seq for chromatin accessibility, are widely used in studying systems biology. Accessible chromatin allows transcription factors and regulatory elements to bind to DNA, thereby regulating transcription through the activation or repression of target genes. The association analysis of RNA-Seq and ATAC-Seq data provides insights into gene regulatory mechanisms. Most existing analytic tools exclusively focus on cis-associations, despite regulatory elements being able to physically interact with distant target genes. Furthermore, conventional approaches often utilize Pearson or Spearman correlations, which ignore the count-based nature of RNA-Seq data.
Results: To address these limitations, we introduce PETScan, a computationally efficient genome-wide PEak-Transcript Score-based association analysis, utilizing negative binomial models to better accommodate RNA-Seq data. We leverage score tests and matrix calculations for improved computational efficiency, and combine an empirical permutation method with genomic control to ensure valid p-value calculations in studies with limited sample sizes. In real-world datasets, PETScan ran three orders of magnitude faster than Wald tests while identifying similar significant gene-peak pairs.
Availability: The PETScan R package is available on GitHub at https://github.com/yajing-hao/PETScan.
Title: PETScan: score-based genome-wide association analysis of RNA-Seq and ATAC-Seq data.
Journal: Bioinformatics (Oxford, England). PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12930850/pdf/
Pub Date: 2026-02-03 DOI: 10.1093/bioinformatics/btag052
David Köhler, Niklas Kleinenkuhnen, Kiarash Rastegar, Till Baar, Chrysa Nikopoulou, Vangelis Kondylis, Vlada Milchevskaya, Matthias Schmid, Peter Tessarz, Achim Tresch
Motivation: We introduce a statistical approach for pattern recognition in multivariate spatial transcriptomics data.
Results: Our algorithm constructs a projection of the data onto a low-dimensional feature space which is optimal in maximizing Moran's I, a measure of spatial dependency. This projection mitigates non-spatial variation and outperforms principal components analysis for pre-processing. Patterns of spatially variable genes are well represented in this feature space, and their projection can be shown to be a denoising operation. Our framework does not require any parameter tuning, and it furthermore gives rise to a calibrated, powerful test of spatial gene expression.
Availability and implementation: The algorithm is implemented in the open source software R and is available at https://github.com/IMSBCompBio/SpaCo.
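For readers unfamiliar with Moran's I, the statistic named in the Results above, the following Python sketch evaluates it for a single feature on a small grid. It is not the SpaCo algorithm, which seeks a projection that maximizes Moran's I. The function names, the rook (up/down/left/right) neighbourhood with unit weights, and the toy grids are all assumptions made for illustration.

```python
def rook_neighbors(n_rows, n_cols):
    """Map each grid cell index to the indices of its up/down/left/right neighbours."""
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            nbrs[cell] = [
                rr * n_cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return nbrs


def morans_i(values, neighbors):
    """Moran's I with binary weights: (n / W) * sum_ij w_ij d_i d_j / sum_i d_i**2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(len(js) for js in neighbors.values())   # W, counting both directions
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    denom = sum(d * d for d in dev)
    if total_weight == 0 or denom == 0:
        raise ValueError("need at least one neighbour pair and a non-constant feature")
    return (n / total_weight) * cross / denom


if __name__ == "__main__":
    nbrs = rook_neighbors(4, 4)
    # Left half 0, right half 1: neighbouring cells mostly agree.
    halves = [1 if c >= 2 else 0 for r in range(4) for c in range(4)]
    # Checkerboard: every neighbouring pair disagrees.
    checker = [(r + c) % 2 for r in range(4) for c in range(4)]
    print(round(morans_i(halves, nbrs), 3))   # 0.667
    print(round(morans_i(checker, nbrs), 3))  # -1.0
```

Positive values indicate that neighbouring locations carry similar values, while negative values indicate alternation between neighbours.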
{"title":"A spectral dimension reduction technique that improves pattern detection in multivariate spatial data.","authors":"David Köhler, Niklas Kleinenkuhnen, Kiarash Rastegar, Till Baar, Chrysa Nikopoulou, Vangelis Kondylis, Vlada Milchevskaya, Matthias Schmid, Peter Tessarz, Achim Tresch","doi":"10.1093/bioinformatics/btag052","DOIUrl":"10.1093/bioinformatics/btag052","url":null,"PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12925250/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146097705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motivation: Dendritic spines, postsynaptic structures characterized by their complex shapes, provide the essential structural foundation for synaptic function. Their shape is dynamic, undergoing alterations in various conditions, notably during neurodegenerative disorders like Alzheimer's disease. The dramatically increasing prevalence of such diseases highlights an urgent need for effective treatments. A key strategy in developing these treatments involves evaluating how dendritic spine morphology responds to potential therapeutic compounds. Although a link between spine shape and function is recognized, its precise nature is still not fully elucidated. Consequently, advancing our understanding of dendritic spines in both health and disease necessitates the urgent development of more effective methods for assessing their morphology.
Results: This study introduces qualitatively new 3D dendritic spine shape descriptors based on spherical harmonics and Zernike moments, and proposes a clustering approach built on them for grouping dendritic spines with similar shapes. The approach is applied to 3D polygonal spine meshes acquired from Z-stack dendrite images. By integrating these methods, we achieve improved differentiation between normal and pathological spines, the latter represented by an in vitro model of Alzheimer's disease, offering a more precise representation of morphological diversity. Additionally, the proposed spherical harmonics approach enables dendritic spine reconstruction from vector-based shape representations, providing a novel tool for studying structural changes associated with neurodegeneration and opening possibilities for generating synthetic dendritic spine datasets.
Availability and implementation: The software used for the experiments is publicly available at https://github.com/Biomed-imaging-lab/SpineTool with the DOI: 10.5281/zenodo.17359066. The descriptor codebase is available at https://github.com/Biomed-imaging-lab/Spine-Shape-Descriptors with the DOI: 10.5281/zenodo.17302859.
{"title":"3D dendritic spines shape descriptors for efficient classification and morphology analysis in control and Alzheimer's disease modeling neurons.","authors":"Daria Smirnova, Anita Ustinova, Viacheslav Chukanov, Ekaterina Pchitskaya","doi":"10.1093/bioinformatics/btag025","DOIUrl":"10.1093/bioinformatics/btag025","url":null,"PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12891915/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146013884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}