Pub Date: 2026-03-20 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag036
Chunlei Wu, Hongfang Liu, Jason Flannick, Mark A Musen, Andrew I Su, Lawrence E Hunter, Thomas M Powers, Cathy H Wu
Motivation: Knowledge graphs (KGs), which collectively form a knowledge network, have become critical tools for knowledge discovery in computable and explainable knowledge systems. Due to the semantic and structural complexities of biomedical data, these KGs need to enable dynamic reasoning over large, evolving graphs and support fit-for-purpose abstraction. Crucially, this requires establishing standards, preserving provenance, and enforcing policy constraints for actionable discovery.
Results: A recent meeting of leading scientists discussed the opportunities, challenges, and future directions of a biomedical knowledge network. Here we present six desiderata inspired by the meeting: (i) inference and reasoning in biomedical KGs need domain-centric approaches, (ii) harmonized and accessible standards are required for knowledge graph representation and metadata, (iii) robust validation of biomedical KGs needs multilayered, context-aware approaches that are both rigorous and scalable, (iv) the evolving and synergistic relationship between KGs and large language models is essential in empowering AI-driven biomedical discovery, (v) integrated development environments, public repositories, and governance frameworks are essential for secure and reproducible knowledge graph sharing, and (vi) robust validation, provenance, and ethical governance are critical for trustworthy biomedical KGs. Addressing these key issues will be essential to realize the promises of a biomedical knowledge network in advancing biomedicine.
Title: Desiderata for a biomedical knowledge network: opportunities, challenges and future directions.
Bioinformatics Advances 6(1):vbag036 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13004217/pdf/
Pub Date: 2026-03-10 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag071
Chandana Tennakoon, Thibaut Freville, Tim Downing
Motivation: Constructing and studying pangenome variation graphs (PVGs) supports new insights into viral genomic diversity, because such pangenomes are less prone to the reference bias that affects mutation detection. Interpreting the resulting information is challenging, so automating these processes to allow exploratory investigations for PVG optimisation is essential. Moreover, existing methods neither scale well to the smaller genome sizes of viruses nor facilitate analysis in laptop environments. To address this, we developed an easily deployable pipeline for the rapid creation of virus PVGs that applies a broad range of analyses to them.
Results: We present Panalyze, a computationally scalable virus PVG construction, analysis and annotation tool implemented in Nextflow and containerised in Docker. Panalyze uses Nextflow to efficiently complete tasks across multiple compute nodes and in diverse computing environments. Panalyze can also operate on a single thread on a standard laptop and analyse sequences of any length. We illustrate how Panalyze works and the valuable outputs it can generate using a range of common viral pathogens.
Availability and implementation: Panalyze is released under an MIT open-source license and is available on GitHub, with documentation, at https://github.com/downingtim/Panalyze/.
Title: Panalyze: automated virus pangenome variation graph construction, analysis and annotation.
Bioinformatics Advances 6(1):vbag071 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13005692/pdf/
Pub Date: 2026-02-26 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag068
Medha Pandey, Anoosha Paruchuri, M Michael Gromiha
Motivation: Cancer is driven by genetic changes, known as mutations, that lead to the uncontrolled division of cells. The functional significance of a vast number of these somatic cancer mutations is unknown, and determining it remains one of the major challenges in cancer research. In this study, we performed an integrative analysis of 30 tumor types from pan-cancer mutation data collected from the COSMIC database. We analyzed a set of 61 364 missense mutations (57 535 drivers and 3829 passengers) from 682 cancer-causing genes and derived various important features from amino acid sequences, predicted AlphaFold structures, and amino acid contact networks. We observed that motif-based preference, neighboring-residue information, residue depth, and disordered regions around the mutation site are important for discriminating drivers from passengers.
Results: We further developed cancer-specific computational models to discriminate cancer-causing and passenger mutations using deep learning, and the integration of AlphaFold predicted structure information improved the pathogenicity prediction of mutations. Our method achieved an average classification accuracy of 84.06% with 10-fold cross-validation.
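As a toy illustration of the classification step, the sketch below scores a single mutation with a logistic model over hand-picked features. The feature names, weights, and threshold are hypothetical stand-ins for the sequence- and structure-derived features described above, not the paper's trained deep neural network.

```python
import math

# Hypothetical per-mutation features, loosely mirroring those the paper
# derives (neighbouring-residue context, residue depth from the predicted
# structure, local disorder).  Weights are illustrative, not trained.
WEIGHTS = {"motif_pref": 1.8, "residue_depth": 0.9, "disorder": -1.2}
BIAS = -0.5

def driver_probability(features: dict) -> float:
    """Logistic score: P(driver) for one missense mutation."""
    z = BIAS + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def classify(features: dict, threshold: float = 0.5) -> str:
    """Binary driver/passenger call at a probability threshold."""
    return "driver" if driver_probability(features) >= threshold else "passenger"

mut = {"motif_pref": 0.7, "residue_depth": 0.4, "disorder": 0.1}
print(classify(mut))  # -> driver (z = 1.0, sigmoid ~ 0.73)
```

In the paper's actual setting the feature vector would be far richer and the decision function a trained deep network, but the input-to-call shape is the same.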
Availability and implementation: The prediction server is available at https://web.iitm.ac.in/bioinfo2/PANDriver/index.html. We envisage that these AI-based prediction models will be important tools for identifying driver mutations and could extend the scope of precision medicine for cancer.
Title: Classification of driver and passenger mutations in different cancer types using deep neural networks.
Bioinformatics Advances 6(1):vbag068 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12989160/pdf/
Pub Date: 2026-02-19 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag062
Lorenzo Spina, Nicholas P Howard, Stijn Vanderzande, Giorgio Tumino, Michela Troggio, Eric van de Weg, Diego Micheletti, Luca Bianco
Motivation: Genotyping datasets generated via the Thermo Fisher Axiom® array are typically large, comprising tens of thousands of markers and hundreds of individuals, and no automatic data curation pipelines are currently available for this kind of data. This leaves researchers with time-consuming manual analysis as the current standard for processing these complex genotyping datasets. There is a clear need for a more efficient, streamlined approach to handle the specific quality control challenges inherent in this platform.
Results: AxioSAFE (Axiom SNP Assessment and Filtering Engine) is a semi-automatic computer tool for the curation of single nucleotide polymorphism (SNP) genotyping datasets generated via Thermo Fisher Axiom® array experiments. AxioSAFE provides an alternative methodology to cover a set of data curation operations, including steps such as a ploidy check, SNP filtering, Mendelian error analysis, and phasing. AxioSAFE identifies major occurrences of problematic SNPs and samples, including those not caught by the Axiom array default QC filters. Further functionality is included to let the user review identified problematic SNP classes.
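The Mendelian error analysis step can be illustrated with a minimal trio check on biallelic SNP genotypes coded as alternate-allele counts (0/1/2). This is a sketch of the general idea only, not AxioSAFE's actual code.

```python
# Mendelian-consistency check for biallelic SNP genotypes coded as the
# count of alternate alleles (0, 1, 2); None marks a missing call.
# Illustrates the kind of trio check a Mendelian error analysis performs.

def gametes(genotype):
    """Alleles a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def is_mendelian(child, mother, father):
    """True if the child's genotype is consistent with the parents'."""
    if None in (child, mother, father):
        return True  # missing calls cannot be flagged
    return any(m + f == child for m in gametes(mother) for f in gametes(father))

def count_errors(trios):
    """Count inconsistent (child, mother, father) genotype triples."""
    return sum(not is_mendelian(*t) for t in trios)

trios = [(2, 1, 1), (2, 0, 1), (0, 2, 2)]
print(count_errors(trios))  # -> 2: (2,0,1) and (0,2,2) are impossible
```

A SNP whose error count across many trios exceeds some tolerance would then be flagged for review or removal, which is the spirit of the filtering the tool automates.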
Availability and implementation: AxioSAFE is a Python program that can be used either via a command-line interface or through a graphical user interface (GUI), and is provided as a Docker container on DockerHub at https://hub.docker.com/r/lzspin/axiosafe, which includes all required libraries, software, and a tutorial dataset. The source code and documentation are available at https://bitbucket.org/lzspin/axiosafe/. The apple dataset used for the development of AxioSAFE is available at https://doi.org/10.5281/zenodo.18034024.
Title: AxioSAFE: an accessible, semi-automatic filtering tool for the curation of genotyping datasets.
Bioinformatics Advances 6(1):vbag062 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12967218/pdf/
Pub Date: 2026-02-19 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag064
Andrea Mancini, Vinh-Son Pho, Alessandro Bianchi, Gianluca Lombardi, Chujun Lyu, Alessandra Carbone
Motivation: Classifying hundreds of thousands of protein sequences by function remains a significant computational challenge. Building on the ProfileView method for identifying functional classes and subclasses, our goal is to achieve large-scale classification of proteins from extensive databases and ongoing high-throughput sequencing efforts, ultimately producing comprehensive sets of sequences that share the same function.
Results: By applying deep learning techniques, SPIN learns discriminative patterns in functionally related sequences, allowing the classification of hundreds of thousands of sequences into a defined number of functional classes. SPIN offers an effective compromise between small, family-specific protein language models (pLMs) and computational cost, with a time complexity linear in the number of sequences. It enables the identification of family-specific conserved residues, providing insight into the functional nuances of protein subclasses. By enhancing the scalability of protein function predictors, SPIN advances our understanding of protein functions and their evolutionary relationships.
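To make the linear-in-sequences claim concrete, here is a deliberately simplified stand-in: nearest-centroid assignment over 3-mer composition, where each input sequence is scored once against every class profile. SPIN itself uses learned deep models rather than k-mer centroids; the class names and sequences below are illustrative only.

```python
from collections import Counter

def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Normalized k-mer composition of a protein sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return Counter({kmer: n / total for kmer, n in counts.items()})

def similarity(p: Counter, q: Counter) -> float:
    """Dot product of two k-mer profiles."""
    return sum(freq * q[kmer] for kmer, freq in p.items())

def assign_class(seq: str, class_profiles: dict) -> str:
    """One scoring pass per class: total cost linear in #sequences."""
    profile = kmer_profile(seq)
    return max(class_profiles, key=lambda c: similarity(profile, class_profiles[c]))

# Toy class profiles built from single exemplar sequences (illustrative).
class_profiles = {
    "globin-like": kmer_profile("MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF"),
    "kinase-like": kmer_profile("MGCGCSSHPEDDWMENIDVCENCHYPIVPLDGKGTLLIRNGSEVRDPL"),
}
print(assign_class("MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHF", class_profiles))
```

The design point is the same as in the abstract: with a fixed number of class representations, total classification cost grows linearly with the number of input sequences.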
Availability and implementation: The data and code that support the findings of this study are publicly available at https://gitlab.lcqb.upmc.fr/andrea.mancini/SPIN.
Title: Scaling the profile of life by function with SPIN.
Bioinformatics Advances 6(1):vbag064 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12970593/pdf/
Pub Date: 2026-02-18 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag063
Carmen Li, Sydney Schock, Abigail Costa, Amir Mitchell
Summary: Antibiotic susceptibility testing (AST) is routinely used to evaluate microbial responses to antimicrobials. We present AssiST, a convolutional neural network (CNN) pipeline that classifies microbial growth in scanned 96-well broth microdilution plates to infer drug susceptibility at scale. AssiST accommodates diverse growth morphologies and supports a user-configurable mapping from phenotype to susceptibility calls, enabling flexible use across microorganism species, media types, and drugs. AssiST allows labs to convert flatbed-scanner images into reproducible drug sensitivity readouts with a standard personal computer.
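The configurable phenotype-to-call mapping can be sketched as follows: per-well growth calls along a dilution series (here booleans standing in for the CNN's per-well classifications) yield a MIC, which is then compared against user-supplied breakpoints. The breakpoint values are illustrative, not clinical recommendations, and this is not AssiST's actual code.

```python
# Sketch of a broth-microdilution readout: the MIC is the lowest drug
# concentration at which growth is inhibited, and the susceptibility
# call compares the MIC against configurable breakpoints.

def mic(concentrations, growth):
    """Lowest concentration with no growth (None if growth everywhere)."""
    inhibited = [c for c, g in sorted(zip(concentrations, growth)) if not g]
    return inhibited[0] if inhibited else None

def call(mic_value, susceptible_at, resistant_above):
    """Map a MIC to a susceptibility call via user-chosen breakpoints."""
    if mic_value is None or mic_value > resistant_above:
        return "resistant"
    if mic_value <= susceptible_at:
        return "susceptible"
    return "intermediate"

concs = [0.25, 0.5, 1, 2, 4, 8, 16, 32]  # two-fold series, mg/L
growth = [True, True, True, True, False, False, False, False]
m = mic(concs, growth)
print(m, call(m, susceptible_at=2, resistant_above=8))  # -> 4 intermediate
```

Swapping in different breakpoint tables per species/drug is exactly the kind of user-configurable mapping the summary describes.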
Availability and implementation: AssiST is distributed as a MATLAB library and is freely available for non-commercial use. Code, documentation, and training/inference instructions are available at https://github.com/Mitchell-SysBio/AssiST/. We also provide pre-trained models and a library of sample images. The software accepts image files from standard flatbed scanners. We commit to maintaining the repository for at least 2 years post-publication.
Title: AssiST: convolutional neural network for analysis of antibiotic susceptibility testing.
Bioinformatics Advances 6(1):vbag063 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12952218/pdf/
Pub Date: 2026-02-18 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag061
Daniel Toro-Domínguez, Chang Wang, Iván Ellson-Lancho, Jordi Martorell-Marugán, Pedro Carmona-Sáez, Marta E Alarcón-Riquelme, Frédéric Baribaud
Motivation: Systemic lupus erythematosus (SLE) patients exhibit a broad clinical spectrum of manifestations and suffer from high rates of treatment failure. Both can be attributed to disease heterogeneity arising from differentially dysregulated pathways. Precision medicine that considers the individualized molecular mechanisms driving disease is a promising strategy to address the challenges imposed by this heterogeneity. Patient blood transcriptome data coupled with pathway-based single-sample scoring approaches have been extensively employed to reveal molecular footprints of disease states and progression, as well as to delineate population heterogeneity. However, a systematic understanding of the pathways involved in disease pathogenesis is still lacking.
Results: We created the SLE-diseaseome, an integrative multi-cohort collection of disease-relevant functional gene sets. This resource contains a comprehensive collection of disease-specific gene signatures that combine knowledge from several pathway databases and signature sources and are robustly defined by integrating multiple studies. It offers reliable and extensive disease-specific reference signatures for the functional interpretation of molecular data from clinical studies.
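A minimal example of the pathway-based single-sample scoring that such signatures feed: the mean across-sample z-score of a signature's genes in one sample. This is a simplified stand-in for methods like ssGSEA, not the SLE-diseaseome pipeline itself, and the expression values are made up (the gene symbols are real interferon-signature genes, used purely as an illustration).

```python
from statistics import mean, pstdev

# Minimal single-sample signature score: the mean z-score (computed
# per gene across samples) of a signature's genes in one sample.

def zscores(expr):
    """Per-gene z-scores across samples. expr: gene -> list of values."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd else 0.0 for v in values]
    return out

def signature_score(expr, signature, sample_idx):
    """Average z-score of the signature's genes in one sample."""
    z = zscores(expr)
    return mean(z[g][sample_idx] for g in signature if g in expr)

# Toy expression matrix: 3 genes x 3 samples (values invented).
expr = {
    "IFI27":  [1.0, 5.0, 9.0],
    "IFI44L": [2.0, 4.0, 6.0],
    "OAS1":   [0.0, 3.0, 6.0],
}
ifn_signature = ["IFI27", "IFI44L", "OAS1"]
print(round(signature_score(expr, ifn_signature, 2), 3))  # -> 1.225
```

With a curated meta-collection like the SLE-diseaseome, the `signature` argument would simply iterate over each disease-specific gene set in the resource.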
Availability and implementation: The code used to run the pipeline and the R object containing the SLE-diseaseome collection are available at https://github.com/dtordom/SLEDiseaseome.
Title: SLE-diseaseome: a comprehensive meta-collection of systemic lupus erythematosus relevant functional pathways.
Bioinformatics Advances 6(1):vbag061 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12989159/pdf/
Pub Date: 2026-02-17 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag058
Julien Charest, Katarina Priselac, Georg H Reischer, Andreas H Farnleitner, Robert L Mach, Astrid R Mach-Aigner
Motivation: The exponential growth of open-access scientific literature presents researchers with unprecedented opportunities but also poses a significant challenge: how to efficiently identify and prioritize relevant publications in a transparent and customizable manner. Existing search engines index large volumes of biomedical literature but rarely provide user-defined ranking options, reproducibility, or integration of domain-specific criteria. This gap is particularly limiting for specialized fields, where nuanced keyword combinations, literature recency, and contextual interpretation are critical.
Results: We present HERMES, an open-source literature mining tool for targeted retrieval and ranking of full-text open-access publications from PubMed Central (PMC). HERMES employs a composite scoring algorithm that integrates keyword frequency, citation counts, and publication age to prioritize publications. It further supports summarization, biomedical entity recognition, and PDF report generation. An intuitive graphical user interface (GUI) allows researchers without programming expertise to perform complex literature mining tasks, while multithreaded processing ensures efficiency for large-scale queries. HERMES provides a reproducible and adaptable framework for literature discovery, empowering researchers to rapidly identify relevant literature and promoting transparency and community-driven extension.
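The composite score could be sketched as below. The abstract names the three inputs (keyword frequency, citation count, publication age) but not the formula, so the log damping, recency decay, and weights here are assumptions, not HERMES's actual scoring function.

```python
import math

# Illustrative composite ranking score over the three signals the
# abstract names: keyword frequency, citation count, publication age.
# The functional form and weights are assumptions for this sketch.

def composite_score(keyword_hits, citations, age_years,
                    w_kw=1.0, w_cit=0.5, w_age=0.3):
    kw_term = w_kw * math.log1p(keyword_hits)      # diminishing returns
    cit_term = w_cit * math.log1p(citations)       # damp huge counts
    recency = w_age * math.exp(-0.2 * age_years)   # newer scores higher
    return kw_term + cit_term + recency

papers = {
    "recent-niche": dict(keyword_hits=12, citations=3, age_years=1),
    "old-classic": dict(keyword_hits=4, citations=800, age_years=15),
}
ranked = sorted(papers, key=lambda p: composite_score(**papers[p]), reverse=True)
print(ranked)
```

Exposing the weights as parameters is what makes such a ranking user-definable and reproducible, which is the transparency gap the abstract highlights in existing search engines.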
Availability and implementation: HERMES (version 1.2) is implemented in Python (3.11). The source code is freely available on GitHub at https://github.com/julien-charest/hermes and is distributed under the GPL-3 license.
Title: HERMES: an open-source mining tool for open-access literature.
Bioinformatics Advances 6(1):vbag058 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12952204/pdf/
Motivation: Understanding how signaling networks differ across molecular subgroups of Parkinson's disease (PD) is essential for gaining further mechanistic insights and advancing therapeutic development for the disease. This study introduces an integrative, stratified computational framework to characterize subgroup-specific changes in kinase-transcription factor (TF) interactions using transcriptomic profiles.
Results: Differential expression analysis was leveraged to identify kinases with altered expression across various PD subgroups, while transcription factor activity inferred by multi-sample Virtual Inference of Protein-activity by Enriched Regulon revealed dysregulated transcription relative to controls. Phosphorylation data from SIGNOR 4.0 enabled the construction of kinase-TF subnetworks, which were analysed via pathway enrichment to reveal affected biological pathways. Comparative analyses and modeling revealed both shared and distinct signaling features among stratified PD subgroups. A recurring pattern across multiple groups involved STAT family-specific activation downstream of receptor and non-receptor tyrosine kinases, consistent with a conserved inflammatory and pro-survival signaling axis. In contrast, PD_LRRK2 showed selective involvement of immune-metabolic pathways, including AMPK to HNF4A and PAK5 to NF-κB, while PD_GBA and prodromal cohorts were characterized by stress and apoptosis-related mechanisms involving MAPK10 (JNK3), TP53, and hormone receptor pathways (AR and ESR1). Overall, this novel stratified computational framework integrates gene expression, infers subtle TF activity, identifies differentially expressed kinases, and leverages mechanistic interaction data to unveil signaling heterogeneity in PD. Identifying regulators and subgroup-specific network features provides opportunities to inform, influence, and enable the unveiling of novel biomarkers and develop more effective and proactive precision therapeutics.
Availability and implementation: Source code is available at https://github.com/xyzhou218/Kin_TF_net.
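The core subnetwork-construction step described above — intersecting curated kinase-to-TF phosphorylation edges with the differentially expressed kinases and dysregulated TFs of one subgroup — can be sketched as a simple filter. The edge list and gene sets below are hypothetical; the actual framework operates on SIGNOR 4.0 interactions and VIPER-inferred activities:

```python
def kinase_tf_subnetwork(phospho_edges, de_kinases, dysregulated_tfs):
    """Build a subgroup-specific kinase->TF subnetwork.

    phospho_edges: iterable of (kinase, tf) pairs from a curated resource
    de_kinases: kinases differentially expressed in this PD subgroup
    dysregulated_tfs: TFs with altered inferred activity vs. controls

    An edge is retained only when both endpoints show a subgroup-specific
    signal, yielding the subnetwork passed on to pathway enrichment.
    """
    de_kinases, dysregulated_tfs = set(de_kinases), set(dysregulated_tfs)
    return [(k, tf) for k, tf in phospho_edges
            if k in de_kinases and tf in dysregulated_tfs]

# Hypothetical toy inputs echoing relations named in the abstract:
edges = [("LRRK2", "HNF4A"), ("PAK5", "NFKB1"), ("MAPK10", "TP53")]
subnet = kinase_tf_subnetwork(edges, {"LRRK2", "MAPK10"}, {"HNF4A", "TP53"})
```

Running this per subgroup and comparing the resulting edge sets surfaces the shared versus subgroup-specific signaling features the study reports.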
{"title":"Stratified signaling network remodeling of kinase-transcription factors' interactions in Parkinson's disease.","authors":"Xiaoyan Zhou, Luca Parisi, Sicen Liu, Ziqi Cheng, Hanwen Liang, Mansour Youseffi, Farideh Javid, Renfei Ma","doi":"10.1093/bioadv/vbag059","DOIUrl":"https://doi.org/10.1093/bioadv/vbag059","url":null,"abstract":"<p><strong>Motivation: </strong>Understanding how signaling networks differ across molecular subgroups of Parkinson's disease (PD) is essential for gaining further mechanistic insights and advancing therapeutic development for the disease. This study introduces an integrative, stratified computational framework to characterize subgroup-specific changes in kinase-transcription factors' (TFs) interactions using transcriptomic profiles.</p><p><strong>Results: </strong>Differential expression analysis was leveraged to identify kinases with altered expression across various PD subgroups, while transcription factor activity inferred by multi-sample Virtual Inference of Protein-activity by Enriched Regulon revealed dysregulated transcription relative to controls. Phosphorylation data from SIGNOR 4.0 enabled the construction of kinase-TF subnetworks, which were analysed via pathway enrichment to reveal affected biological pathways. Comparative analyses and modeling revealed both shared and distinct signaling features among PD stratified subgroups. A recurring pattern across multiple groups involved STAT family-specific activation downstream of receptor and non-receptor tyrosine kinases, consistently with a conserved inflammatory and pro-survival signaling axis. In contrast, PD_LRRK2 showed selective involvement of immune-metabolic pathways, including AMPK to HNF4A and PAK5 to NF- <math><mi>κ</mi></math> B, while PD_GBA and prodromal cohorts were characterized by stress and apoptosis-related mechanisms involving MAPK10 (JNK3), TP53, and hormone receptor pathways (AR and ESR1). 
Overall, this novel stratified computational framework integrates gene expression, infers subtle TF activity, identifies differentially expressed kinases, and leverages mechanistic interaction data to unveil signaling heterogeneity in PD. Identifying regulators and subgroup-specific network features provides opportunities to inform, influence, and enable the unveiling of novel biomarkers and develop more effective and proactive precision therapeutics.</p><p><strong>Availability and implementation: </strong>Source code is available at https://github.com/xyzhou218/Kin_TF_net.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag059"},"PeriodicalIF":2.8,"publicationDate":"2026-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12955839/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147357791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-17 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag057
Shashank Gupta, Veronica Quarato, Wanxin Lai, Carl M Kobel, Velma T E Aho, Arturo Vera-Ponce de León, Sabina Leanti La Rosa, Simen R Sandve, Phillip B Pope, Torgeir R Hvidsten
Holo-omics leverages omics datasets to explore the interactions between hosts and their associated microbiomes. Although the generation of omics data from matching host and microbiome samples is steadily increasing, there remains a scarcity of computational tools capable of integrating and visualizing this data to facilitate the prediction and interpretation of host-microbiome interactions. We present OmniCorr, an R package designed to: (i) manage the complexity of omics data by clustering co-varying features (e.g. genes, proteins, and metabolites) into modules, (ii) visualize correlations of these modules across different omics layers, host-microbiome interfaces, and metadata, and (iii) identify statistically significant associations indicative of putative host-microbiome interactions. OmniCorr's utility is demonstrated using datasets from two systems: (i) Atlantic salmon, integrating host transcriptomics with metagenomics and metatranscriptomics to explore dietary impacts, and (ii) cattle, combining host proteomics with metaproteomics to investigate methane emission variability. Availability and implementation: OmniCorr is freely available at https://github.com/shashank-KU/OmniCorr.
{"title":"OmniCorr: an R-package for visualizing putative host-microbiome interactions using multi-omics data.","authors":"Shashank Gupta, Veronica Quarato, Wanxin Lai, Carl M Kobel, Velma T E Aho, Arturo Vera-Ponce de León, Sabina Leanti La Rosa, Simen R Sandve, Phillip B Pope, Torgeir R Hvidsten","doi":"10.1093/bioadv/vbag057","DOIUrl":"10.1093/bioadv/vbag057","url":null,"abstract":"<p><p>Holo-omics leverages omics datasets to explore the interactions between hosts and their associated microbiomes. Although the generation of omics data from matching host and microbiome samples is steadily increasing, there remains a scarcity of computational tools capable of integrating and visualizing this data to facilitate the prediction and interpretation of host-microbiome interactions. We present <b>OmniCorr</b>, an R package designed to: (i) manage the complexity of omics data by clustering co-varying features (e.g. genes, proteins, and metabolites) into modules, (ii) visualize correlations of these modules across different omics layers, host-microbiome interfaces, and metadata, and (iii) identify statistically significant associations indicative of putative host-microbiome interactions. OmniCorr's utility is demonstrated using datasets from two systems: (i) Atlantic salmon, integrating host transcriptomics with metagenomics and metatranscriptomics to explore dietary impacts, and (ii) cattle, combining host proteomics with metaproteomics to investigate methane emission variability. 
<i>Availability and implementation</i>: OmniCorr is freely available at https://github.com/shashank-KU/OmniCorr.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag057"},"PeriodicalIF":2.8,"publicationDate":"2026-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12961270/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147379767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}