Pub Date: 2026-03-05. DOI: 10.1093/bioinformatics/btag106
Paul J Jost, Frank T Bergmann, Daniel Weindl, Jan Hasenauer
Motivation: Parameter estimation is a cornerstone of data-driven modeling in systems biology. Yet, constructing such problems in a reproducible and accessible manner remains challenging. The PEtab format has established itself as a powerful community standard to encode parameter estimation problems, promoting interoperability and reusability. However, its reliance on multiple interlinked files, which are often edited manually, can introduce inconsistencies, and new users often struggle to navigate them. Here, we present PEtab-GUI, an open-source Python application designed to streamline the creation, editing, and validation of PEtab problems through an intuitive graphical user interface. PEtab-GUI integrates all PEtab components, including SBML models and tabular files, into a single environment with live error-checking and customizable defaults. Interactive visualization and simulation capabilities enable users to inspect the relationship between the model and the data. PEtab-GUI lowers the barrier to entry for specifying standardized parameter estimation problems, making dynamic modeling more accessible, especially in educational and interdisciplinary settings.
Availability and implementation: PEtab-GUI is implemented in Python, open-source under a 3-Clause BSD license. The code, designed to be modular and extensible, is hosted on https://github.com/PEtab-dev/PEtab-GUI, available as a Zenodo repository at https://doi.org/10.5281/zenodo.15355752, and can be installed from PyPI.
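The kind of cross-file inconsistency mentioned in the Motivation can be illustrated with a minimal sketch. It is not PEtab-GUI's validation code: it only assumes the `observableId` column that PEtab observable and measurement tables share, and the table rows and the helper names (`read_tsv`, `undefined_observables`) are invented for illustration.

```python
import csv
import io

# Toy stand-ins for two PEtab tables (TSV). The column names follow the
# PEtab format; the rows are invented for this illustration.
OBSERVABLES_TSV = "observableId\tobservableFormula\nobs_a\tA\nobs_b\tB\n"
MEASUREMENTS_TSV = (
    "observableId\tsimulationConditionId\ttime\tmeasurement\n"
    "obs_a\tc0\t0\t1.0\n"
    "obs_c\tc0\t1\t0.7\n"
)

def read_tsv(text):
    """Parse a tab-separated table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def undefined_observables(observables, measurements):
    """Return observable IDs used in measurements but never defined."""
    defined = {row["observableId"] for row in observables}
    used = {row["observableId"] for row in measurements}
    return sorted(used - defined)

missing = undefined_observables(read_tsv(OBSERVABLES_TSV), read_tsv(MEASUREMENTS_TSV))
print(missing)  # ['obs_c']
```

A manual edit that renames an observable in one file but not the other produces exactly this mismatch, which is the class of error that live checking across files is meant to surface early.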
Title: PEtab-GUI: A graphical user interface to create, edit and inspect PEtab parameter estimation problems. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-28. DOI: 10.1093/bioinformatics/btag088
Serena Rosignoli, Sophie Taraglio, Francesco Di Luzio, Elisa Lustrino, Dario Marzella, Arne Elofsson, Massimo Panella, Alessandro Paiardini
Motivation: G-quadruplex-binding proteins (G4BPs) play key roles in RNA metabolism and stress response, yet their identification remains experimentally challenging. Here, we present a deep learning (DL) framework for the prediction of RNA G4BPs (RG4BPs), integrating diverse encoding strategies and neural architectures. Our best-performing model, which combines ESM-2 protein language model embeddings with an LSTM architecture, achieved 86% accuracy in distinguishing RG4BPs from non-binder proteins. Applying this model to the human proteome uncovered 2160 high-confidence RG4BP candidates, many of which display intrinsically disordered regions (IDRs) and enrichment in stress granule organelles. These findings reveal a potential link between G-quadruplex recognition and cellular stress responses. To enable easy and broad access to the framework, we developed G4REP, a web server for RG4BP prediction and analysis. Overall, the framework provides an effective approach to exploring the RG4BP landscape and uncovering novel players in RNA regulation.
Availability: Source code for the G4REP Model training and evaluation is available at: https://github.com/G4REP/G4REPmodel and at https://doi.org/10.5281/zenodo.17963046. G4REP Server is hosted at: https://schubert.bio.uniroma1.it/g4/.
Title: A deep learning framework for comprehensive prediction of human RNA G-quadruplex-binding proteins. Journal: Bioinformatics (Oxford, England).
Pub Date: 2026-02-28. DOI: 10.1093/bioinformatics/btag087
Can Firtina, Maximilian Mordig, Harun Mustafa, Sayan Goswami, Nika Mansouri Ghiasi, Stefano Mercogliano, Furkan Eris, Joel Lindegger, André Kahles, Onur Mutlu
Motivation: Raw nanopore signal analysis is a common approach in genomics to provide fast and resource-efficient analysis without translating the signals to bases (i.e. without basecalling). However, existing solutions cannot interpret raw signals directly if a reference genome is unknown due to a lack of accurate mechanisms to handle increased noise in pairwise raw signal comparison. Our goal is to enable the direct analysis of raw signals without a reference genome. To this end, we propose Rawsamble, the first mechanism that can identify regions of similarity between all raw signal pairs, known as all-vs-all overlapping, using a hash-based search mechanism.
Results: We use these overlaps to construct de novo assembly graphs with an existing off-the-shelf assembler, miniasm. To our knowledge, these are the first de novo assemblies ever constructed directly from raw signals without basecalling. Our extensive evaluations across multiple genomes of varying sizes show that Rawsamble provides a significant speedup (on average by 5.01× and up to 23.10×) and reduces peak memory usage (on average by 5.74× and up to 22.00×) compared to a conventional genome assembly pipeline using the state-of-the-art tools for basecalling (Dorado's fastest mode) and overlapping (minimap2) on a CPU. We find that around one-third of Rawsamble's overlapping pairs are also found by minimap2. When we use overlapping reads from Rawsamble, we can construct unitigs that are (i) as accurate as those built from minimap2's overlaps and (ii) up to half a chromosome in length (e.g. 2.3 million bases for E. coli).
Availability and implementation: Rawsamble is available at https://github.com/CMU-SAFARI/RawHash. We also provide the scripts to fully reproduce our results on our GitHub page.
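The idea of hash-based all-vs-all overlapping described in the Motivation can be sketched on toy data. This is an illustration under stated assumptions, not Rawsamble's algorithm: the quantization step, the window length `k`, the `min_shared` threshold, and all function names are hypothetical, and a Python dictionary stands in for the hash table.

```python
from collections import defaultdict
from itertools import combinations

def quantize(signal, step=0.5):
    """Map each signal value to a coarse integer bucket to absorb small noise."""
    return [round(x / step) for x in signal]

def seeds(signal, k=2, step=0.5):
    """Yield every window of k consecutive quantized values as a hashable seed."""
    q = quantize(signal, step)
    for i in range(len(q) - k + 1):
        yield tuple(q[i:i + k])

def candidate_overlaps(signals, k=2, min_shared=2):
    """Return read pairs that share at least min_shared distinct seeds."""
    index = defaultdict(set)  # seed -> names of reads containing it (hash table)
    for name, sig in signals.items():
        for seed in seeds(sig, k):
            index[seed].add(name)
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}

signals = {
    "read1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "read2": [4.02, 4.98, 6.01, 7.0, 8.0],  # noisy copy of read1's tail, extended
    "read3": [10.0, 11.0, 12.0, 13.0],      # unrelated signal
}
overlaps = candidate_overlaps(signals)
print(overlaps)  # {('read1', 'read2'): 2}
```

Quantizing before hashing is what lets two noisy copies of the same region land on identical seeds; exact hashing of raw floating-point values would almost never match.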
Title: Rawsamble: overlapping raw nanopore signals using a hash-based seeding mechanism. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12975284/pdf/
Pub Date: 2026-02-28. DOI: 10.1093/bioinformatics/btag058
Joan Segura, Ruben Sanchez-Garcia, Sebastian Bittrich, Yana Rose, Stephen K Burley, Jose M Duarte
Motivation: The rapid expansion of three-dimensional (3D) biomolecular structure information, driven by breakthroughs in artificial intelligence/deep learning (AI/DL)-based structure predictions, has created an urgent need for scalable and efficient structure similarity search methods. Traditional alignment-based approaches, such as structural superposition tools, are computationally expensive and challenging to scale with the vast number of available macromolecular structures.
Results: Herein, we present a scalable structure similarity search strategy designed to navigate extensive repositories of experimentally determined structures and computed structure models predicted using AI/DL methods. Our approach leverages protein language models and a deep neural network architecture to transform 3D structures into fixed-length vectors, enabling efficient large-scale comparisons. Although trained to predict TM-scores between single-domain structures, our model generalizes beyond the domain level, accurately identifying 3D similarity for full-length polypeptide chains and multimeric assemblies. By integrating vector databases, our method facilitates efficient large-scale structure retrieval, addressing the growing challenges posed by the expanding volume of 3D biostructure information.
Availability and implementation: Source code available at https://github.com/bioinsilico/rcsb-embedding-search. Source code DOI: https://doi.org/10.6084/m9.figshare.30546698.v1. Benchmark datasets DOI: https://doi.org/10.6084/m9.figshare.30546650.v1. Web server prototype available at: http://embedding-search.rcsb.org/.
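Retrieval over fixed-length vectors, as described in the Results, can be sketched as a nearest-neighbor search. This is a minimal illustration, assuming cosine similarity as the comparison and a brute-force scan in place of a vector database; the three-dimensional vectors, the chain names, and the functions `cosine` and `top_k` are invented and are not taken from the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k(query, database, k=2):
    """Return the names of the k database vectors most similar to the query."""
    scored = [(cosine(query, vec), name) for name, vec in database.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical embeddings; real embeddings would have many more dimensions.
database = {
    "chain_A": [1.0, 0.0, 0.0],
    "chain_B": [0.9, 0.1, 0.0],
    "chain_C": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.02, 0.0], database, k=2)
print(hits)  # ['chain_A', 'chain_B']
```

Because every structure is reduced to a vector of the same length, each comparison costs the same regardless of structure size, which is what makes large-scale search tractable compared with pairwise structural alignment.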
Title: Multi-scale structural similarity embedding search across entire proteomes. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12955762/pdf/
Pub Date: 2026-02-28. DOI: 10.1093/bioinformatics/btag056
Jiaying Hu, Yihang Du, Suyang Hou, Yueyang Ding, Jinyan Li, Hao Wu, Xiaobo Sun
Motivation: Spatial clustering is a critical analytical task in spatial transcriptomics (ST) that aids in uncovering the spatial molecular mechanisms underlying biological phenotypes. The growing number of spatial clustering methods creates a pressing need for an effective metric to evaluate their performance. An ideal metric should consider three factors: label agreement, spatial organization, and error severity. However, existing evaluation metrics focus solely on either label agreement or spatial organization, leading to biased and misleading evaluations.
Results: To fill this gap, we propose CEMUSA, a novel graph-based metric that integrates these factors into a unified evaluation framework. Extensive testing on both simulated and real datasets demonstrates CEMUSA's superiority over conventional metrics in differentiating clustering results with subtle differences in topology and error severity, while maintaining computational efficiency.
Availability and implementation: The source code and data are freely available at https://github.com/YihDu/CEMUSA. CEMUSA is implemented as an R package at https://yihdu.github.io/CEMUSA.
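The limitation of label-agreement metrics noted in the Motivation can be shown on a toy example. The sketch below does not implement CEMUSA; it only contrasts plain accuracy with a simple count of neighboring spots that carry different labels, on an invented one-dimensional row of ten spots. The function names and data are hypothetical.

```python
def accuracy(truth, pred):
    """Fraction of spots whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def cut_edges(labels):
    """Number of adjacent spot pairs (on a 1D chain) with different labels."""
    return sum(a != b for a, b in zip(labels, labels[1:]))

truth     = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
shifted   = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]  # two errors next to the true boundary
scattered = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1]  # two isolated errors inside domains

print(accuracy(truth, shifted), accuracy(truth, scattered))  # 0.8 0.8
print(cut_edges(shifted), cut_edges(scattered))              # 1 5
```

Both predictions score the same accuracy, yet one keeps two contiguous domains while the other fragments them, which is the kind of difference a metric based on label agreement alone cannot see.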
Title: CEMUSA: a graph-based integrative metric for evaluating clusters in spatial transcriptomics. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12960911/pdf/
Pub Date: 2026-02-28. DOI: 10.1093/bioinformatics/btag105
Gaëlle Letort, Tanya Foley, Ilona Mignerey, Laure Bally-Cuif, Nicolas Dray
Summary: Characterizing the distribution of biological marker expression at the single-cell level in whole tissues requires diverse image analysis steps, such as segmentation of cells and nuclei, detection of RNA transcripts (or other staining), and their mapping (e.g. assigning nuclei/RNA dots to their corresponding cell). Several software programs or algorithms have been developed for each step independently, but integrating them into a comprehensive pipeline for the quantification of individual cells from 3D imaging samples remains a significant challenge. We developed FishFeats, an open-source and flexible napari plugin, to perform all these steps together within the same framework, taking advantage of available and efficient software applications. Our primary aim is to provide a user-friendly tool for users who do not have a computational background. FishFeats streamlines extracting quantitative information from multimodal 3D fluorescent microscopy images (smFISH expression in individual cells, immunohistochemical staining, cell morphologies, cell classification) into a unified "cell-by-cell" table for downstream analysis, without requiring any coding. Our second focus is to enable and simplify manual correction of each step to further improve accuracy, which can be critical for many biological studies.
Availability and implementation: FishFeats is open source under the BSD-3 license and freely available on GitHub: https://github.com/gletort/FishFeats (DOI 10.5281/zenodo.17701225). FishFeats is developed in Python, as a napari plugin for the user interface. Documentation is available on the GitHub pages: https://gletort.github.io/FishFeats/. To report an issue with FishFeats or to contribute to it, please file an issue in the GitHub repository: https://github.com/gletort/FishFeats/issues.
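The mapping step mentioned in the Summary (assigning RNA dots to their corresponding cell) can be sketched as a lookup in a segmentation label image. This is a simplified illustration and not FishFeats code: it uses a small invented 2D label array instead of a 3D image, and the function name `count_dots_per_cell` is hypothetical.

```python
from collections import Counter

# Toy 2D label image: 0 is background, positive integers are cell IDs.
labels = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
# Detected RNA dot positions as (row, column) pixel coordinates.
dots = [(0, 0), (1, 1), (0, 3), (2, 3), (2, 1)]

def count_dots_per_cell(labels, dots):
    """Count dots falling inside each labeled cell, ignoring background."""
    counts = Counter()
    for row, col in dots:
        cell = labels[row][col]
        if cell != 0:
            counts[cell] += 1
    return dict(counts)

table = count_dots_per_cell(labels, dots)
print(table)  # {1: 2, 2: 2}
```

Each entry of the resulting dictionary corresponds to one row of a "cell-by-cell" table; the dot at (2, 1) lies on background and is not assigned to any cell.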
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: FishFeats: streamlined quantification of multimodal labeling at the single-cell level in 3D tissues. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13003315/pdf/
Pub Date: 2026-02-28. DOI: 10.1093/bioinformatics/btag070
Olivia Angelin-Bonnet, Lindy Guo, Roy Storey, Susan Thomson
Motivation: In the past decades, many statistical methods for integrating multi-omics data have been developed. They have been implemented in software tools that differ widely in their programming choices, such as the format required for data input or the format of the generated integration results. This lack of standards makes applying different integration tools to the same multi-omics dataset, and comparing their results, cumbersome and time-intensive.
Results: We have developed the moiraine R package for constructing reproducible multi-omics integration pipelines, which enables users to apply one or more statistical methods for multi-omics integration to their own multi-omics dataset. moiraine facilitates the preprocessing of the omics datasets and automates their formatting for the integration step. It simplifies the interpretation and evaluation of the integration results through the construction of visualizations in which metadata about samples and features can easily be included. Crucially, it enables the comparison of results obtained with different integration tools, allowing users to assess the robustness of their results.
Availability and implementation: The moiraine R package is publicly available at https://github.com/Plant-Food-Research-Open/moiraine; an archival snapshot of the package is available on Zenodo at https://doi.org/10.5281/zenodo.17172718. A detailed tutorial is available at https://plant-food-research-open.github.io/moiraine-manual/.
Title: moiraine: an R package to construct reproducible pipelines for the application and comparison of multi-omics integration methods. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12960915/pdf/
Pub Date: 2026-02-28. DOI: 10.1093/bioinformatics/btag079
Leonid Chindelevitch, Åsa K Hedman, Dmitri Bichko, Daniel Ziemek
Motivation: Traditional genome-wide association studies (GWAS) aim to uncover the genetic variants associated with a single phenotype of interest (typically a disease), and to elucidate its genotypic architecture. However, many of today's GWAS simultaneously measure multiple related phenotypes, leading to the possibility of pursuing the reverse aim of elucidating the "phenotypic architecture" of a single genetic variant. In other words, we may ask what combination of measured phenotypes is associated with a given genotypic variant. ReverseGWAS is an algorithmic platform for answering such questions in the context of large-scale multi-phenotype GWAS.
Results: We demonstrate the effectiveness of ReverseGWAS on simulated data, showing that it can identify logical combinations of phenotypes in the presence of a reasonable amount of noise. We then apply it to a selection of combined phenotypes from the UK Biobank, obtaining 719 candidate associations using autoimmune diseases and 205 using common ICD10 codes. We find that the majority of these associations (546/719 and 111/205, respectively) successfully replicate in an independent cohort, FinnGen.
Availability and implementation: The source code of ReverseGWAS is freely available to non-commercial users as an installable R package at https://github.com/Leonardini/rgwas.
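The question posed in the Motivation, namely which combination of measured phenotypes is associated with a given variant, can be illustrated with a brute-force toy search. This sketch is not the ReverseGWAS algorithm: it assumes binary phenotypes, considers only pairwise AND/OR combinations, and scores them by simple agreement with carrier status rather than a statistical test. All data and names are invented.

```python
from itertools import combinations

# Invented binary phenotypes for six individuals, plus carrier status.
phenotypes = {
    "P1": [1, 1, 0, 0, 1, 0],
    "P2": [1, 0, 1, 0, 1, 0],
    "P3": [0, 0, 1, 1, 0, 1],
}
genotype = [1, 0, 0, 0, 1, 0]

def agreement(a, b):
    """Fraction of individuals on which two binary vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_combination(phenotypes, genotype):
    """Score every pairwise AND/OR combination and return the best (name, score)."""
    scores = {}
    for (n1, v1), (n2, v2) in combinations(phenotypes.items(), 2):
        scores[f"{n1} AND {n2}"] = agreement([x & y for x, y in zip(v1, v2)], genotype)
        scores[f"{n1} OR {n2}"] = agreement([x | y for x, y in zip(v1, v2)], genotype)
    name = max(scores, key=scores.get)
    return name, scores[name]

best = best_combination(phenotypes, genotype)
print(best)  # ('P1 AND P2', 1.0)
```

Neither P1 nor P2 alone matches the genotype in this toy data, but their conjunction does, which is the sense in which a combined phenotype can be more informative than any single one.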
Title: ReverseGWAS identifies combined phenotypes associated with a genotype in GWA studies. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13003317/pdf/
Pub Date: 2026-02-28 DOI: 10.1093/bioinformatics/btag100
Jarno N Alanko, Ilya B Slizovskiy, Daniel Lokshtanov, Travis Gagie, Noelle R Noyes, Christina Boucher
We clarify the design principles and evaluation choices underlying Syotti, a robust and scalable probe-design tool developed to support large, heterogeneous bacterial datasets with minimal parameter tuning. We highlight Syotti's ability to perform simultaneous large-scale designs and its effectiveness as a reliable alternative when existing tools such as CATCH are not well suited to the problem setting.
Title: Response to: "best practices when benchmarking CATCH for the design of genome enrichment probes". Authors: Jarno N Alanko, Ilya B Slizovskiy, Daniel Lokshtanov, Travis Gagie, Noelle R Noyes, Christina Boucher. Journal: Bioinformatics (Oxford, England). Published: 2026-02-28. DOI: 10.1093/bioinformatics/btag100. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12996868/pdf/
Pub Date: 2026-02-28 DOI: 10.1093/bioinformatics/btag073
Johannes Wirth, Anna Chernysheva, Birthe Lemke, Isabel Giray, Katja Steiger
Motivation: Spatial omics data provides unprecedented insights into disease biology, yet its complexity introduces significant challenges in data analysis. Comprehensive analysis requires frameworks that integrate diverse modalities and enable joint processing of multiple datasets and corresponding metadata.
Results: To address these challenges, we introduce InSituPy, a versatile and scalable framework for analyzing spatial omics data from the multi-sample level down to the cellular and subcellular level. Its hierarchical data structure organizes all relevant data modalities per sample and links them to their corresponding metadata, enabling scalable analysis of large patient cohorts using spatial omics technologies. Interactive visualization tools within InSituPy enable seamless integration of histopathological expertise, promoting collaborative hypothesis generation in translational research. Additionally, InSituPy includes built-in analytical algorithms and interfaces with external tools, establishing a standardized workflow for multi-sample spatial omics data analysis.
Availability: The Python package InSituPy is publicly available on GitHub (https://github.com/SpatialPathology/InSituPy) and PyPI (https://pypi.org/project/insitupy-spatial/), and archived on Zenodo (DOI: 10.5281/zenodo.18459471). Tutorials and documentation for InSituPy are available at https://insitupy.readthedocs.io/. All code to replicate the results shown in this manuscript can be found in the GitHub repository. Scripts to connect QuPath and InSituPy can be found at https://github.com/SpatialPathology/InSituPy-QuPath. All data required to complete the tutorials is publicly available, and functions to download the data have been implemented. A Zulip community chat for user support and discussion is accessible at https://insitupy.zulipchat.com.
Title: InSituPy: a framework for histology-guided, multi-sample analysis of single-cell spatial omics data. Authors: Johannes Wirth, Anna Chernysheva, Birthe Lemke, Isabel Giray, Katja Steiger. Contact: j.wirth@tum.de, katja.steiger@tum.de. Journal: Bioinformatics (Oxford, England). Published: 2026-02-28. DOI: 10.1093/bioinformatics/btag073. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12988772/pdf/