Bioinformatics (Oxford, England): latest articles

Robustly interrogating machine learning-based scoring functions: what are they learning?
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf040
Guy Durant, Fergus Boyles, Kristian Birchall, Brian Marsden, Charlotte M Deane

Motivation: Machine learning-based scoring functions (MLBSFs) have been found to exhibit inconsistent performance on different benchmarks and be prone to learning dataset bias. For the field to develop MLBSFs that learn a generalizable understanding of physics, a more rigorous understanding of how they perform is required.

Results: In this work, we compared, on a range of benchmarks, the performance of a diverse set of popular MLBSFs (RFScore, SIGN, OnionNet-2, Pafnucy, and PointVS) with that of our proposed baseline models, which can only learn dataset biases. We found that these baseline models were competitive in accuracy with the MLBSFs on almost all proposed benchmarks, indicating that the MLBSFs only learn dataset biases. Our tests and provided platform, ToolBoxSF, will enable researchers to robustly interrogate MLBSF performance and determine the effect of dataset biases on their predictions.

Availability and implementation: https://github.com/guydurant/toolboxsf.
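The comparison described in the Results above can be sketched as follows. This is only a minimal illustration, not the ToolBoxSF implementation: the function names `pearson` and `bias_gap`, the choice of Pearson correlation as the metric, the idea of a ligand-only baseline, and all numbers are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def bias_gap(y_true, full_model_pred, baseline_pred):
    """Return (r_full, r_baseline, gap) for a scoring function and a bias-only baseline.

    A gap close to zero means the baseline, which by construction cannot see
    the protein-ligand interaction, explains the benchmark about as well as
    the full model does.
    """
    r_full = pearson(y_true, full_model_pred)
    r_base = pearson(y_true, baseline_pred)
    return r_full, r_base, r_full - r_base


# Toy values invented for illustration; they are NOT results from the paper.
y_true = [4.2, 5.1, 6.3, 7.0, 8.4]            # measured affinities (hypothetical)
full_model_pred = [4.5, 5.0, 6.0, 7.2, 8.0]   # hypothetical MLBSF predictions
baseline_pred = [4.4, 5.3, 6.1, 6.9, 8.1]     # hypothetical ligand-only baseline

r_full, r_base, gap = bias_gap(y_true, full_model_pred, baseline_pred)
print(f"full r={r_full:.3f}  baseline r={r_base:.3f}  gap={gap:.3f}")
```

Reporting the gap next to the raw score makes it harder to mistake a benchmark-specific bias for learned physics.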

Citations: 0
tttrlib: modular software for integrating fluorescence spectroscopy, imaging, and molecular modeling.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf025
Thomas-Otavio Peulen, Katherina Hemmen, Annemarie Greife, Benjamin M Webb, Suren Felekyan, Andrej Sali, Claus A M Seidel, Hugo Sanabria, Katrin G Heinze

Summary: We introduce software for reading, writing and processing fluorescence single-molecule and image spectroscopy data and developing analysis pipelines to unify various spectroscopic analysis tools. Our software can be used for processing multiple experiment types, e.g. for time-resolved single-molecule spectroscopy, laser scanning microscopy, fluorescence correlation spectroscopy and image correlation spectroscopy. The software is file format agnostic and processes multiple time-resolved data formats and outputs. Our software eliminates the need for data conversion and mitigates data archiving issues.

Availability and implementation: tttrlib is available via pip (https://pypi.org/project/tttrlib/) and bioconda while the open-source code is available via GitHub (https://github.com/fluorescence-tools/tttrlib). Presented examples and additional documentation demonstrating how to implement in vitro and live-cell image spectroscopy analysis are available at https://docs.peulen.xyz/tttrlib and https://zenodo.org/records/14002224.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
scatterbar: an R package for visualizing proportional data across spatially resolved coordinates.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf047
Dee Velazquez, Jean Fan

Motivation: Displaying proportional data across many spatially resolved coordinates is a challenging but important data visualization task, particularly for spatially resolved transcriptomics data. Scatter pie plots are one type of commonly used data visualization for such data but present perceptual challenges that may lead to difficulties in interpretation. Increasing the visual saliency of such data visualizations can help viewers more accurately identify proportional trends and compare proportional differences across spatial locations.

Results: We developed scatterbar, an open-source R package that extends ggplot2, to visualize proportional data across many spatially resolved coordinates using scatter stacked bar plots. We apply scatterbar to visualize deconvolved cell-type proportions from a spatial transcriptomics dataset of the adult mouse brain to demonstrate how scatter stacked bar plots can enhance the distinguishability of proportional distributions compared to scatter pie plots.

Availability and implementation: scatterbar is available on CRAN https://cran.r-project.org/package=scatterbar with additional documentation and tutorials at https://jef.works/scatterbar/.
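The geometry behind a scatter stacked bar plot can be sketched in a few lines. This is an illustrative Python sketch, not the scatterbar R API: the function name `stacked_bar_rects`, the vertical stacking direction, and the fixed bar size are assumptions. Each spatial coordinate gets one bar whose segments are proportional to the category values at that location.

```python
def stacked_bar_rects(x, y, proportions, width=1.0, height=1.0):
    """Rectangles (label, xmin, xmax, ymin, ymax) for one stacked bar centred at (x, y).

    proportions maps category -> non-negative value; values are normalised so
    the segments always fill the full bar height.
    """
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    rects = []
    xmin = x - width / 2
    bottom = y - height / 2
    for label, value in proportions.items():
        segment = height * value / total   # segment height for this category
        rects.append((label, xmin, xmin + width, bottom, bottom + segment))
        bottom += segment                  # the next segment stacks on top
    return rects


# One hypothetical spot at (0, 0) with two cell types in a 1:3 ratio.
for rect in stacked_bar_rects(0.0, 0.0, {"A": 1, "B": 3}):
    print(rect)
```

Because every bar shares the same baseline and height, proportions can be compared across locations by segment length alone, which is the perceptual advantage over pie wedges that the abstract describes.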

Citations: 0
CAMIL: channel attention-based multiple instance learning for whole slide image classification.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf024
Jinyang Mao, Junlin Xu, Xianfang Tang, Yongjin Liu, Heaven Zhao, Geng Tian, Jialiang Yang

Motivation: The classification task based on whole-slide images (WSIs) is a classic problem in computational pathology. Multiple instance learning (MIL) provides a robust framework for analyzing whole slide images with slide-level labels at gigapixel resolution. However, existing MIL models typically focus on modeling the relationships between instances while neglecting the variability across the channel dimensions of instances, which prevents the model from fully capturing critical information in the channel dimension.

Results: To address this issue, we propose a plug-and-play module called Multi-scale Channel Attention Block (MCAB), which models the interdependencies between channels by leveraging local features with different receptive fields. By alternately stacking four layers of Transformer and MCAB, we designed a channel attention-based MIL model (CAMIL) capable of simultaneously modeling both inter-instance relationships and intra-channel dependencies. To verify the performance of the proposed CAMIL in classification tasks, several comprehensive experiments were conducted across three datasets: Camelyon16, TCGA-NSCLC, and TCGA-RCC. Empirical results demonstrate that, whether the feature extractor is pretrained on natural images or on WSIs, our CAMIL surpasses current state-of-the-art MIL models across multiple evaluation metrics.

Availability and implementation: All implementation code is available at https://github.com/maojy0914/CAMIL.
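The idea of channel attention over a bag of instances can be illustrated with a small dependency-free sketch. It is not the authors' MCAB: the function names, the averaging of three kernel sizes, and the uniform placeholder kernel weights (which would be learned in a real model) are all assumptions made to show the squeeze, mix, and rescale pattern.

```python
from math import exp


def sigmoid(v):
    return 1.0 / (1.0 + exp(-v))


def filter1d_same(signal, kernel):
    """Sliding-window filter with zero padding; output length equals input length.

    Assumes an odd-length kernel so the window is centred on each position.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_channel_attention(instances, kernel_sizes=(3, 5, 7)):
    """Reweight the channels of a bag of N instance features, each of length C."""
    n, c = len(instances), len(instances[0])
    # Squeeze: one descriptor per channel, averaged over all instances in the bag.
    descriptor = [sum(inst[ch] for inst in instances) / n for ch in range(c)]
    # Mix: filter across neighbouring channels at several receptive fields.
    mixed = [0.0] * c
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # placeholder weights, learned in a real model
        for ch, v in enumerate(filter1d_same(descriptor, kernel)):
            mixed[ch] += v / len(kernel_sizes)
    # Rescale: a gate in (0, 1) per channel, applied to every instance.
    weights = [sigmoid(v) for v in mixed]
    rescaled = [[inst[ch] * weights[ch] for ch in range(c)] for inst in instances]
    return rescaled, weights


bag = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]  # 2 instances, 4 channels
rescaled, weights = multi_scale_channel_attention(bag)
print(weights)
```

The gate depends only on channel statistics, so it complements a Transformer layer, which models relationships between instances.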

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
HiCForecast: dynamic network optical flow estimation algorithm for spatiotemporal Hi-C data forecasting.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf030
Dmitry Pinchuk, H M A Mohit Chowdhury, Abhishek Pandeya, Oluwatosin Oluwadare

Motivation: The exploration of the 3D organization of DNA within the nucleus in relation to various stages of cellular development has led to experiments generating spatiotemporal Hi-C data. However, spatiotemporal Hi-C data are limited for many organisms, impeding the study of 3D genome dynamics. To overcome this limitation and advance our understanding of genome organization, it is crucial to develop methods for forecasting Hi-C data at future time points from existing time-series Hi-C data.

Result: In this work, we designed a novel framework named HiCForecast, adopting a dynamic voxel flow algorithm to forecast future spatiotemporal Hi-C data. We evaluated how well our method generalizes forecasting data across different species and systems, ensuring performance in homogeneous, heterogeneous, and general contexts. Using both computational and biological evaluation metrics, our results show that HiCForecast outperforms the current state-of-the-art algorithm, emerging as an efficient and powerful tool for forecasting future spatiotemporal Hi-C datasets.

Availability and implementation: HiCForecast is publicly available at https://github.com/OluwadareLab/HiCForecast.

Citations: 0
ESCARGOT: an AI agent leveraging large language models, dynamic graph of thoughts, and biomedical knowledge graphs for enhanced reasoning.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf031
Nicholas Matsumoto, Hyunjun Choi, Jay Moran, Miguel E Hernandez, Mythreye Venkatesan, Xi Li, Jui-Hsuan Chang, Paul Wang, Jason H Moore

Motivation: LLMs like GPT-4, despite their advancements, often produce hallucinations and struggle with integrating external knowledge effectively. While Retrieval-Augmented Generation (RAG) attempts to address this by incorporating external information, it faces significant challenges such as context length limitations and imprecise vector similarity search. ESCARGOT aims to overcome these issues by combining LLMs with a dynamic Graph of Thoughts and biomedical knowledge graphs, improving output reliability, and reducing hallucinations.

Result: ESCARGOT significantly outperforms industry-standard RAG methods, particularly in open-ended questions that demand high precision. ESCARGOT also offers greater transparency in its reasoning process, allowing for the vetting of both code and knowledge requests, in contrast to the black-box nature of LLM-only or RAG-based approaches.

Availability and implementation: ESCARGOT is available as a pip package and on GitHub at: https://github.com/EpistasisLab/ESCARGOT.

Citations: 0
CRAmed: a conditional randomization test for high-dimensional mediation analysis in sparse microbiome data.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf038
Tiantian Liu, Xiangnan Xu, Tao Wang, Peirong Xu

Motivation: Numerous microbiome studies have revealed significant associations between the microbiome and human health and disease. These findings have motivated researchers to explore the causal role of the microbiome in human complex traits and diseases. However, the complexities of microbiome data pose challenges for statistical analysis and interpretation of causal effects.

Results: We introduced a novel statistical framework, CRAmed, for inferring the mediating role of the microbiome between treatment and outcome. CRAmed improved the interpretability of the mediation analysis by decomposing the natural indirect effect into two parts, corresponding to the presence-absence and abundance of a microbe, respectively. Comprehensive simulations demonstrated the superior performance of CRAmed in recall, precision, and F1 score, with a notable level of robustness, compared to existing mediation analysis methods. Furthermore, two real data applications illustrated the effectiveness and interpretability of CRAmed. Our research revealed that CRAmed holds promise for uncovering the mediating role of the microbiome and understanding the factors influencing host health.

Availability and implementation: The R package CRAmed implementing the proposed methods is available online at https://github.com/liudoubletian/CRAmed.
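The general mechanics of a conditional randomization test can be sketched as below. This is a generic illustration and not the CRAmed procedure: the mean-difference statistic, the Bernoulli(0.5) treatment sampler, and the names `mean_diff`, `crt_p_value`, and `sample_treatment` are assumptions. CRAmed itself is an R package whose models for sparse microbiome counts are not reproduced here.

```python
import random


def mean_diff(treatment, values):
    """Absolute difference in mean value between treated (1) and control (0) units."""
    treated = [v for t, v in zip(treatment, values) if t == 1]
    control = [v for t, v in zip(treatment, values) if t == 0]
    if not treated or not control:
        return 0.0
    return abs(sum(treated) / len(treated) - sum(control) / len(control))


def crt_p_value(treatment, values, sample_treatment, n_draws=999, seed=0):
    """Conditional randomization p-value.

    sample_treatment(rng) must draw a fresh treatment vector from the (assumed
    known) conditional distribution of treatment given the covariates.
    """
    rng = random.Random(seed)
    observed = mean_diff(treatment, values)
    exceed = 0
    for _ in range(n_draws):
        if mean_diff(sample_treatment(rng), values) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the draws,
    # which keeps the p-value valid and strictly positive.
    return (1 + exceed) / (n_draws + 1)


def bernoulli_half(rng, n=20):
    """Hypothetical design: each unit is treated independently with probability 0.5."""
    return [1 if rng.random() < 0.5 else 0 for _ in range(n)]


treatment = [1] * 10 + [0] * 10
abundance = [10.0] * 10 + [0.0] * 10   # toy mediator values, invented for illustration
print(crt_p_value(treatment, abundance, bernoulli_half))
```

The test makes no assumption about how the outcome depends on the treatment; all of the modeling burden sits in the sampler for the treatment.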

Citations: 0
VirDetector: a bioinformatic pipeline for virus surveillance using nanopore sequencing.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf029
Nick Laurenz Kaiser, Martin H Groschup, Balal Sadeghi

Summary: Virus surveillance programmes are designed to counter the growing threat of viral outbreaks to human health. Nanopore sequencing, in particular, has proven to be suitable for this purpose, as it is readily available and provides rapid results. However, as special bioinformatic programs are required to extract the relevant information from the sequencing data, applications are needed that allow users without extensive bioinformatics knowledge to carry out the relevant analysis steps. We present VirDetector, a bioinformatic pipeline for virus surveillance using nanopore sequencing. The pipeline automatically installs all required programs and databases and allows all its steps to be executed with a single console command. After preprocessing the samples, including the possibility for basecalling, the pipeline classifies each sample taxonomically and reconstructs the viral consensus genomes, which are then used in phylogenetic analyses. This streamlined workflow provides a user-friendly and efficient solution for monitoring viral pathogens.

Availability and implementation: VirDetector is freely available at https://github.com/NLKaiser/VirDetector and https://zenodo.org/records/14637302 (10.5281/zenodo.14637302).

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
MEGA-GO: functions prediction of diverse protein sequence length using Multi-scalE Graph Adaptive neural network.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf032
Yujian Lee, Peng Gao, Yongqi Xu, Ziyang Wang, Shuaicheng Li, Jiaxing Chen

Motivation: The increasing accessibility of large-scale protein sequences through advanced sequencing technologies has necessitated the development of efficient and accurate methods for predicting protein function. Computational prediction models have emerged as a promising solution to expedite the annotation process. However, despite making significant progress in protein research, graph neural networks face challenges in capturing long-range structural correlations and identifying critical residues in protein graphs. Furthermore, existing models have limitations in effectively predicting the function of newly sequenced proteins that are not included in protein interaction networks. This highlights the need for novel approaches integrating protein structure and sequence data.

Results: We introduce the Multi-scalE Graph Adaptive neural network (MEGA-GO), highlighting its capability to capture features of proteins of diverse sequence lengths at multiple scales. The unique graph adaptive neural network architecture of MEGA-GO enables a more nuanced extraction of graph structure features, effectively capturing intricate relationships within biological data. Experimental results demonstrate that MEGA-GO outperforms mainstream protein function prediction models in the accuracy of Gene Ontology term classification, yielding areas under the precision-recall curve of 33.4%, 68.9%, and 44.6% on the biological process, molecular function, and cellular component domains, respectively. The remaining experimental results show that our model consistently surpasses the state-of-the-art methods.

Availability and implementation: The source code and data of MEGA-GO are available at https://github.com/Cheliosoops/MEGA-GO.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11810639/pdf/
Citations: 0
Multidimensional scaling improves distance-based clustering for microbiome data.
Pub Date : 2025-02-04 DOI: 10.1093/bioinformatics/btaf042
Guanhua Chen, Xinyue Wang, Qiang Sun, Zheng-Zheng Tang

Motivation: Clustering patients into subgroups based on their microbial compositions can greatly enhance our understanding of the role of microbes in human health and disease etiology. Distance-based clustering methods, such as partitioning around medoids (PAM), are popular due to their computational efficiency and absence of distributional assumptions. However, the performance of these methods can be suboptimal when true cluster memberships are driven by differences in the abundance of only a few microbes, a situation known as the sparse signal scenario.

Results: We demonstrate that classical multidimensional scaling (MDS), a widely used dimensionality reduction technique, effectively denoises microbiome data and enhances the clustering performance of distance-based methods. We propose a two-step procedure that first applies MDS to project high-dimensional microbiome data into a low-dimensional space, followed by distance-based clustering using the low-dimensional data. Our extensive simulations demonstrate that our procedure offers superior performance compared to directly conducting distance-based clustering under the sparse signal scenario. The advantage of our procedure is further showcased in several real data applications.

Availability and implementation: The R package MDSMClust is available at https://github.com/wxy929/MDS-project.
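
The two-step procedure described in the Results above (classical MDS first, then distance-based clustering on the low-dimensional coordinates) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the MDSMClust R package or its API. The Bray-Curtis dissimilarity, the number of retained dimensions, the function names, and the simple alternating k-medoids routine (a simplified stand-in for PAM) are all assumptions chosen for the example.

```python
import numpy as np


def bray_curtis(X):
    """Pairwise Bray-Curtis dissimilarity between rows of a non-negative matrix.

    Assumption: the abstract does not name a distance; Bray-Curtis is used
    here only because it is common for microbiome abundance data.
    """
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    den = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return num / np.where(den == 0, 1.0, den)


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: double-centre the squared distances,
    then keep the leading eigenvectors scaled by sqrt(eigenvalue)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix estimate
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    top = np.clip(vals[idx], 0.0, None)      # non-Euclidean D can give negatives
    return vecs[:, idx] * np.sqrt(top)


def k_medoids(D, k, n_iter=100, seed=0):
    """Alternating k-medoids on a distance matrix (simplified stand-in for PAM)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids


def mds_then_cluster(X, k, n_components=2, seed=0):
    """Step 1: project with classical MDS. Step 2: cluster in the low-dim space."""
    D = bray_curtis(X)
    Y = classical_mds(D, n_components)
    D_low = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    return k_medoids(D_low, k, seed=seed)
```

Calling `mds_then_cluster(X, k=2)` on a samples-by-taxa abundance matrix `X` returns a cluster label per sample and the indices of the medoid samples. Clustering directly on `bray_curtis(X)` with the same `k_medoids` routine gives the one-step baseline that the abstract compares against.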

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11814494/pdf/
Citations: 0