Pub Date: 2024-11-20 | DOI: 10.1093/bioinformatics/btae700 | Journal: Bioinformatics (Oxford, England)
Gene count estimation with pytximport enables reproducible analysis of bulk RNA sequencing data in Python.
Malte Kuehl, Milagros N Wong, Nicola Wanner, Stefan Bonn, Victor G Puelles
Summary: Transcript quantification tools efficiently map bulk RNA sequencing reads to reference transcriptomes. However, their output consists of transcript count estimates that are subject to multiple biases and cannot readily be used with existing differential gene expression analysis tools in Python. Here we present pytximport, a Python implementation of the tximport R package that supports a variety of input formats, different modes of bias correction, inferential replicates, gene-level summarization of transcript counts, transcript-level exports, transcript-to-gene mapping generation, and optional filtering of transcripts by biotype. pytximport is part of the scverse ecosystem of open-source Python packages for omics analyses and provides both a Python and a command-line interface. Building on pytximport, we propose a bulk RNA sequencing analysis workflow based on Bioconda and scverse ecosystem packages, with reproducibility ensured through Snakemake rules. We apply this pipeline to a publicly available RNA sequencing dataset, demonstrating how pytximport enables Python-centric workflows that can provide insights into transcriptomic alterations.
Availability: pytximport is licensed under the GNU General Public License version 3. The source code is available at https://github.com/complextissue/pytximport and via Zenodo (DOI: 10.5281/zenodo.13907917). A related Snakemake workflow is available on GitHub at https://github.com/complextissue/snakemake-bulk-rna-seq-workflow and via Zenodo (DOI: 10.5281/zenodo.12713811). Documentation and a vignette for new users are available at https://pytximport.readthedocs.io.
Supplementary information: Supplementary Material is available at Bioinformatics online.
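To make "gene-level summarization" and "bias correction" more concrete, here is a minimal NumPy sketch of the `lengthScaledTPM` count estimation that the tximport R package offers as one of its `countsFromAbundance` options. It is not pytximport's API. The function name, the input layout (transcripts × samples arrays plus a transcript-to-gene list), and the fallback to an unweighted mean length for genes with zero abundance are assumptions made for illustration.

```python
import numpy as np


def length_scaled_tpm(counts, tpm, lengths, tx2gene):
    """Summarize transcript estimates to gene-level counts, tximport 'lengthScaledTPM' style.

    counts, tpm, lengths: arrays of shape (n_transcripts, n_samples).
    tx2gene: sequence mapping each transcript row to a gene ID.
    Returns (genes, gene_counts) with gene_counts of shape (n_genes, n_samples).
    """
    counts, tpm, lengths = (np.asarray(a, dtype=float) for a in (counts, tpm, lengths))
    genes = sorted(set(tx2gene))
    index = {gene: i for i, gene in enumerate(genes)}
    rows = np.array([index[g] for g in tx2gene])
    n_genes, n_samples = len(genes), counts.shape[1]

    def per_gene_sum(values):
        # Add each transcript row into the row of the gene it belongs to.
        out = np.zeros((n_genes, n_samples))
        np.add.at(out, rows, values)
        return out

    gene_tpm = per_gene_sum(tpm)
    # Abundance-weighted mean transcript length per gene and sample; genes with
    # zero abundance fall back to the unweighted mean length (an assumption).
    n_tx = np.bincount(rows, minlength=n_genes)[:, None]
    fallback = per_gene_sum(lengths) / n_tx
    gene_len = np.divide(per_gene_sum(tpm * lengths), gene_tpm,
                         out=fallback.copy(), where=gene_tpm > 0)

    # Scale gene TPM by each gene's average length across samples, then rescale
    # every sample so that its total matches the original library size.
    raw = gene_tpm * gene_len.mean(axis=1, keepdims=True)
    return genes, raw * (counts.sum(axis=0) / raw.sum(axis=0))
```

Because each column is rescaled to its original total, the resulting gene counts keep every sample's library size while no longer carrying the transcript-length differences between samples.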
Pub Date: 2024-11-20 | DOI: 10.1093/bioinformatics/btae673 | Journal: Bioinformatics (Oxford, England)
Pf-HaploAtlas: an interactive web app for spatiotemporal analysis of P. falciparum genes.
Chiyun Lee, Eyyüb S Ünlü, Nina F D White, Jacob Almagro-Garcia, Cristina Ariani, Richard D Pearson
Motivation: Monitoring the genomic evolution of Plasmodium falciparum, the most widespread and deadliest of the human-infecting malaria species, is critical for making decisions in response to changes in drug resistance, diagnostic test failures, and vaccine effectiveness. The MalariaGEN data resources are the world's largest whole-genome sequencing databases for Plasmodium parasites. The size and complexity of these data are a barrier to many potential end users in both public health and academic research. A user-friendly way to access and explore data on the genetic variation of P. falciparum would greatly support efforts to study and control malaria.
Results: We developed Pf-HaploAtlas, a web application that enables exploratory analysis of genomic variation without requiring advanced technical expertise. The app provides analysis-ready data catalogues and visualisations of amino acid haplotypes for all 5,102 core P. falciparum genes. Pf-HaploAtlas supports comprehensive spatial and temporal exploration of genes and variants of interest using data from 16,203 samples collected in 33 countries between 1984 and 2018. The scope of Pf-HaploAtlas will expand with each new MalariaGEN Plasmodium data release.
Availability: Pf-HaploAtlas is available online for public use at https://apps.malariagen.net/pf-haploatlas, where users can download the underlying amino acid haplotype data for further analyses. Its source code is freely available on GitHub under the MIT licence at https://github.com/malariagen/pf-haploatlas.
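The app's core idea, summarising each gene as amino acid haplotypes and tracking their frequencies by place and time, can be sketched in a few lines of Python. The naming scheme (reference residue, 1-based position, alternate residue, joined with "/"), the `(country, year, haplotype)` record layout, and the toy sequences are all hypothetical; this is not Pf-HaploAtlas code or its data format.

```python
from collections import Counter, defaultdict


def haplotype_label(reference: str, sample: str) -> str:
    """Name an amino acid haplotype by its differences from the reference protein.

    Both sequences must already be aligned to the same length; positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    changes = [f"{ref}{pos}{alt}"
               for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
               if ref != alt]
    return "/".join(changes) if changes else "reference"


def haplotype_frequencies(records):
    """Map (country, year) -> {haplotype: frequency} from (country, year, haplotype) records."""
    tallies = defaultdict(Counter)
    for country, year, haplotype in records:
        tallies[(country, year)][haplotype] += 1
    frequencies = {}
    for key, counter in tallies.items():
        total = sum(counter.values())
        frequencies[key] = {hap: n / total for hap, n in counter.items()}
    return frequencies
```

Grouping by a `(country, year)` key makes the spatial and temporal views two slices of the same table: fix the country to follow a haplotype over time, or fix the year to compare countries.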
Pub Date: 2024-11-19 | DOI: 10.1093/bioinformatics/btae684 | Journal: Bioinformatics (Oxford, England)
MMOSurv: Meta-learning for few-shot survival analysis with multi-omics data.
Gang Wen, Limin Li
Motivation: High-throughput techniques have produced large amounts of high-dimensional multi-omics data, which hold promise for predicting patient survival outcomes more accurately. Recent work has shown the superiority of multi-omics data in survival analysis. However, it remains challenging to integrate multi-omics data for few-shot survival prediction, where only a few training samples are available, especially for rare cancers.
Results: In this work, we propose MMOSurv, a meta-learning framework for multi-omics few-shot survival analysis that learns an effective multi-omics survival prediction model from very few training samples of a specific cancer type by using meta-knowledge acquired across tasks from relevant cancer types. Assuming a deep Cox survival model with multiple omics inputs, MMOSurv first learns an adaptable parameter initialization for the multi-omics survival model from abundant data on relevant cancers and then adapts the parameters quickly and efficiently to the target cancer task using very few training samples. Our experiments on eleven cancer types from TCGA datasets show that, compared with single-omics meta-learning methods, MMOSurv makes better use of the meta-information on similarities and relationships between different omics data from relevant cancer datasets, improving survival prediction for the target cancer with very few multi-omics training samples. Furthermore, MMOSurv achieves better prediction performance than other state-of-the-art strategies such as multi-task learning and pre-training.
Availability and implementation: MMOSurv is freely available at https://github.com/LiminLi-xjtu/MMOSurv.
Supplementary information: Supplementary data are available at Bioinformatics online.
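The two stages described above, meta-learning an initialization across related cancers and then adapting it with a few target samples, can be illustrated with a linear Cox model in NumPy. This is a simplified sketch, not MMOSurv. It replaces the deep multi-omics network with a linear risk score, assumes no tied survival times, and uses a first-order Reptile-style meta-update as a stand-in for the paper's meta-learning rule. Each task is a hypothetical `(X, time, event)` tuple, with the omics features assumed to be concatenated into `X`.

```python
import numpy as np


def cox_loss_and_grad(beta, X, time, event):
    """Negative Cox partial log-likelihood and its gradient for the risk score X @ beta.

    Assumes no tied times, so each risk set {j : t_j >= t_i} is a prefix of the
    samples sorted by decreasing time.
    """
    order = np.argsort(-time)
    Xs, es = X[order], event[order].astype(float)
    eta = Xs @ beta
    log_risk = np.logaddexp.accumulate(eta)       # log of sum over the risk set of exp(eta)
    loss = -np.sum(es * (eta - log_risk))
    w = np.exp(eta - log_risk[-1])                # constant shift for numerical stability
    mean_x = np.cumsum(w[:, None] * Xs, axis=0) / np.cumsum(w)[:, None]
    grad = -np.sum(es[:, None] * (Xs - mean_x), axis=0)
    return loss, grad


def adapt(beta0, task, lr=0.01, steps=10):
    """Inner loop: a few gradient steps on one task, starting from beta0."""
    X, time, event = task
    beta = beta0.copy()
    for _ in range(steps):
        _, grad = cox_loss_and_grad(beta, X, time, event)
        beta -= lr * grad
    return beta


def meta_initialization(tasks, n_features, meta_lr=0.5, rounds=20, **inner):
    """Outer loop (Reptile-style): pull the shared start toward each task's adapted weights."""
    beta0 = np.zeros(n_features)
    for _ in range(rounds):
        for task in tasks:
            beta0 += meta_lr * (adapt(beta0, task, **inner) - beta0)
    return beta0
```

In use, one would call `meta_initialization` on tasks built from the data-rich source cancers and then `adapt` the returned vector on the small target-cancer task.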
Pub Date: 2024-11-19 | DOI: 10.1093/bioinformatics/btae692 | Journal: Bioinformatics (Oxford, England)
DrugRepPT: a deep pre-training and fine-tuning framework for drug repositioning based on drug's expression perturbation and treatment effectiveness.
Shuyue Fan, Kuo Yang, Kezhi Lu, Xin Dong, Xianan Li, Qiang Zhu, Shao Li, Jianyang Zeng, Xuezhong Zhou
Motivation: Drug repositioning, the identification of novel indications for approved drugs, is a cost-effective strategy in drug discovery. Despite the many drug repositioning models proposed, integrating network-based features, differential gene expression, and chemical structures for high-performance drug repositioning remains challenging.
Results: We propose DrugRepPT, a comprehensive deep pre-training and fine-tuning framework for drug repositioning. First, we design a graph pre-training module that applies model-augmented contrastive learning to a large drug-disease heterogeneous graph to capture nuanced interactions and post-intervention expression perturbations. Next, we introduce a fine-tuning module that uses a graph residual-like convolutional network to model the intricate interactions between diseases and drugs. In addition, a Bayesian multi-loss approach balances the losses for the existence and the effectiveness of drug treatment. Extensive experiments demonstrate the efficacy of the framework: compared with state-of-the-art (SOTA) baseline methods, DrugRepPT improves Hit@1 by 106.13% and mean reciprocal rank by 54.45%. The reliability of the predictions is further validated through two case studies, gastritis and fatty liver, using literature validation, network medicine analysis, and docking screening.
Availability and implementation: The code and results are available at https://github.com/2020MEAI/DrugRepPT.
Supplementary information: Supplementary data are available at Bioinformatics online.
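Hit@1 and mean reciprocal rank (MRR), the two metrics reported for DrugRepPT, are standard ranking measures. The sketch below shows one common way to compute them from a disease-by-drug score matrix; the matrix layout and the convention of skipping diseases without any known indication are assumptions, not necessarily DrugRepPT's evaluation protocol. As a reading aid, a 106.13% improvement means the Hit@1 value is about 2.06 times the baseline's.

```python
import numpy as np


def hit_at_k_and_mrr(scores, relevant, k=1):
    """Ranking metrics for drug repositioning.

    scores:   (n_diseases, n_drugs) predicted association scores (higher = better).
    relevant: same shape, True where the drug is a known indication for the disease.
    Diseases without any known indication are skipped.
    """
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)
    keep = relevant.any(axis=1)
    scores, relevant = scores[keep], relevant[keep]

    order = np.argsort(-scores, axis=1, kind="stable")   # best-scored drug first
    ranked = np.take_along_axis(relevant, order, axis=1)
    first_hit = ranked.argmax(axis=1)                     # 0-based rank of first true drug
    hit_at_k = float(np.mean(first_hit < k))
    mrr = float(np.mean(1.0 / (first_hit + 1)))
    return hit_at_k, mrr
```

For example, with scores `[[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]` and true drugs at columns 2 and 1 respectively, the first disease's true drug is ranked second and the second disease's is ranked first, giving Hit@1 = 0.5 and MRR = (1/2 + 1) / 2 = 0.75.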
Pub Date: 2024-11-19 | DOI: 10.1093/bioinformatics/btae697 | Journal: Bioinformatics (Oxford, England)
PhosX: data-driven kinase activity inference from phosphoproteomics experiments.
Alessandro Lussana, Sophia Müller-Dott, Julio Saez-Rodriguez, Evangelia Petsalaki
Summary: Inferring kinase activity from phosphoproteomics data can point to causal mechanisms driving signalling processes and to potential drug targets. However, identifying the kinases whose change in activity explains the observed phosphorylation profiles remains challenging and is constrained by manually curated knowledge of kinase-substrate associations. Experimentally determined substrate sequence specificities of human kinases have recently become available, but robust methods to exploit these new data for kinase activity inference are still missing. We present PhosX, a method to estimate differential kinase activity from phosphoproteomics data that combines state-of-the-art statistics in enrichment analysis with information on kinases' substrate sequence specificity. Using a large phosphoproteomics dataset with known differentially regulated kinases, we show that our method identifies upregulated and downregulated kinases relying only on the sequences and intensity changes of the input phosphopeptides. We find that PhosX outperforms the currently available approach for the same task and performs better than or similarly to state-of-the-art methods that rely on previously known kinase-substrate associations. We therefore recommend its use for data-driven kinase activity inference.
Availability and implementation: PhosX is implemented in Python, is open source under the Apache-2.0 licence, and is distributed on the Python Package Index. The code is available on GitHub (https://github.com/alussana/phosx).
Supplementary information: Supplementary data are available at Bioinformatics online.
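A rough illustration of combining sequence specificity with enrichment statistics: score each phosphopeptide against a kinase's position-specific scoring matrix (PSSM), treat the top-scoring peptides as that kinase's putative substrates, and compute a Kolmogorov-Smirnov-style running-sum score over peptides ranked by fold change. This is a generic GSEA-like sketch, not PhosX's actual scoring or statistics. The PSSM layout (one row per aligned peptide position, one column per amino acid), the `top_fraction` cutoff, and the toy peptides are assumptions, and a real analysis would still need a null distribution (for example from permutations) to attach a p-value.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_score(peptide, pssm):
    """Sum the log-odds of each residue at its aligned position; other characters score 0."""
    return sum(pssm[i, AMINO_ACIDS.index(aa)]
               for i, aa in enumerate(peptide) if aa in AMINO_ACIDS)


def running_sum_score(in_set):
    """KS-style enrichment score for a boolean vector ordered from most up- to most down-regulated."""
    hits = np.asarray(in_set, dtype=float)
    n_hit, n_miss = hits.sum(), len(hits) - hits.sum()
    if n_hit == 0 or n_miss == 0:
        return 0.0
    walk = np.cumsum(hits / n_hit - (1.0 - hits) / n_miss)
    return float(walk[np.argmax(np.abs(walk))])


def kinase_enrichment(peptides, log_fc, pssm, top_fraction=0.1):
    """In this sketch, positive scores suggest increased and negative scores decreased activity."""
    scores = np.array([pssm_score(p, pssm) for p in peptides])
    putative = scores >= np.quantile(scores, 1.0 - top_fraction)
    order = np.argsort(-np.asarray(log_fc, dtype=float))
    return running_sum_score(putative[order])
```

Note that no curated kinase-substrate list appears anywhere: the substrate set is derived purely from the peptide sequences and the PSSM, which mirrors the data-driven premise described in the summary.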
Pub Date: 2024-11-19 | DOI: 10.1093/bioinformatics/btae696 | Journal: Bioinformatics (Oxford, England)
Mutual information for detecting multi-class biomarkers when integrating multiple bulk or single-cell transcriptomic studies.
Jian Zou, Zheqi Li, Neil Carleton, Steffi Oesterreich, Adrian V Lee, George C Tseng
Motivation: Biomarker detection plays a pivotal role in biomedical research. Integrating omics studies from multiple cohorts can enhance the statistical power, accuracy, and robustness of detection results. However, existing methods for horizontally combining omics studies are mostly designed for two-class scenarios (e.g., cases versus controls) and are not directly applicable to studies with multi-class designs (e.g., samples from multiple disease subtypes, treatments, tissues, or cell types).
Results: We propose a statistical framework, Mutual Information Concordance Analysis (MICA), to detect biomarkers with concordant multi-class expression patterns across multiple omics studies from an information-theoretic perspective. Our approach first uses a global mutual information test to detect biomarkers with concordant multi-class patterns across some or all of the omics studies. A post hoc analysis is then performed for each detected biomarker to identify the studies with concordant patterns. Extensive simulations demonstrate improved accuracy and successful false discovery rate control of MICA compared with an existing MCC method. The method is then applied to two practical scenarios: metabolism-related mouse transcriptomic studies in four tissues, and estrogen treatment expression profiles from three sources. Biomarkers detected by MICA provide intriguing biological insights and functional annotations. Additionally, we applied MICA to single-cell RNA-seq data to detect tumor progression biomarkers, highlighting critical roles of ribosomal function in the tumor microenvironment of triple-negative breast cancer and underscoring the potential of MICA for detecting novel therapeutic targets.
Availability: The source code is available on Figshare at https://doi.org/10.6084/m9.figshare.27635436. The R package can also be installed directly from GitHub at https://github.com/jianzou75/MICA.
Supplementary information: Supplementary data are available at Bioinformatics online.
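MICA's global test is based on mutual information between expression and class labels. The sketch below shows only that basic ingredient for a single study: a plug-in mutual information estimate between discretized expression and multi-class labels, with a permutation p-value. The quantile-bin discretization, the plug-in estimator, and the permutation scheme are assumptions; the sketch does not reproduce MICA's cross-study concordance statistic or its post hoc step.

```python
import numpy as np


def mutual_information(x_labels, y_labels):
    """Plug-in mutual information (in nats) between two discrete label vectors."""
    x_values, x = np.unique(x_labels, return_inverse=True)
    y_values, y = np.unique(y_labels, return_inverse=True)
    joint = np.zeros((len(x_values), len(y_values)))
    np.add.at(joint, (x, y), 1.0)
    joint /= joint.sum()
    # Outer product of the marginals: the joint distribution under independence.
    independent = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / independent[nonzero])))


def discretize(expression, n_bins=3):
    """Assign each sample to an expression quantile bin (0 .. n_bins - 1)."""
    edges = np.quantile(expression, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(expression, edges)


def mi_permutation_test(expression, classes, n_perm=1000, seed=0):
    """Observed MI between binned expression and class labels, with a permutation p-value."""
    rng = np.random.default_rng(seed)
    bins = discretize(np.asarray(expression, dtype=float))
    classes = np.asarray(classes)
    observed = mutual_information(bins, classes)
    null = np.array([mutual_information(bins, rng.permutation(classes))
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

Unlike a two-group t-test, mutual information does not require the classes to be ordered or split into cases and controls, which is why it suits the multi-class designs described in the motivation.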
Pub Date: 2024-11-19 | DOI: 10.1093/bioinformatics/btae667 | Journal: Bioinformatics (Oxford, England)
Micro-DeMix: a mixture beta-multinomial model for investigating the heterogeneity of the stool microbiome compositions.
Ruoqian Liu, Yue Wang, Dan Cheng
Motivation: Extensive research has uncovered the critical role of the human gut microbiome in many aspects of health, including metabolism, nutrition, physiology, and immune function. Fecal microbiota is often used as a proxy for understanding the gut microbiome, but it represents an aggregate view that overlooks spatial variation across different gastrointestinal (GI) locations. Emerging studies with spatial microbiome data collected from specific GI regions offer a unique opportunity to better understand the spatial composition of the stool microbiome.
Results: We introduce Micro-DeMix, a mixture beta-multinomial model that deconvolutes the fecal microbiome at the compositional level by integrating stool samples with spatial microbiome data. Micro-DeMix enables comparison of microbial compositions across GI regions within the stool microbiome through a hypothesis-testing framework. We demonstrate the effectiveness and efficiency of Micro-DeMix using multiple simulated datasets and Inflammatory Bowel Disease (IBD) data from the NIH Integrative Human Microbiome Project.
Availability and implementation: The R package is available at https://github.com/liuruoqian/MicroDemix.
Supplementary information: Supplementary data are available at Bioinformatics online.
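The deconvolution idea, treating a stool profile as a mixture of region-specific compositions, can be illustrated with an expectation-maximization (EM) estimate of the mixing proportions. This sketch makes strong simplifying assumptions: the region profiles are taken as known, and reads follow a plain multinomial, so it ignores the beta-multinomial overdispersion and the hypothesis tests that Micro-DeMix provides. The function and argument names are hypothetical.

```python
import numpy as np


def estimate_mixing_proportions(stool_counts, region_profiles, n_iter=1000, tol=1e-10):
    """EM estimate of how much each GI region contributes to one stool sample.

    stool_counts:    (n_taxa,) read counts observed in the stool sample.
    region_profiles: (n_regions, n_taxa) relative abundances per region; rows sum to 1.
    Returns pi, an (n_regions,) vector of non-negative proportions summing to 1.
    """
    counts = np.asarray(stool_counts, dtype=float)
    profiles = np.asarray(region_profiles, dtype=float)
    pi = np.full(profiles.shape[0], 1.0 / profiles.shape[0])
    for _ in range(n_iter):
        # E-step: share of each taxon's reads attributed to each region.
        weighted = pi[:, None] * profiles
        mixture = weighted.sum(axis=0)
        resp = np.divide(weighted, mixture, out=np.zeros_like(weighted), where=mixture > 0)
        # M-step: new proportions are the expected fraction of reads from each region.
        new_pi = (resp * counts).sum(axis=1) / counts.sum()
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi
```

For instance, two region profiles `[0.7, 0.2, 0.1]` and `[0.1, 0.3, 0.6]` mixed 30:70 imply a stool composition of `[0.28, 0.27, 0.45]`; feeding counts in exactly those proportions recovers proportions close to `[0.3, 0.7]`.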
Title: DeepDR: a deep learning library for drug response prediction.
Pub Date: 2024-11-18 | DOI: 10.1093/bioinformatics/btae688 | Bioinformatics (Oxford, England)
Zhengxiang Jiang, Pengyong Li
Summary: Accurate drug response prediction is critical to advancing precision medicine and drug discovery. Recent advances in deep learning (DL) have shown promise in predicting drug response; however, the lack of convenient tools to support such modeling limits their widespread application. To address this, we introduce DeepDR, the first DL library specifically developed for drug response prediction. DeepDR simplifies the process by automating drug and cell featurization, model construction, training and inference, all achievable with only a few lines of code. The library incorporates three types of drug features with nine drug encoders, four types of cell features with nine cell encoders, and two fusion modules, enabling the implementation of up to 135 DL models for drug response prediction. We also benchmarked the performance of models built with DeepDR, and the best-performing models are available through a user-friendly visual interface.
Availability and implementation: DeepDR can be installed from PyPI (https://pypi.org/project/deepdr). The source code and experimental data are available on GitHub (https://github.com/user15632/DeepDR).
Supplementary information: Supplementary data are available at Bioinformatics online.
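The DeepDR abstract describes models assembled from interchangeable drug encoders, cell encoders and fusion modules. The sketch below illustrates that general encoder, fusion and predictor pattern in plain Python. It is not DeepDR's API. Every encoder, fusion function and name in it (`bit_density`, `mean_expr`, `concat`, `build_model` and so on) is hypothetical, and toy arithmetic stands in for trained neural networks.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Predictor = Callable[[Sequence[int], Sequence[float]], float]


# Hypothetical drug encoders: binary molecular fingerprint -> feature vector.
def bit_density(fp: Sequence[int]) -> Vector:
    return [sum(fp) / len(fp)]  # fraction of set bits


def half_densities(fp: Sequence[int]) -> Vector:
    mid = len(fp) // 2
    return [sum(fp[:mid]) / mid, sum(fp[mid:]) / (len(fp) - mid)]  # per-half bit density


# Hypothetical cell encoders: gene expression profile -> feature vector.
def mean_expr(expr: Sequence[float]) -> Vector:
    return [sum(expr) / len(expr)]


def mean_max_expr(expr: Sequence[float]) -> Vector:
    return [sum(expr) / len(expr), max(expr)]


# Hypothetical fusion modules: combine drug and cell embeddings into one vector.
def concat(d: Vector, c: Vector) -> Vector:
    return d + c


def outer(d: Vector, c: Vector) -> Vector:
    return [x * y for x in d for y in c]


DRUG_ENCODERS = {"bit_density": bit_density, "half_densities": half_densities}
CELL_ENCODERS = {"mean_expr": mean_expr, "mean_max_expr": mean_max_expr}
FUSIONS = {"concat": concat, "outer": outer}


def build_model(drug_enc, cell_enc, fusion) -> Predictor:
    """Compose one drug encoder, one cell encoder and one fusion module into a predictor."""
    def predict(fp: Sequence[int], expr: Sequence[float]) -> float:
        features = fusion(drug_enc(fp), cell_enc(expr))
        return sum(features)  # untrained linear head: unit weights, zero bias
    return predict


# Enumerate every (drug encoder, cell encoder, fusion) combination.
models: Dict[Tuple[str, str, str], Predictor] = {
    (d, c, f): build_model(DRUG_ENCODERS[d], CELL_ENCODERS[c], FUSIONS[f])
    for d, c, f in product(DRUG_ENCODERS, CELL_ENCODERS, FUSIONS)
}

score = models[("bit_density", "mean_expr", "concat")]([1, 0, 1, 1], [2.0, 4.0])
```

With two options per slot, the cross product yields 2 × 2 × 2 = 8 models. Applying the same arithmetic to the abstract's counts gives 9 × 9 × 2 = 162, which is more than the 135 models it reports. This suggests that not every encoder pairing is supported, but the abstract does not say which combinations are excluded.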
Title: Detecting transposable elements in long read genomes using sTELLeR.
Pub Date: 2024-11-18 | DOI: 10.1093/bioinformatics/btae686 | Bioinformatics (Oxford, England)
Kristine Bilgrav Saether, Jesper Eisfeldt
Motivation: Repeat elements, such as transposable elements (TEs), are highly repetitive DNA sequences that make up around 50% of the genome. TEs such as Alu, SVA, HERV and L1 elements can cause disease by disrupting genes, causing frameshift mutations or altering splicing patterns. These elements are challenging to characterize with short-read genome sequencing (srGS) because of its limited read length and the repetitive nature of TEs. Long-read genome sequencing (lrGS) produces reads that can span entire TEs, increasing resolution across repetitive DNA sequences. lrGS therefore presents an opportunity for improved TE detection and analysis, not only for research but also for future clinical detection. When choosing an lrGS TE caller, runtime, CPU hours, sensitivity, precision and ease of integration into pipelines are crucial for efficient detection.
Results: We therefore developed sTELLeR, (s)Transposable ELement in Long (e)Read, for accurate, fast and effective TE detection. In particular, sTELLeR exhibits higher precision and sensitivity than similar tools when calling Alu elements. It is 5 to 48 times as fast as competing callers and uses less than 2% of their CPU hours. The caller is haplotype-aware and outputs its results as a VCF file, enabling compatibility with other variant callers and downstream analyses.
Availability: sTELLeR is a Python-based tool available at https://github.com/kristinebilgrav/sTELLeR. Altogether, we show that sTELLeR is a fast, sensitive and precise caller for detecting TEs, and that it can easily be integrated into variant calling workflows.
Supplementary information: Supplementary data are available at Bioinformatics online.
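sTELLeR's performance claims are stated in terms of precision, TP / (TP + FP), and sensitivity, TP / (TP + FN). The following sketch shows how such measures can be computed by scoring a set of TE insertion calls against a truth set. Several details are assumptions for illustration and not sTELLeR's benchmarking procedure:

- the simplified record layout,
- the 100 bp breakpoint window,
- the greedy one-to-one matching,
- the made-up coordinates.

```python
from typing import Sequence, Tuple

# (chromosome, insertion position in bp, TE family): a simplified, hypothetical record layout
Call = Tuple[str, int, str]


def count_matches(calls: Sequence[Call], truth: Sequence[Call], window: int = 100) -> Tuple[int, int, int]:
    """Greedily pair each call with an unused truth event on the same chromosome,
    of the same TE family, within `window` bp. Returns (TP, FP, FN)."""
    used = set()
    tp = 0
    for chrom, pos, family in calls:
        for i, (t_chrom, t_pos, t_family) in enumerate(truth):
            if i in used:
                continue
            if chrom == t_chrom and family == t_family and abs(pos - t_pos) <= window:
                used.add(i)  # each truth event can be matched at most once
                tp += 1
                break
    return tp, len(calls) - tp, len(truth) - tp


def precision_sensitivity(calls: Sequence[Call], truth: Sequence[Call], window: int = 100) -> Tuple[float, float]:
    tp, fp, fn = count_matches(calls, truth, window)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


truth = [("chr1", 1000, "ALU"), ("chr1", 5000, "L1"), ("chr2", 300, "SVA")]
calls = [("chr1", 1040, "ALU"), ("chr1", 5500, "L1"), ("chr2", 320, "SVA"), ("chr3", 10, "ALU")]
precision, sensitivity = precision_sensitivity(calls, truth)  # precision 0.5, sensitivity ~0.667
```

Here the L1 call at chr1:5500 lies 500 bp from the true insertion and counts as a false positive. Widening the window to 600 bp would turn it into a true positive, which shows how strongly the matching tolerance can shape reported precision and sensitivity.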
Title: APAtizer: a tool for alternative polyadenylation analysis of RNA-Seq data.
Pub Date: 2024-11-18 | DOI: 10.1093/bioinformatics/btae689 | Bioinformatics (Oxford, England)
Bruno Sousa, Maria Bessa, Filipa L de Mendonça, Pedro G Ferreira, Alexandra Moreira, Isabel Pereira-Castro
Summary: APAtizer is a tool designed to analyze alternative polyadenylation events in RNA-sequencing data. The tool handles different file formats, including BAM, htseq and DaPars bedGraph files. It provides a user-friendly interface for generating informative visualizations, including volcano plots, heatmaps and gene lists. These outputs help users derive biological insights, such as the polyadenylation events that occur when comparing two biological conditions. Additionally, it can perform differential gene expression analysis, gene ontology analysis, correlation analysis and visualization of Venn diagram intersections.
Availability and implementation: Source code and example case studies for APAtizer are available at https://github.com/GeneRegulationi3S/APAtizer/.
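APAtizer reports comparisons between two biological conditions as volcano plots. The sketch below shows the generic computation behind such a plot: each gene is placed at (log2 fold change, -log10 adjusted p-value) and labelled by significance. The following are illustrative assumptions, not APAtizer's defaults or data:

- the cutoffs (|log2FC| >= 1, adjusted p < 0.05),
- the gene names,
- the values.

In alternative polyadenylation analyses, the x-axis may instead represent a change in poly(A) site usage rather than expression.

```python
import math
from typing import List, Sequence, Tuple

Result = Tuple[str, float, float]  # (gene, log2 fold change, adjusted p-value)
Point = Tuple[str, float, float, str]  # (gene, x, y, label)


def volcano_points(results: Sequence[Result], lfc_cutoff: float = 1.0, padj_cutoff: float = 0.05) -> List[Point]:
    """Compute volcano-plot coordinates and an up/down/ns label for each gene."""
    points = []
    for gene, log2fc, padj in results:
        # Clamp adjusted p-values of exactly 0 so the logarithm stays finite.
        y = -math.log10(max(padj, 1e-300))
        significant = padj < padj_cutoff
        if significant and log2fc >= lfc_cutoff:
            label = "up"
        elif significant and log2fc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # not significant or below the fold-change cutoff
        points.append((gene, log2fc, y, label))
    return points


results = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", -1.5, 0.01),
    ("GENE_C", 0.4, 0.2),
    ("GENE_D", 3.0, 0.3),
    ("GENE_E", 1.2, 0.0),
]
points = volcano_points(results)
labels = {gene: label for gene, _, _, label in points}
```

GENE_D has a large fold change but is labelled "ns" because its adjusted p-value exceeds the cutoff. Requiring both criteria is what separates the two coloured "wings" of a volcano plot from its central cloud.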