Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae639
Sumyyah Toonsi, Iris Ivy Gauran, Hernando Ombao, Paul N Schofield, Robert Hoehndorf
Motivation: Identifying causal relations between diseases allows for the study of shared pathways, biological mechanisms, and inter-disease risks. Such causal relations can facilitate the identification of potential disease precursors and candidates for drug re-purposing. However, computational methods often lack access to these causal relations. Few approaches have been developed to automatically extract causal relationships between diseases from unstructured text, but they are often only focused on a small number of diseases, lack validation of the extracted causal relations, or do not make their data available.
Results: We automatically mined statements asserting a causal relation between diseases from the scientific literature by leveraging lexical patterns. Following automated mining of causal relations, we mapped the diseases to International Classification of Diseases (ICD) identifiers to allow direct application to clinical data. We provide quantitative and qualitative measures to evaluate the mined causal relations and compare them with UK Biobank diagnosis data as a completely independent data source. The validated causal associations were used to create a directed acyclic graph that can be used by causal inference frameworks. We demonstrate the utility of our causal network by performing causal inference using the do-calculus, using relations within the graph to construct and improve polygenic risk scores, and disentangling the pleiotropic effects of variants.
Availability and implementation: The data are available through https://github.com/bio-ontology-research-group/causal-relations-between-diseases.
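To make the do-calculus step more concrete, the following is a minimal sketch of backdoor adjustment, one of the simplest results derivable from the do-calculus. It assumes a single observed confounder Z that satisfies the backdoor criterion for the pair (X, Y) in the causal graph. The function name, the variables, and the toy records are hypothetical and are not taken from the paper or its repository.

```python
from collections import Counter


def backdoor_adjust(records, x_value, y_value):
    """Estimate P(Y=y | do(X=x)) = sum_z P(Y=y | X=x, Z=z) * P(Z=z).

    `records` is a list of (x, y, z) tuples; Z is assumed to block every
    backdoor path from X to Y (an assumption, not something checked here).
    """
    n = len(records)
    z_counts = Counter(z for _, _, z in records)
    total = 0.0
    for z, z_n in z_counts.items():
        # Outcomes observed in this stratum of Z among records with X = x_value.
        outcomes = [y for x, y, zz in records if zz == z and x == x_value]
        if not outcomes:
            raise ValueError(f"no records with X={x_value!r} in stratum Z={z!r}")
        p_y_given_xz = sum(1 for y in outcomes if y == y_value) / len(outcomes)
        total += p_y_given_xz * z_n / n
    return total


# Hypothetical toy data: x = upstream disease, y = downstream disease,
# z = shared risk factor. All values are invented for illustration.
records = (
    [(1, 1, 0), (1, 0, 0)] + [(0, 0, 0)] * 4
    + [(1, 1, 1)] * 3 + [(0, 1, 1)]
)
with_x = [y for x, y, _ in records if x == 1]
naive = sum(with_x) / len(with_x)           # observational P(Y=1 | X=1)
adjusted = backdoor_adjust(records, 1, 1)   # interventional P(Y=1 | do(X=1))
print(f"P(Y=1 | X=1)     = {naive:.2f}")
print(f"P(Y=1 | do(X=1)) = {adjusted:.2f}")
```

In this toy data the observational estimate is higher than the adjusted one because the confounder is over-represented among records with X = 1. A directed acyclic graph of disease relations is what tells an analyst which variables belong in the adjustment set.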
Title: Causal relationships between diseases mined from the literature improve the use of polygenic risk scores. Journal: Bioinformatics (Oxford, England).
Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae645
Yuan Gao, Rob Patro, Peng Jiang
Motivation: A crucial component of intuitive data visualization is presenting a hierarchical tree structure with interactive functions. For example, single-cell transcriptomics studies may generate gene expression values with developmental trajectories or cell lineage structures. Two common visualization methods, t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), require two separate figures to depict the distribution of cell types and gene expression data, with low-dimension projections that may not capture the hierarchical structures among cells.
Results: Here, we present a JavaScript framework and an interactive web app named Collapsible Tree, which presents values jointly with interactive, expandable, and collapsible lineage structures. For example, the Collapsible Tree presents cellular states and gene expression from single-cell transcriptomics within a single hierarchical plot, enabling comparisons of gene expression across lineages and subtle patterns between sub-lineages. Our framework can facilitate the exploration of complicated value patterns that are not evident in UMAP or t-SNE plots.
Availability and implementation: The Collapsible Tree web interface is available at https://collapsibletree.data2in.net. The JavaScript library source code is available at https://github.com/data2intelligence/collapsible_tree.
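The idea of showing values jointly with an expandable and collapsible hierarchy can be sketched with a small data structure. The example below is written in Python for illustration only; the actual library is JavaScript, and the class, field names, lineage labels, and expression values here are hypothetical and do not reflect the Collapsible Tree API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    value: float = 0.0                      # e.g. mean expression of one gene
    children: list = field(default_factory=list)
    collapsed: bool = False                 # True hides all descendants


def visible_rows(node, depth=0):
    """Yield (depth, name, value) for every node that is currently shown."""
    yield depth, node.name, node.value
    if not node.collapsed:
        for child in node.children:
            yield from visible_rows(child, depth + 1)


# Hypothetical lineage with invented expression values.
lymphoid = Node("lymphoid", 2.1, [Node("T cell", 3.4), Node("B cell", 0.8)])
myeloid = Node("myeloid", 1.2, [Node("monocyte", 1.9), Node("neutrophil", 0.5)])
root = Node("stem", 1.6, [myeloid, lymphoid])

lymphoid.collapsed = True                   # user clicks to fold this branch
for depth, name, value in visible_rows(root):
    print("  " * depth + f"{name}: {value}")
```

Because each node carries its own value, a folded branch still shows a summary value at the parent, which is what lets a single hierarchical plot stand in for separate cell-type and expression figures.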
Title: Collapsible tree: interactive web app to present collapsible hierarchies. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11543613/pdf/
Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae655
Quancheng Liu, Chengxin Zhang, Lydia Freddolino
Motivation: Accurate protein function prediction is crucial for understanding biological processes and advancing biomedical research. However, the rapid growth of protein sequences far outpaces the experimental characterization of their functions, necessitating the development of automated computational methods.
Results: We present InterLabelGO+, a hybrid approach that integrates a deep learning-based method with an alignment-based method for improved protein function prediction. InterLabelGO+ incorporates a novel loss function that addresses label dependency and imbalance, and it further enhances performance through dynamic weighting of the alignment-based component. A preliminary version of InterLabelGO+ performed strongly in the CAFA5 challenge, ranking sixth out of 1625 participating teams. Comprehensive evaluations on large-scale protein function prediction tasks demonstrate InterLabelGO+'s ability to accurately predict Gene Ontology terms across various functional categories and evaluation metrics.
Availability and implementation: The source code and datasets for InterLabelGO+ are freely available on GitHub at https://github.com/QuanEvans/InterLabelGO. A web-server is available at https://seq2fun.dcmb.med.umich.edu/InterLabelGO/. The software is implemented in Python and PyTorch, and is supported on Linux and macOS.
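As a rough illustration of what dynamically weighting an alignment-based component against a deep learning component could look like, the sketch below blends two per-term score dictionaries. The abstract does not state how InterLabelGO+ computes its weight, so using the sequence identity of the best alignment hit as the weight is purely an assumption made here, and the function and parameter names are hypothetical.

```python
def combine_scores(dl_scores, aln_scores, top_hit_identity):
    """Blend per-GO-term scores from two predictors.

    dl_scores, aln_scores: dicts mapping GO term ID -> score in [0, 1].
    top_hit_identity: sequence identity of the best alignment hit in [0, 1];
    used as the weight of the alignment component (an assumed scheme).
    """
    w = min(max(top_hit_identity, 0.0), 1.0)   # clamp the weight to [0, 1]
    terms = set(dl_scores) | set(aln_scores)
    return {
        term: w * aln_scores.get(term, 0.0) + (1.0 - w) * dl_scores.get(term, 0.0)
        for term in terms
    }


# Invented scores for two real GO term IDs.
dl = {"GO:0003674": 0.8, "GO:0005488": 0.4}
aln = {"GO:0005488": 1.0}
combined = combine_scores(dl, aln, top_hit_identity=0.5)
for term in sorted(combined):
    print(term, round(combined[term], 3))
```

With a close homolog the alignment scores dominate, and with no informative hit the prediction falls back to the deep learning scores, which is the general motivation for weighting the two components per query rather than globally.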
Title: InterLabelGO+: unraveling label correlations in protein function prediction. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568131/pdf/
Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae622
Michael K B Ford, Ananth Hari, Qinghui Zhou, Ibrahim Numanagić, S Cenk Sahinalp
Summary: Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have been recently popularized by the Human Pangenome Consortium. This article introduces BAKIR (Biologically informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where BAKIR was able to improve many of the associated annotations and call novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community.
Availability and implementation: BAKIR is available at github.com/algo-cancer/bakir.
Supplementary information: Supplementary data are available at Bioinformatics online.
Title: Biologically-informed killer cell immunoglobulin-like receptor gene annotation tool. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549020/pdf/
Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae606
Tingting Han, Jun Wu, Pengpeng Sheng, Yuanyuan Li, ZaiYang Tao, Lei Qu
Motivation: Recent brain mapping efforts are producing large-scale whole-brain images using different imaging modalities. Accurate alignment and delineation of anatomical structures in these images are essential for numerous studies. These requirements are typically modeled as two distinct tasks: registration and segmentation. However, prevailing methods fail to fully explore and utilize the inherent correlation and complementarity between the two tasks. Furthermore, variations in brain anatomy, brightness, and texture pose another formidable challenge in designing multi-modal similarity metrics. A high-throughput approach that overcomes the bottleneck of multi-modal similarity metric design while effectively leveraging the highly correlated and complementary nature of the two tasks is highly desirable.
Results: We introduce a deep learning framework for joint registration and segmentation of multi-modal brain images. Under this framework, the registration and segmentation tasks are deeply coupled and collaborate at two hierarchical layers. In the inner layer, we establish a strong feature-level coupling between the two tasks by learning a unified common latent feature representation. In the outer layer, we introduce a mutually supervised dual-branch network to decouple latent features and facilitate task-level collaboration between registration and segmentation. Since the latent features we designed are also modality-independent, the bottleneck of designing multi-modal similarity metrics is essentially addressed. Another merit of this framework is the interpretability of the latent features, which allows intuitive manipulation of feature learning, thereby further enhancing network training efficiency and the performance of both tasks. Extensive experiments conducted on both multi-modal and mono-modal datasets of mouse and human brains demonstrate the superiority of our method.
Availability and implementation: The code is available at https://github.com/tingtingup/DCRS.
Title: Deep coupled registration and segmentation of multimodal whole-brain images. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11543610/pdf/
Motivation: Recent advances in long-read sequencing technologies have significantly facilitated the production of high-quality genome assemblies. The telomere-to-telomere (T2T) gapless assembly has become the new gold standard of genome assembly efforts. Several recent efforts have claimed to produce T2T-level reference genomes. However, a universal standard for qualifying a genome assembly as T2T is still missing. Traditional genome assembly assessment metrics (N50 and its derivatives) cannot differentiate between a nearly T2T assembly and a truly T2T assembly in continuity, either globally or locally. Additionally, these metrics are independent of the raw reads, so they are easily inflated by artificial operations. Therefore, a gaplessness evaluation tool at single-nucleotide resolution that reflects true completeness is urgently needed in the era of complete genomes.
Results: Here, we present a tool called Genome Continuity Inspector (GCI), designed to assess genome assembly continuity at single-base resolution and to evaluate how close an assembly is to the T2T level. GCI utilizes multiple aligners to map long reads from various sequencing platforms back to the assembly. By incorporating curated mapping coverage of high-confidence read alignments, GCI identifies potential assembly issues. It also provides GCI scores that quantify overall assembly continuity at the whole-genome or chromosome scale.
Availability and implementation: The open-source GCI code is freely available on GitHub (https://github.com/yeeus/GCI) under the MIT license.
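The general idea of using read coverage at single-base resolution to flag suspect regions can be sketched as follows. This is not GCI's algorithm or scoring scheme, which the abstract does not specify. It is a minimal illustration that assumes read alignments have already been filtered to high-confidence ones and reduced to 0-based half-open intervals on one contig. All names are hypothetical.

```python
def low_coverage_regions(length, alignments, min_depth=1):
    """Return half-open (start, end) intervals whose read depth < min_depth.

    length: contig length in bases.
    alignments: iterable of (start, end) 0-based half-open intervals.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    regions, depth, run_start = [], 0, None
    for pos in range(length):
        depth += diff[pos]                   # depth at this single base
        if depth < min_depth:
            if run_start is None:
                run_start = pos              # a low-coverage run begins
        elif run_start is not None:
            regions.append((run_start, pos))
            run_start = None
    if run_start is not None:                # run extends to the contig end
        regions.append((run_start, length))
    return regions


# Invented alignments on a 20-base toy contig.
reads = [(0, 8), (5, 12), (15, 20)]
print(low_coverage_regions(20, reads))               # bases with no read support
print(low_coverage_regions(20, reads, min_depth=2))  # stricter depth threshold
```

Unlike N50, a check of this kind depends on the raw reads, so joining contigs without read support would show up as a region with no spanning alignments.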
Title: GCI: a continuity inspector for complete genome assembly. Authors: Quanyu Chen, Chentao Yang, Guojie Zhang, Dongya Wu. Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae633. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11550331/pdf/
Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae666
Wenmin Zhang, Chen-Yang Su, Satoshi Yoshiji, Tianyuan Lu
Summary: Mendelian randomization is being utilized to assess causal effects of polygenic exposures, where many genetic instruments are subject to horizontal pleiotropy. Existing methods for detecting and correcting for horizontal pleiotropy have important assumptions that may not be fulfilled. Built upon the core gene hypothesis, we developed MR Corge for performing sensitivity analysis of Mendelian randomization. MR Corge identifies a small number of putative core instruments that are more likely to affect genes with a direct biological role in an exposure and obtains causal effect estimates based on these instruments, thereby reducing the risk of horizontal pleiotropy. Using positive and negative controls, we demonstrated that MR Corge estimates aligned with established biomedical knowledge and the results of randomized controlled trials. MR Corge may be widely applied to investigate polygenic exposure-outcome relationships.
Availability and implementation: An open-source R package is available at https://github.com/zhwm/MRCorge.
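The effect of restricting a Mendelian randomization analysis to a subset of instruments can be illustrated with the standard fixed-effect inverse-variance weighted (IVW) estimator. This sketch is in Python rather than R and does not reproduce the MR Corge package. The abstract does not describe how core instruments are chosen or which estimator is used, so the split into "core" and "other" instruments and all summary statistics below are invented for illustration.

```python
def ivw_estimate(instruments):
    """Fixed-effect IVW causal estimate and its standard error.

    instruments: list of (beta_x, beta_y, se_y) tuples, i.e. the variant's
    effect on the exposure, its effect on the outcome, and the standard
    error of the outcome effect.
    """
    num = sum(bx * by / se ** 2 for bx, by, se in instruments)
    den = sum(bx ** 2 / se ** 2 for bx, _, se in instruments)
    return num / den, (1.0 / den) ** 0.5


# Invented summary statistics. The two "core" variants imply a causal effect
# of 0.5; the third variant has an outcome effect far larger than 0.5 * beta_x,
# as might happen under horizontal pleiotropy.
core = [(0.2, 0.10, 0.02), (0.4, 0.20, 0.02)]
other = [(0.1, 0.30, 0.02)]

est_core, se_core = ivw_estimate(core)
est_all, se_all = ivw_estimate(core + other)
print(f"core instruments only: {est_core:.3f} (SE {se_core:.3f})")
print(f"all instruments:       {est_all:.3f} (SE {se_all:.3f})")
```

Comparing the two estimates is the kind of sensitivity check the summary describes: if the estimate shifts substantially when non-core instruments are added, pleiotropy is a plausible explanation.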
Title: MR Corge: sensitivity analysis of Mendelian randomization based on the core gene hypothesis for polygenic exposures. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11578597/pdf/
Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae673
Chiyun Lee, Eyyüb S Ünlü, Nina F D White, Jacob Almagro-Garcia, Cristina Ariani, Richard D Pearson
Motivation: Monitoring the genomic evolution of Plasmodium falciparum, the most widespread and deadliest of the human-infecting malaria species, is critical for making decisions in response to changes in drug resistance, diagnostic test failures, and vaccine effectiveness. The MalariaGEN data resources are the world's largest whole genome sequencing databases for Plasmodium parasites. The size and complexity of such data are a barrier to many potential end users in both public health and academic research. A user-friendly method for accessing and exploring data on the genetic variation of P. falciparum would greatly support efforts to study and control malaria.
Results: We developed Pf-HaploAtlas, a web application enabling exploratory data analysis of genomic variation without requiring advanced technical expertise. The app provides analysis-ready data catalogues and visualizations of amino acid haplotypes for all 5102 core P. falciparum genes. Pf-HaploAtlas facilitates comprehensive spatial and temporal exploration of genes and variants of interest by using data from 16 203 samples, from 33 countries, and spread between the years 1984 and 2018. The scope of Pf-HaploAtlas will expand with each new MalariaGEN Plasmodium data release.
Availability and implementation: Pf-HaploAtlas is available online for public use at https://apps.malariagen.net/pf-haploatlas, which allows users to download the underlying amino acid haplotype data for further analyses, and its source code is freely available on GitHub under the MIT licence at https://github.com/malariagen/pf-haploatlas.
Title: Pf-HaploAtlas: an interactive web app for spatiotemporal analysis of Plasmodium falciparum genes. Journal: Bioinformatics (Oxford, England). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11588202/pdf/
Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae629
Guillaume Marçais, C S Elder, Carl Kingsford
Motivation: Sequences equivalent to their reverse complements (i.e. double-stranded DNA) have no analogue in text analysis and non-biological string algorithms. Despite this striking difference, algorithms for computational biology (e.g. sketching algorithms) are designed and tested in the same way as classical string algorithms. Then, as a post-processing step, these algorithms are adapted to work with genomic sequences by folding a k-mer and its reverse complement into a single sequence: the canonical representation (k-nonical space).
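The folding step described above can be illustrated with a minimal Python sketch. It assumes an upper-case ACGT alphabet and takes the lexicographically smaller of a k-mer and its reverse complement as the canonical form; that tie-breaking rule is a common convention and an assumption here, not something the abstract specifies. The function names are illustrative only.

```python
# Complement table for upper-case DNA (assumption: no N or lower-case bases).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Fold a k-mer and its reverse complement into one representative.

    Assumed convention: keep the lexicographically smaller of the two.
    """
    return min(kmer, reverse_complement(kmer))


def canonical_kmers(seq: str, k: int) -> list[str]:
    """List the canonical form of every k-mer of seq, in order."""
    return [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]
```

Under this convention a k-mer and its reverse complement always map to the same string, so both strands of a double-stranded sequence produce the same multiset of canonical k-mers.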
Results: The effect of using the canonical representation with sketching methods is understudied and poorly understood. As a first step, we use context-free sketching methods to illustrate the potentially detrimental effects of using canonical k-mers with string algorithms not designed to accommodate them. In particular, we show that large stretches of the genome ("sketching deserts") are undersampled or entirely skipped by context-free sketching methods, effectively making these genomic regions invisible to subsequent algorithms using these sketches. We provide empirical data showing these effects and develop a theoretical framework explaining the appearance of sketching deserts. Finally, we propose two schemes to accommodate these effects: (i) a new procedure that adapts existing sketching methods to k-nonical space and (ii) an optimization procedure to directly design new sketching methods for k-nonical space.
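A toy Python example can show how a sketching desert might arise. It reads "context-free" as meaning that a k-mer is selected based only on the k-mer itself, here by membership in a fixed set. The selected set, the input sequence, and the function names are hypothetical, and this is not the procedure implemented in mdsscope; it only shows that a set which samples the forward strand regularly can sample nothing once k-mers are replaced by their canonical forms.

```python
# Complement table for upper-case DNA (assumption: no N or lower-case bases).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement
    (an assumed convention for the canonical form)."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def selected_positions(seq: str, k: int, chosen: set[str],
                       use_canonical: bool) -> list[int]:
    """Positions whose k-mer (or its canonical form) is in the fixed set."""
    positions = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        key = canonical(kmer) if use_canonical else kmer
        if key in chosen:
            positions.append(i)
    return positions


def longest_unsampled_run(positions: list[int], n_kmers: int) -> int:
    """Length of the longest run of consecutive unselected k-mer positions."""
    longest, previous = 0, -1
    for p in list(positions) + [n_kmers]:  # sentinel closes the final run
        longest = max(longest, p - previous - 1)
        previous = p
    return longest


# Hypothetical toy input: the set {"GT"} hits every other 2-mer of the
# forward strand, but "GT" folds to "AC" and "TG" folds to "CA", so in
# canonical space nothing is selected and the whole sequence is a desert.
seq, k, chosen = "GTGTGTGT", 2, {"GT"}
n_kmers = len(seq) - k + 1
plain = selected_positions(seq, k, chosen, use_canonical=False)
folded = selected_positions(seq, k, chosen, use_canonical=True)
```

Here `longest_unsampled_run(plain, n_kmers)` is 1, while `longest_unsampled_run(folded, n_kmers)` equals the full number of k-mers, 7.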
Availability and implementation: The code used in this analysis is available under a permissive license at https://github.com/Kingsford-Group/mdsscope.
Citation: k-nonical space: sketching with reverse complements. Guillaume Marçais, C S Elder, Carl Kingsford. Bioinformatics (Oxford, England), 2024-11-01. DOI: 10.1093/bioinformatics/btae629. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549021/pdf/
Pub Date : 2024-11-01 DOI: 10.1093/bioinformatics/btae623
Yuying Shi, Botao Xu, Zhe Wang, Qitao Chen, Jie Chai, Cheng Wang
Motivation: Enzymatic reactions play a pivotal role in regulating cellular processes with a high degree of specificity to biological functions. When enzymatic reactions are disrupted by gene, protein, or metabolite dysfunctions in diseases, it becomes crucial to visualize the resulting perturbed enzymatic reaction-induced multi-omics network. Multi-omics network visualization aids in gaining a comprehensive understanding of the functionality and regulatory mechanisms within biological systems.
Results: In this study, we developed PhenoMultiOmics, an enzymatic reaction-based multi-omics web server for exploring the scope of the multi-omics network across various cancer types. We first curated the PhenoMultiOmics database, which enables the retrieval of cancer-gene-protein-metabolite relationships based on the enzymatic reactions. We then developed the MultiOmics network visualization module to depict the interplay between genes, proteins, and metabolites in response to specific cancer-related enzymatic reactions. The biomarker discovery module facilitates functional analysis through differential omic feature expression and pathway enrichment analysis. PhenoMultiOmics has been applied to analyze the transcriptomics data of gastric cancer and the metabolomics data of lung cancer, providing mechanistic insights into interrupted enzymatic reactions and the associated multi-omics network.
Availability and implementation: PhenoMultiOmics is freely accessible at https://phenomultiomics.shinyapps.io/cancer/ with a user-friendly and interactive web interface.
Citation: PhenoMultiOmics: an enzymatic reaction inferred multi-omics network visualization web server. Yuying Shi, Botao Xu, Zhe Wang, Qitao Chen, Jie Chai, Cheng Wang. Bioinformatics (Oxford, England), 2024-11-01. DOI: 10.1093/bioinformatics/btae623. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549024/pdf/