
GigaScience: latest articles

A multi-omics data analysis workflow packaged as a FAIR Digital Object
IF 9.2 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-10 DOI: 10.1093/gigascience/giad115
Anna Niehues, Casper de Visser, Fiona A Hagenbeek, Purva Kulkarni, René Pool, Naama Karu, Alida S D Kindt, Gurnoor Singh, Robert R J M Vermeiren, Dorret I Boomsma, Jenny van Dongen, Peter A C ’t Hoen, Alain J van Gool
Background: Applying good data management and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles in research projects can help disentangle knowledge discovery, study result reproducibility, and data reuse in future studies. Based on the concepts of the original FAIR principles for research data, FAIR principles for research software were recently proposed. FAIR Digital Objects enable discovery and reuse of Research Objects, including computational workflows for both humans and machines. Practical examples can help promote the adoption of FAIR practices for computational workflows in the research community. We developed a multi-omics data analysis workflow implementing FAIR practices to share it as a FAIR Digital Object.

Findings: We conducted a case study investigating shared patterns between multi-omics data and childhood externalizing behavior. The analysis workflow was implemented as a modular pipeline in the workflow manager Nextflow, including containers with software dependencies. We adhered to software development practices like version control, documentation, and licensing. Finally, the workflow was described with rich semantic metadata, packaged as a Research Object Crate, and shared via WorkflowHub.

Conclusions: Along with the packaged multi-omics data analysis workflow, we share our experiences adopting various FAIR practices and creating a FAIR Digital Object. We hope our experiences can help other researchers who develop omics data analysis workflows to turn FAIR principles into practice.
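To make the packaging step in the abstract above more concrete, the following is a minimal sketch of the kind of metadata file a Research Object Crate carries. It is not the authors' actual crate: the file name `main.nf`, the license URL, and the date passed in below are illustrative assumptions, and the result has not been validated against the Workflow RO-Crate profile that WorkflowHub uses.

```python
import json


def build_minimal_ro_crate(name, description, license_url, workflow_file, date_published):
    """Return a minimal RO-Crate 1.1 metadata document as a plain dict.

    The structure follows the general shape of an ro-crate-metadata.json file:
    a metadata descriptor, a root dataset, and one workflow file entity.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Descriptor entity: says which entity the crate is about.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: human- and machine-readable description.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "mainEntity": {"@id": workflow_file},
                "hasPart": [{"@id": workflow_file}],
            },
            {
                # The workflow itself, typed as a computational workflow.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
            },
        ],
    }


# All argument values except the title are hypothetical placeholders.
crate = build_minimal_ro_crate(
    name="A multi-omics data analysis workflow",
    description="Toy description for illustration only.",
    license_url="https://spdx.org/licenses/MIT",  # assumed license, not from the source
    workflow_file="main.nf",                      # assumed file name
    date_published="2024-01-10",
)
print(json.dumps(crate, indent=2))
```

Writing the returned dict to a file named `ro-crate-metadata.json` next to the workflow files is what turns an ordinary directory into a crate; richer semantic metadata (authors, inputs, outputs, containers) would be added as further entities in `@graph`.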
Citations: 0
A high-quality chromosomal genome assembly of the sea cucumber Chiridota heheva and its hydrothermal adaptation.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad107
Yujin Pu, Yang Zhou, Jun Liu, Haibin Zhang

Background: Chiridota heheva is a cosmopolitan holothurian well adapted to diverse deep-sea ecosystems, especially chemosynthetic environments. Besides high hydrostatic pressure and limited light, high concentrations of metal ions also represent harsh conditions in hydrothermal environments. Few holothurian species can live in such extreme conditions. Therefore, it is valuable to elucidate the adaptive genetic mechanisms of C. heheva in hydrothermal environments.

Findings: Herein, we report a high-quality reference genome assembly of C. heheva from the Kairei vent, which is the first chromosome-level genome of Apodida. The chromosome-level genome size was 1.43 Gb, with a scaffold N50 of 53.24 Mb and BUSCO completeness score of 94.5%. Contig sequences were clustered, ordered, and assembled into 19 natural chromosomes. Comparative genome analysis found that the expanded gene families and positively selected genes of C. heheva were involved in the DNA damage repair process. The expanded gene families and the unique genes contributed to maintaining iron homeostasis in an iron-enriched environment. The positively selected gene RFC2 with 10 positively selected sites played an essential role in DNA repair under extreme environments.

Conclusions: This first chromosome-level genome assembly of C. heheva reveals the hydrothermal adaptation of holothurians. As the first chromosome-level genome of order Apodida, this genome will provide the resource for investigating the evolution of class Holothuroidea.
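The scaffold N50 quoted in the findings above is a standard contiguity statistic: the length L such that scaffolds of length L or longer contain at least half of the assembled bases. A minimal sketch of the calculation follows; the scaffold lengths are invented toy values, not data from the C. heheva assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length


# Toy scaffold lengths (arbitrary units), 290 in total.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))  # 80 + 70 = 150 >= 145, so this prints 70
```

A larger N50 means that fewer, longer scaffolds cover half of the assembly, which is why the value is usually reported alongside total assembly size and a completeness score such as BUSCO.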

Citations: 0
A graph clustering algorithm for detection and genotyping of structural variants from long reads.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad112
Nicolás Gaitán, Jorge Duitama

Background: Structural variants (SVs) are genomic polymorphisms defined by their length (>50 bp). The usual types of SVs are deletions, insertions, translocations, inversions, and copy number variants. SV detection and genotyping is fundamental given the role of SVs in phenomena such as phenotypic variation and evolutionary events. Thus, methods to identify SVs using long-read sequencing data have been recently developed.

Findings: We present an accurate and efficient algorithm to predict germline SVs from long-read sequencing data. The algorithm starts by collecting evidence (signatures) of SVs from read alignments. Then, signatures are clustered based on a Euclidean graph with coordinates calculated from lengths and genomic positions. Clustering is performed by the DBSCAN algorithm, which provides the advantage of delimiting clusters with high resolution. Clusters are transformed into SVs, and a Bayesian model allows precise genotyping of SVs based on their supporting evidence. This algorithm is integrated into the single sample variants detector of the Next Generation Sequencing Experience Platform, which facilitates the integration with other functionalities for genomics analysis. We performed multiple benchmark experiments, including simulation and real data, representing different genome profiles, sequencing technologies (PacBio HiFi, ONT), and read depths.

Conclusion: The results show that our approach outperformed state-of-the-art tools on germline SV calling and genotyping, especially at low depths, and in error-prone repetitive regions. We believe this work significantly contributes to the development of bioinformatic strategies to maximize the use of long-read sequencing technologies.
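The clustering step described in the findings above can be illustrated with a small, self-contained DBSCAN. This is a sketch under stated assumptions, not the authors' implementation: each signature is represented directly as a (genomic position, length) point, whereas the abstract only says that coordinates are calculated from lengths and positions, and the `eps` and `min_pts` values and the toy signatures are invented.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point if at least min_pts points (itself included)
    lie within Euclidean distance eps of it.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Hypothetical SV signatures as (genomic position, length in bp).
signatures = [
    (1000, 300), (1010, 310), (1005, 295),  # three reads supporting one event
    (5000, 120), (5008, 118),               # two reads supporting another event
    (9000, 60),                             # isolated signature
]
print(dbscan(signatures, eps=25, min_pts=2))  # [0, 0, 0, 1, 1, -1]
```

Each resulting cluster would then be summarized into one candidate SV, with the number of supporting signatures feeding the genotyping model; signatures labeled as noise have too little support to form a call.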

Citations: 0
MMV_Im2Im: an open-source microscopy machine vision toolbox for image-to-image transformation.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad120
Justin Sonneck, Yu Zhou, Jianxu Chen

Over the past decade, deep learning (DL) research in computer vision has been growing rapidly, with many advances in DL-based image analysis methods for biomedical problems. In this work, we introduce MMV_Im2Im, a new open-source Python package for image-to-image transformation in bioimaging applications. MMV_Im2Im is designed with a generic image-to-image transformation framework that can be used for a wide range of tasks, including semantic segmentation, instance segmentation, image restoration, image generation, and so on. Our implementation takes advantage of state-of-the-art machine learning engineering techniques, allowing researchers to focus on their research without worrying about engineering details. We demonstrate the effectiveness of MMV_Im2Im on more than 10 different biomedical problems, showcasing its general potential and applicability. For computational biomedical researchers, MMV_Im2Im provides a starting point for developing new biomedical image analysis or machine learning algorithms, where they can either reuse the code in this package or fork and extend this package to facilitate the development of new methods. Experimental biomedical researchers can benefit from this work by gaining a comprehensive view of the image-to-image transformation concept through diversified examples and use cases. We hope this work can inspire the community on how DL-based image-to-image transformation can be integrated into the assay development process, enabling new biomedical studies that cannot be done with traditional experimental assays alone. To help researchers get started, we have provided source code, documentation, and tutorials for MMV_Im2Im at https://github.com/MMV-Lab/mmv_im2im under the MIT license.

Citations: 0
FAIR data retrieval for sensitive clinical research data in Galaxy.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad099
Jasper Ouwerkerk, Helena Rasche, John D Spalding, Saskia Hiltemann, Andrew P Stubbs

Background: In clinical research, data have to be accessible and reproducible, but the generated data are becoming larger and analyses more complex. Here we propose a platform for Findable, Accessible, Interoperable, and Reusable (FAIR) data access and for creating reproducible findings. Standardized access to a major genomic repository, the European Genome-Phenome Archive (EGA), has been achieved with API services like PyEGA3. We aim to provide a FAIR data analysis service in Galaxy by retrieving genomic data from the EGA and to provide a generalized "omics" platform for FAIR data analysis.

Results: To demonstrate this, we implemented an end-to-end Galaxy workflow to replicate the findings from an RD-Connect synthetic dataset, Beyond the 1 Million Genomes (synB1MG), available from the EGA. We developed the PyEGA3 connector within Galaxy to easily download multiple datasets from the EGA. We added the gene.iobio tool, a diagnostic environment for precision genomics, to Galaxy and demonstrated that it provides a more dynamic and interpretable view of trio analysis results. We developed a Galaxy trio analysis workflow to determine the pathogenic variants from the synB1MG trios using the GEMINI and gene.iobio tools. The complete workflow is available at WorkflowHub, and an associated tutorial was created in the Galaxy Training Network, which helps researchers unfamiliar with Galaxy to run the workflow.

Conclusions: We showed the feasibility of reusing data from the EGA in Galaxy via PyEGA3 and validated the workflow by rediscovering spiked-in variants in synthetic data. Finally, we improved existing tools in Galaxy and created a workflow for trio analysis to demonstrate the value of FAIR genomics analysis in Galaxy.

Citations: 0
Toward genome assemblies for all marine vertebrates: current landscape and challenges.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad119
Emma de Jong, Lara Parata, Philipp E Bayer, Shannon Corrigan, Richard J Edwards

Marine vertebrate biodiversity is fundamental to ocean ecosystem health but is threatened by climate change, overharvesting, and habitat degradation. High-quality reference genomes are valuable foundational scientific resources that can inform conservation efforts. Consequently, global consortia are striving to produce reference genomes for representatives of all life. Here, we summarize the current landscape of available marine vertebrate reference genomes, including their phylogenetic diversity and geographic hotspots of production. We discuss key logistical and technical challenges that remain to be overcome if we are to realize the vision of a comprehensive reference genome library of all marine vertebrates.

Citations: 0
CAT Bridge: an efficient toolkit for gene-metabolite association mining from multiomics data.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae083
Bowen Yang, Tan Meng, Xinrui Wang, Jun Li, Shuang Zhao, Yingheng Wang, Shu Yi, Yi Zhou, Yi Zhang, Liang Li, Li Guo

Background: With advancements in sequencing and mass spectrometry technologies, multiomics data can now be easily acquired for understanding complex biological systems. Nevertheless, substantial challenges remain in determining the association between gene-metabolite pairs due to the nonlinear and multifactorial interactions within cellular networks. The complexity arises from the interplay of multiple genes and metabolites, often involving feedback loops and time-dependent regulatory mechanisms that are not easily captured by traditional analysis methods.

Findings: Here, we introduce Compounds And Transcripts Bridge (abbreviated as CAT Bridge, available at https://catbridge.work), a free, user-friendly platform for longitudinal multiomics analysis that efficiently identifies transcripts associated with metabolites using time-series omics data. To evaluate the association of gene-metabolite pairs, CAT Bridge is a pioneering effort that benchmarks a set of statistical methods spanning causality estimation and correlation coefficient calculation for multiomics analysis. Additionally, CAT Bridge features an artificial intelligence agent to assist users in interpreting the association results.

Conclusions: We applied CAT Bridge to experimentally obtained Capsicum chinense (chili pepper) and public human and Escherichia coli time-series transcriptome and metabolome datasets. CAT Bridge successfully identified genes involved in the biosynthesis of capsaicin in C. chinense. Furthermore, case study results showed that the convergent cross-mapping method outperforms traditional approaches in longitudinal multiomics analyses. CAT Bridge simplifies access to various established methods for longitudinal multiomics analysis and enables researchers to swiftly identify associated gene-metabolite pairs for further validation.
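As a rough illustration of the simplest family of methods mentioned above (correlation coefficient calculation), the sketch below ranks genes by the Pearson correlation between their expression time series and a metabolite time series. It is not CAT Bridge's code or API, and it does not implement convergent cross-mapping or any other causality estimate; the gene names and all values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no association signal
    return cov / (sd_x * sd_y)


def rank_genes(expression, metabolite):
    """Return (gene, r) pairs sorted by decreasing absolute correlation."""
    scores = {gene: pearson(series, metabolite) for gene, series in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical time series sampled at five time points.
metabolite = [1, 2, 3, 4, 5]
expression = {
    "geneA": [2, 4, 6, 8, 10],  # rises with the metabolite
    "geneB": [5, 3, 4, 1, 2],   # tends to fall as the metabolite rises
    "geneC": [3, 3, 3, 3, 3],   # constant expression
}
for gene, r in rank_genes(expression, metabolite):
    print(f"{gene}\t{r:+.2f}")
```

A plain correlation like this ignores time lags and feedback between genes and metabolites, which is the limitation that motivates the causality-oriented methods benchmarked by the tool.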

引用次数: 0
Vulture: cloud-enabled scalable mining of microbial reads in public scRNA-seq data.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad117
Junyi Chen, Danqing Yin, Harris Y H Wong, Xin Duan, Ken H O Yu, Joshua W K Ho

The rapidly growing collection of public single-cell sequencing data has become a valuable resource for molecular, cellular, and microbial discovery. Previous studies mostly overlooked detecting pathogens in human single-cell sequencing data. Moreover, existing bioinformatics tools lack the scalability to deal with big public data. We introduce Vulture, a scalable cloud-based pipeline that performs microbial calling for single-cell RNA sequencing (scRNA-seq) data, enabling meta-analysis of host-microbial studies from the public domain. In our benchmarking experiments, Vulture is 66% to 88% faster than local tools (PathogenTrack and Venus) and 41% faster than the state-of-the-art cloud-based tool Cumulus, while achieving comparable microbial read identification. In terms of the cost on cloud computing systems, Vulture also shows a cost reduction of 83% ($12 vs. $70). We applied Vulture to 2 coronavirus disease 2019, 3 hepatocellular carcinoma (HCC), and 2 gastric cancer human patient cohorts with public sequencing reads data from scRNA-seq experiments and discovered cell type-specific enrichment of severe acute respiratory syndrome coronavirus 2, hepatitis B virus (HBV), and Helicobacter pylori-positive cells, respectively. In the HCC analysis, all cohorts showed hepatocyte-only enrichment of HBV, with cell subtype-associated HBV enrichment based on inferred copy number variations. In summary, Vulture presents a scalable and economical framework to mine unknown host-microbial interactions from large-scale public scRNA-seq data. Vulture is available via an open-source license at https://github.com/holab-hku/Vulture.

Evolutionary genomics of three agricultural pest moths reveals rapid evolution of host adaptation and immune-related genes.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad103
Yi-Ming Weng, Pathour R Shashank, R Keating Godfrey, David Plotkin, Brandon M Parker, Tyler Wist, Akito Y Kawahara

Background: Understanding the genotype of pest species provides an important baseline for designing integrated pest management (IPM) strategies. Recently developed long-read sequencing technologies make it possible to compare genomic features of nonmodel pest species to disclose the evolutionary path underlying the pest species profiles. Here we sequenced and assembled genomes for 3 agricultural pest gelechiid moths: Phthorimaea absoluta (tomato leafminer), Keiferia lycopersicella (tomato pinworm), and Scrobipalpa atriplicella (goosefoot groundling moth). We also compared genomes of tomato leafminer and tomato pinworm with published genomes of Phthorimaea operculella and Pectinophora gossypiella to investigate the gene family evolution related to the pest species profiles.

Results: We found that the 3 solanaceous-feeding species, P. absoluta, K. lycopersicella, and P. operculella, are clustered together. Gene family evolution analyses with the 4 species show clear gene family expansions in host plant-associated genes for the 3 solanaceous-feeding species. These genes are involved in host compound sensing (e.g., gustatory receptors), detoxification (e.g., ABC transporter C family, cytochrome P450, glucose-methanol-choline oxidoreductase, insect cuticle proteins, and UDP-glucuronosyl), and digestion (e.g., serine proteases and peptidase family S1). A gene ontology enrichment analysis of rapidly evolving genes also suggests enriched functions in host sensing and immunity.

Conclusions: Our results of family evolution analyses indicate that host plant adaptation and pathogen defense could be important drivers in species diversification among gelechiid moths.

DriverMP enables improved identification of cancer driver genes
IF 9.2 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2023-12-13 DOI: 10.1093/gigascience/giad106
Yangyang Liu, Jiyun Han, Tongxin Kong, Nannan Xiao, Qinglin Mei, Juntao Liu
Background: Cancer is widely regarded as a complex disease primarily driven by genetic mutations. A critical concern and significant obstacle lies in discerning driver genes amid an extensive array of passenger genes.

Findings: We present a new method termed DriverMP for effectively prioritizing altered genes on a cancer-type level by considering mutated gene pairs. It first applies nonsilent somatic mutation data, protein‒protein interaction network data, and differential gene expression data to prioritize mutated gene pairs; individual mutated genes are then prioritized based on the prioritized mutated gene pairs. Application of this method to 10 cancer datasets from The Cancer Genome Atlas demonstrated great improvements over all the compared state-of-the-art methods in identifying known driver genes. A comprehensive analysis then demonstrated the reliability of the novel driver genes, which are strongly supported by clinical experiments, disease enrichment, or biological pathway analysis.

Conclusions: The new method, DriverMP, which is able to identify driver genes by effectively integrating the advantages of multiple kinds of cancer data, is available at https://github.com/LiuYangyangSDU/DriverMP. In addition, we have developed a novel driver gene database for 10 cancer types and an online service that users can access freely without registration. The DriverMP method, the database of novel drivers, and the user-friendly online server are expected to contribute to new diagnostic and therapeutic opportunities for cancers.