Standardized pipelines support and facilitate integration of diverse datasets at the Rat Genome Database.

IF 3.6 4区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-22 DOI:10.1093/database/baae132

Jennifer R Smith, Marek A Tutaj, Jyothi Thota, Logan Lamers, Adam C Gibson, Akhilanand Kundurthi, Varun Reddy Gollapally, Kent C Brodie, Stacy Zacher, Stanley J F Laulederkind, G Thomas Hayman, Shur-Jen Wang, Monika Tutaj, Mary L Kaldunski, Mahima Vedi, Wendy M Demos, Jeffrey L De Pons, Melinda R Dwinell, Anne E Kwitek

{"title":"Standardized pipelines support and facilitate integration of diverse datasets at the Rat Genome Database.","authors":"Jennifer R Smith, Marek A Tutaj, Jyothi Thota, Logan Lamers, Adam C Gibson, Akhilanand Kundurthi, Varun Reddy Gollapally, Kent C Brodie, Stacy Zacher, Stanley J F Laulederkind, G Thomas Hayman, Shur-Jen Wang, Monika Tutaj, Mary L Kaldunski, Mahima Vedi, Wendy M Demos, Jeffrey L De Pons, Melinda R Dwinell, Anne E Kwitek","doi":"10.1093/database/baae132","DOIUrl":null,"url":null,"abstract":"<p><p>The Rat Genome Database (RGD) is a multispecies knowledgebase which integrates genetic, multiomic, phenotypic, and disease data across 10 mammalian species. To support cross-species, multiomics studies and to enhance and expand on data manually extracted from the biomedical literature by the RGD team of expert curators, RGD imports and integrates data from multiple sources. These include major databases and a substantial number of domain-specific resources, as well as direct submissions by individual researchers. The incorporation of these diverse datatypes is handled by a growing list of automated import, export, data processing, and quality control pipelines. This article outlines the development over time of a standardized infrastructure for automated RGD pipelines with a summary of key design decisions and a focus on lessons learned.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753291/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Database: The Journal of Biological Databases and Curation","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/database/baae132","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

The Rat Genome Database (RGD) is a multispecies knowledgebase which integrates genetic, multiomic, phenotypic, and disease data across 10 mammalian species. To support cross-species, multiomics studies and to enhance and expand on data manually extracted from the biomedical literature by the RGD team of expert curators, RGD imports and integrates data from multiple sources. These include major databases and a substantial number of domain-specific resources, as well as direct submissions by individual researchers. The incorporation of these diverse datatypes is handled by a growing list of automated import, export, data processing, and quality control pipelines. This article outlines the development over time of a standardized infrastructure for automated RGD pipelines with a summary of key design decisions and a focus on lessons learned.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

标准化的管道支持并促进了大鼠基因组数据库中不同数据集的集成。

大鼠基因组数据库（RGD）是一个多物种知识库，集成了10个哺乳动物物种的遗传、多组学、表型和疾病数据。为了支持跨物种、多组学研究，并加强和扩展由RGD专家管理团队手动从生物医学文献中提取的数据，RGD导入并整合了来自多个来源的数据。这些包括主要数据库和大量特定领域的资源，以及个人研究人员直接提交的文件。这些不同数据类型的合并由越来越多的自动化导入、导出、数据处理和质量控制管道来处理。本文概述了自动化RGD管道标准化基础设施的开发过程，并总结了关键的设计决策和经验教训。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Database: The Journal of Biological Databases and Curation MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

9.00

自引率

3.40%

发文量

100

审稿时长

>12 weeks

期刊介绍： Huge volumes of primary data are archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data. Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.