Lennart F Johansson, Steve Laurie, Dylan Spalding, Spencer Gibson, David Ruvolo, Coline Thomas, Davide Piscia, Fernanda de Andrade, Gerieke Been, Marieke Bijlsma, Han Brunner, Sandi Cimerman, Farid Yavari Dizjikan, Kornelia Ellwanger, Marcos Fernandez, Mallory Freeberg, Gert-Jan van de Geijn, Roan Kanninga, Vatsalya Maddi, Mehdi Mehtarizadeh, Pieter Neerincx, Stephan Ossowski, Ana Rath, Dieuwke Roelofs-Prins, Marloes Stok-Benjamins, K Joeri van der Velde, Colin Veal, Gerben van der Vries, Marc Wadsley, Gregory Warren, Birte Zurek, Thomas Keane, Holm Graessner, Sergi Beltran, Morris A Swertz, Anthony J Brookes
GigaScience. Publication date: 2024-01-02. DOI: 10.1093/gigascience/giae058 (https://doi.org/10.1093/gigascience/giae058). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413801/pdf/
An interconnected data infrastructure to support large-scale rare disease research.
The Solve-RD project brings together clinicians, scientists, and patient representatives from 51 institutes spanning 15 countries to collaborate on genetically diagnosing ("solving") rare diseases (RDs). The project aims to significantly increase the diagnostic success rate by co-analyzing data from thousands of RD cases, including phenotypes, pedigrees, exome/genome sequencing, and multiomics data. Here we report on the data infrastructure devised and created to support this co-analysis. This infrastructure enables users to store, find, connect, and analyze data and metadata in a collaborative manner. Pseudonymized phenotypic and raw experimental data are submitted to the RD-Connect Genome-Phenome Analysis Platform and processed through standardized pipelines. Resulting files and novel produced omics data are sent to the European Genome-Phenome Archive, which adds unique file identifiers and provides long-term storage and controlled access services. MOLGENIS "RD3" and Café Variome "Discovery Nexus" connect data and metadata and offer discovery services, and secure cloud-based "Sandboxes" support multiparty data analysis. This successfully deployed and useful infrastructure design provides a blueprint for other projects that need to analyze large amounts of heterogeneous data.
About the journal:
GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.