BioBricks.ai: A Versioned Data Registry for Life Sciences Data Assets
Yifan Gao, Zakariyya Mughal, Jose A. Jaramillo-Villegas, Marie Corradi, Alexandre Borrel, Ben Lieberman, Suliman Sharif, John Shaffer, Karamarie Fecho, Ajay Chatrath, Alexandra Maertens, Marc A. T. Teunis, Nicole Kleinstreuer, Thomas Hartung, Thomas Luechtefeld
arXiv - QuanBio - Quantitative Methods, 2024-08-30, arXiv:2408.17320
Abstract
Researchers in biomedicine, public health, and the life sciences
often spend weeks or months discovering, accessing, curating, and integrating
data from disparate sources, which delays the start of actual
analysis and innovation. Instead of countless developers creating redundant and
inconsistent data pipelines, BioBricks.ai offers a centralized data repository
and a suite of developer-friendly tools to simplify access to scientific data.
Currently, BioBricks.ai delivers over ninety biological and chemical datasets.
It provides a package-manager-like system for installing and managing
dependencies on data sources. Each 'brick' is a Data Version Control (DVC) Git
repository that supports an updatable pipeline for extracting, transforming,
and loading data into the BioBricks.ai backend at https://biobricks.ai. Use
cases include accelerating data science workflows and facilitating the creation
of novel data assets by integrating multiple datasets into unified, harmonized
resources. In conclusion, BioBricks.ai offers an opportunity to accelerate
access to and use of public data through a single open platform.
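
The idea of a package-manager-like system for versioned data dependencies can be illustrated with a small, self-contained sketch. This is not the BioBricks.ai implementation or its API: the `Registry` and `BrickVersion` classes, the brick name `example-brick`, and the lockfile dictionary are all hypothetical, and a truncated SHA-256 content hash stands in for whatever version identifier the real system uses. The sketch only shows how pinning a dataset to a version keeps an analysis reproducible while the upstream data continues to change.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class BrickVersion:
    """One immutable snapshot of a dataset (hypothetical structure)."""
    name: str
    content: bytes

    @property
    def digest(self) -> str:
        # Content-addressed identifier: identical bytes give the same version.
        return hashlib.sha256(self.content).hexdigest()[:12]


@dataclass
class Registry:
    """Toy in-memory registry; an assumption, not the BioBricks.ai backend."""
    _store: Dict[Tuple[str, str], BrickVersion] = field(default_factory=dict)
    _latest: Dict[str, str] = field(default_factory=dict)

    def publish(self, name: str, content: bytes) -> str:
        """Store a new snapshot and return its version digest."""
        version = BrickVersion(name, content)
        self._store[(name, version.digest)] = version
        self._latest[name] = version.digest
        return version.digest

    def install(self, name: str, digest: Optional[str] = None) -> BrickVersion:
        """Return the pinned snapshot, or the latest one if no pin is given."""
        resolved = digest if digest is not None else self._latest[name]
        return self._store[(name, resolved)]


# Publish two successive versions of the same hypothetical brick.
registry = Registry()
v1 = registry.publish("example-brick", b"smiles,label\nCCO,0\n")
v2 = registry.publish("example-brick", b"smiles,label\nCCO,0\nCCN,1\n")

# A lockfile pins the analysis to the first version.
lockfile = {"example-brick": v1}
pinned = registry.install("example-brick", lockfile["example-brick"])
latest = registry.install("example-brick")
```

Here `pinned` still resolves to the first snapshot after the second one is published, whereas an unpinned install follows the latest version. That distinction is the same one a software package manager draws between a lockfile and a floating dependency.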