echemdb Toolkit -- a Lightweight Approach to Getting Data Ready for Data Management Solutions

arXiv - CS - Databases | Pub Date: 2024-09-11 | DOI: arxiv-2409.07083

Albert K. Engstfeld, Johannes M. Hermann, Nicolas G. Hörmann, Julian Rüth


Citations: 0

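Abstract

According to the FAIR (findability, accessibility, interoperability, and reusability) principles, scientific data should always be stored with machine-readable descriptive metadata. Existing solutions for storing data with metadata, such as electronic lab notebooks (ELNs), are often very domain-specific and not sufficiently generic for arbitrary experimental or computational results. In this work, we present the open-source echemdb toolkit for creating and handling data and metadata. The toolkit runs entirely at the file system level using a file-based approach, which facilitates integration with other tools in a FAIR data life cycle and means that no complicated server setup is required. This also makes the toolkit more accessible to the average researcher, since no understanding of more sophisticated database technologies is required. We showcase several aspects and applications of the toolkit: automatic annotation of raw research data with human- and machine-readable metadata, data conversion into standardised frictionless Data Packages, and an API for exploring the data. We also illustrate web frameworks for displaying the data, using example data from research into energy conversion and storage.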
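To make the file-based idea in the abstract more concrete, the following is a minimal sketch of storing a small table as a CSV file next to a JSON descriptor laid out like a frictionless Data Package. It uses only the Python standard library and is not the echemdb toolkit's API. The function names, the `metadata` and `unit` keys, and all values are hypothetical and made up for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_data_package(directory, name, rows, fields, metadata):
    """Write rows to <name>.csv and a JSON descriptor <name>.json beside it."""
    directory = Path(directory)
    csv_path = directory / f"{name}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([field["name"] for field in fields])  # header row
        writer.writerows(rows)

    # Descriptor in the style of a Data Package: one tabular resource
    # with a relative path and a schema describing its fields.
    descriptor = {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_path.name,
                "format": "csv",
                "schema": {"fields": fields},
                # Assumption: a custom key for free-form descriptive metadata.
                "metadata": metadata,
            }
        ],
    }
    json_path = directory / f"{name}.json"
    json_path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return json_path


def load_data_package(json_path):
    """Read the descriptor and the rows of its first resource."""
    json_path = Path(json_path)
    descriptor = json.loads(json_path.read_text(encoding="utf-8"))
    resource = descriptor["resources"][0]
    data_path = json_path.parent / resource["path"]
    with data_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))  # values are read back as strings
    return descriptor, rows


# Made-up example values; "unit" is a custom field property (assumption).
fields = [
    {"name": "E", "type": "number", "unit": "V"},
    {"name": "j", "type": "number", "unit": "A / m2"},
]
rows = [(0.0, 0.10), (0.1, 0.25), (0.2, 0.40)]
metadata = {"electrolyte": "made-up example value"}

with tempfile.TemporaryDirectory() as tmp:
    descriptor_path = write_data_package(tmp, "cv_example", rows, fields, metadata)
    descriptor, loaded = load_data_package(descriptor_path)

print(descriptor["resources"][0]["path"], len(loaded))
```

Because the data and its description are ordinary files, they can be versioned, copied, or read by other tools without a database server, which is the property the abstract emphasises.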


