echemdb Toolkit -- a Lightweight Approach to Getting Data Ready for Data Management Solutions

Albert K. Engstfeld, Johannes M. Hermann, Nicolas G. Hörmann, Julian Rüth
arXiv:2409.07083 · arXiv - CS - Databases · published 2024-09-11

Abstract

According to the FAIR (findability, accessibility, interoperability, and reusability) principles, scientific data should always be stored with machine-readable descriptive metadata. Existing solutions for storing data with metadata, such as electronic lab notebooks (ELNs), are often highly domain-specific and not generic enough for arbitrary experimental or computational results. In this work, we present the open-source echemdb toolkit for creating and handling data and metadata. The toolkit runs entirely at the file-system level using a file-based approach. This facilitates integration with other tools in a FAIR data life cycle and means that no complicated server setup is required. It also makes the toolkit more accessible to the average researcher, since no understanding of more sophisticated database technologies is required. We showcase several aspects and applications of the toolkit: automatic annotation of raw research data with human- and machine-readable metadata, data conversion into standardised Frictionless Data Packages, and an API for exploring the data. We also demonstrate web frameworks for presenting the data, using example data from research into energy conversion and storage.
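The file-based idea can be sketched with nothing more than the Python standard library. The example below is a minimal illustration under stated assumptions and is not the echemdb API. It writes a CSV file together with a `datapackage.json` descriptor in the style of the Frictionless Data Package specification. It attaches free-form metadata under a custom `metadata` key, which is an assumption made for this sketch. It then "explores" the collection by filtering descriptors on that metadata. The function names `write_data_package`, `find_packages` and `read_rows`, as well as the package names and values, are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_data_package(root, name, fields, rows, metadata):
    """Store `rows` as CSV plus a Frictionless-style datapackage.json in root/name/.

    `fields` is a list of (column, type) pairs using Table Schema type names
    such as "number"; `metadata` is attached under a custom "metadata" key
    (a hypothetical layout, not the echemdb metadata schema).
    """
    package_dir = Path(root) / name
    package_dir.mkdir(parents=True, exist_ok=True)
    csv_name = f"{name}.csv"
    with (package_dir / csv_name).open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column for column, _ in fields])  # header row
        writer.writerows(rows)
    descriptor = {
        "name": name,
        "resources": [{
            "name": name,
            "path": csv_name,  # path relative to the descriptor file
            "format": "csv",
            "schema": {"fields": [{"name": c, "type": t} for c, t in fields]},
            "metadata": metadata,  # custom property (assumption for this sketch)
        }],
    }
    (package_dir / "datapackage.json").write_text(json.dumps(descriptor, indent=2))
    return package_dir


def find_packages(root, **criteria):
    """Return descriptors whose metadata matches every key=value in `criteria`."""
    found = []
    for path in sorted(Path(root).glob("*/datapackage.json")):
        descriptor = json.loads(path.read_text())
        metadata = descriptor["resources"][0].get("metadata", {})
        if all(metadata.get(key) == value for key, value in criteria.items()):
            found.append(descriptor)
    return found


def read_rows(root, descriptor):
    """Load the CSV resource, converting "number" columns to float."""
    resource = descriptor["resources"][0]
    types = {f["name"]: f["type"] for f in resource["schema"]["fields"]}
    with (Path(root) / descriptor["name"] / resource["path"]).open(newline="") as f:
        return [
            {k: float(v) if types[k] == "number" else v for k, v in row.items()}
            for row in csv.DictReader(f)
        ]


# Placeholder values for illustration only, not measured data.
with tempfile.TemporaryDirectory() as root:
    write_data_package(root, "cv-a", [("E", "number"), ("j", "number")],
                       [(0.0, 1.0), (0.5, 2.0)], {"electrode": "Pt", "scan_rate": 50})
    write_data_package(root, "cv-b", [("E", "number"), ("j", "number")],
                       [(0.0, 3.0)], {"electrode": "Au", "scan_rate": 50})
    matches = find_packages(root, electrode="Pt")  # only "cv-a"
    both = find_packages(root, scan_rate=50)       # "cv-a" and "cv-b"
    rows = read_rows(root, matches[0])
```

Because each package is only a directory of plain files, it can be copied, versioned or published without a database server. The descriptor should also be readable by other Frictionless-aware tools. The toolkit's actual metadata schema and API are not reproduced here.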