The Expansion of Data Science: Dataset Standardization

Standards, Pub Date: 2023-11-30, DOI: 10.3390/standards3040028
Nuno Pessanha Santos

Abstract

Recent advances in science and technology have made more processing capability and data available, allowing a more straightforward implementation of data analysis techniques. Fortunately, online data storage capacity has followed this trend, and vast amounts of data can be stored online for free or at accessible cost. As with every evolution (or revolution) in any scientific field, organizing and sharing these data is essential for contributing to new studies and for quickly validating obtained results. To facilitate this, we must guarantee interoperability between existing datasets and developed software, whether commercial or open-source. This article explores this issue, analyzes current initiatives to establish data standards, and compares some of the existing online dataset storage platforms. A Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis clarifies the strategy that should be adopted to improve efficiency in this field, which depends directly on the data's characteristics. The development of dataset standards will directly increase collaboration and data sharing between academia and industry, allowing faster research and development through direct interoperability.