Data Intelligence: Latest Articles

HUSS: A Heuristic Method for Understanding the Semantic Structure of Spreadsheets
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2022-11-01; DOI: 10.1109/ICKG55886.2022.00049
Xindong Wu, Hao Chen, Chenyang Bu, Shengwei Ji, Zan Zhang, Victor S. Sheng
ABSTRACT Spreadsheets contain a lot of valuable data and have many practical applications. The key technology of these practical applications is how to make machines understand the semantic structure of spreadsheets, e.g., identifying cell function types and discovering relationships between cell pairs. Most existing methods for understanding the semantic structure of spreadsheets do not make use of the semantic information of cells. A few studies do, but they ignore the layout structure information of spreadsheets, which affects the performance of cell function classification and the discovery of different relationship types of cell pairs. In this paper, we propose a Heuristic algorithm for Understanding the Semantic Structure of spreadsheets (HUSS). Specifically, for improving the cell function classification, we propose an error correction mechanism (ECM) based on an existing cell function classification model [11] and the layout features of spreadsheets. For improving the table structure analysis, we propose five types of heuristic rules to extract four different types of cell pairs, based on the cell style and spatial location information. Our experimental results on five real-world datasets demonstrate that HUSS can effectively understand the semantic structure of spreadsheets and outperforms corresponding baselines.
Data Intelligence, vol. 5, pp. 537-559
Citations: 0
An Analysis of Crosswalks from Research Data Schemas to Schema.org
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2022-10-07; DOI: 10.1162/dint_a_00186
Mingfang Wu, S. Richard, C. Verhey, L. J. Castro, Baptiste Cecconi, N. Juty
ABSTRACT The increased number of data repositories has greatly increased the availability of open data. To enable broad discovery of and access to research datasets, some data repositories have begun leveraging the web architecture by embedding structured metadata markup in dataset web landing pages using vocabularies from Schema.org and extensions. This paper aims to examine metadata interoperability for supporting global data discovery. Specifically, the paper reports a survey of which metadata schemas have been adopted by participating data repositories, and presents an analysis of crosswalks from fourteen research data schemas to Schema.org. The analysis indicates that most descriptive metadata are interoperable among the schemas, that the most inconsistent mapping is the rights metadata, and that a large gap exists in the structural metadata and controlled vocabularies used to specify various property values. The analysis and collated crosswalks can serve as a reference for data repositories when they develop crosswalks from their own schemas to Schema.org, and provide the research data community with a benchmark of structured metadata implementation.
Data Intelligence, vol. 5, pp. 100-121
Citations: 5
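To make the idea of a crosswalk in the entry above concrete, the following is a minimal sketch of mapping a repository record to a Schema.org `Dataset` description in JSON-LD. The source field names, the `CROSSWALK` table, and the sample record are hypothetical and are not taken from the paper or from any of the fourteen schemas it analyses; only `@context`, `@type`, and the Schema.org property names (`name`, `description`, `url`, `license`, `keywords`) are real vocabulary terms.

```python
import json

# Hypothetical crosswalk: source field name -> Schema.org Dataset property.
CROSSWALK = {
    "title": "name",
    "abstract": "description",
    "landing_page": "url",
    "rights": "license",
    "subject": "keywords",
}


def to_schema_org(record, crosswalk=CROSSWALK):
    """Map a source record to a Schema.org Dataset.

    Returns the JSON-LD dict and the list of source fields that the
    crosswalk does not cover, so that gaps stay visible.
    """
    jsonld = {"@context": "https://schema.org/", "@type": "Dataset"}
    unmapped = []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped.append(field)
        else:
            jsonld[target] = value
    return jsonld, unmapped


# Illustrative record only; the values are invented for this sketch.
source = {
    "title": "Example survey dataset",
    "abstract": "Illustrative record, not taken from any repository.",
    "landing_page": "https://example.org/datasets/42",
    "rights": "https://creativecommons.org/licenses/by/4.0/",
    "file_format": "text/csv",
}

jsonld, unmapped = to_schema_org(source)
print(json.dumps(jsonld, indent=2))
print("Unmapped fields:", unmapped)
```

Returning the unmapped fields alongside the result is a design choice for this sketch: it surfaces the kind of coverage gap the abstract reports for structural metadata, rather than silently dropping fields.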
FAIR Equivalency in Indonesia's Digital Health Framework
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2022-10-01; DOI: 10.1162/dint_a_00171
Putu Hadi Purnama Jati
Abstract The objective of this study was to assess the regulatory framework for health data in Indonesia in order to understand the policy context and explore the possibility of expanding the adoption and implementation of the FAIR Guidelines, which state that data should be Findable, Accessible, Interoperable and Reusable (FAIR), in Indonesia. Although the FAIR Guidelines were not explicitly mentioned in any of the policy documents relevant to the Indonesian digital health sector, six out of the eight documents analysed contained FAIR Equivalent principles. In particular, Indonesia's Population Identification Number (NIK) has the potential, as a unique identifier, to support the integration and interoperability (findability) of data, which is crucial to all other aspects of the FAIR Guidelines. There is also a plan to build standards and protocols into the implementation of information systems in each ministry and government agency to improve data accessibility (accessibility), the integration of the various information systems is planned/ongoing (interoperability), and the need for a standardised arrangement for health information systems related to health data following the community standard is recognised (reusability). The documents at the core of Indonesia's digital health/eHealth policy have the highest FAIR Equivalency Score (FE-Score), showing some degree of alignment between the Indonesian digital health implementation vision and the FAIR Guidelines. This indicates that Indonesia's digital health sector is open to using the FAIR Guidelines.
Data Intelligence, vol. 4, pp. 798-812
Citations: 7
FAIREST: A Framework for Assessing Research Repositories
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2022-09-28; DOI: 10.1162/dint_a_00159
M. d’Aquin, Fabian Kirstein, Daniela Oliveira, Sonja Schimmler, Sebastian Urbanek
ABSTRACT The open science movement has gained significant momentum within the last few years. This comes along with the need to store and share research artefacts, such as publications and research data. For this purpose, research repositories need to be established. A variety of solutions exist for implementing such repositories, covering diverse features, ranging from custom depositing workflows to social media-like functions. In this article, we introduce the FAIREST principles, a framework inspired by the well-known FAIR principles, but designed to provide a set of metrics for assessing and selecting solutions for creating digital repositories for research artefacts. The goal is to support decision makers in choosing such a solution when planning for a repository, especially at an institutional level. The metrics included are therefore based on two pillars: (1) an analysis of established features and functionalities, drawn from existing dedicated, general purpose and commonly used solutions, and (2) a literature review on general requirements for digital repositories for research artefacts and related systems. We further describe an assessment of 11 widespread solutions, with the goal to provide an overview of the current landscape of research data repository solutions, identifying gaps and research challenges to be addressed.
Data Intelligence, vol. 5, pp. 202-241
Citations: 3
The FAIR Data Point: Interfaces and Tooling
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2022-09-28; DOI: 10.1162/dint_a_00161
Ousamma Mohammed Benhamed, K. Burger, R. Kaliyaperumal, Luiz Olavo Bonino da Silva Santos, M. Suchánek, Jan Slifka, Mark D. Wilkinson
ABSTRACT While the FAIR Principles do not specify a technical solution for ‘FAIRness’, it was clear from the outset of the FAIR initiative that it would be useful to have commodity software and tooling that would simplify the creation of FAIR-compliant resources. The FAIR Data Point is a metadata repository that follows the DCAT(2) schema, and utilizes the Linked Data Platform to manage the hierarchical metadata layers as LDP Containers. There has been a recent flurry of development activity around the FAIR Data Point that has significantly improved its power and ease-of-use. Here we describe five specific tools—an installer, a loader, two Web-based interfaces, and an indexer—aimed at maximizing the uptake and utility of the FAIR Data Point.
Data Intelligence, vol. 5, pp. 184-201
Citations: 3
FAIR data and metadata: GNSS precise positioning user perspective
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2022-09-28; DOI: 10.1162/dint_a_00185
I. Ivánová, R. Keenan, Christopher Marshall, Lori Mancell, E. Rubinov, R. Ruddick, Nicholas Brown, Graeme Kernich
ABSTRACT The FAIR principles of Wilkinson et al. [1] are finding their way from research into application domains, one of which is precise positioning with global satellite navigation systems (GNSS). Current GNSS users demand that data and services are findable online, accessible via open protocols (by both machines and humans), interoperable with their legacy systems and reusable in various settings. Comprehensive metadata are essential for seamless communication between GNSS data and service providers and their users, and, for decades, geodetic and geospatial standards have been efficiently implemented to support this. However, the GNSS user community is shifting from highly specialised precise positioning by geodetic professionals to everyday precise positioning by autonomous vehicles or wellness-obsessed citizens. Moreover, rapid technological developments allow alternative ways of offering data and services to users. These changing circumstances warrant a review of whether the metadata defined in the generic geospatial and geodetic standards in use still support FAIR use of modern GNSS data and services across this novel user spectrum. This paper reports current GNSS users’ requirements, across various application sectors, for the way data, metadata and services are provided. We engaged with GNSS stakeholders to validate our findings and to gain an understanding of their perception of the FAIR principles. Our results confirm that offering FAIR GNSS data and services is fundamental, but for these to be used with confidence, there is a need to review the way metadata are offered to the community. Defining a standard-compliant GNSS community metadata profile and providing relevant metadata with data on demand, the approach outlined in this paper, is a way to manage current GNSS users’ expectations and to improve FAIR GNSS data and service delivery for both humans and machines.
Data Intelligence, vol. 5, pp. 43-74
Citations: 1
Automated metadata annotation: What is and is not possible with machine learning
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2022-09-28; DOI: 10.1162/dint_a_00162
Mingfang Wu, Hans Brandhorst, M. Marinescu, J. M. López, Marjorie M. K. Hlava, J. Busch
ABSTRACT Automated metadata annotation is only as good as the training dataset or rules that are available for the domain. It's important to learn what type of data content a pre-trained machine learning algorithm has been trained on to understand its limitations and potential biases. Consider what type of content is readily available to train an algorithm: what's popular and what's available. However, scholarly and historical content is often not available in consumable, homogenized, and interoperable formats at the large volume that is required for machine learning. There are exceptions such as science and medicine, where large, well-documented collections are available. This paper presents the current state of automated metadata annotation in cultural heritage and research data, discusses challenges identified from use cases, and proposes solutions.
Data Intelligence, vol. 5, pp. 122-138
Citations: 7
Terminology for a FAIR Framework for the Virus Outbreak Data Network-Africa
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2022-08-18; DOI: 10.1162/dint_a_00167
Ruduan Plug, Yan Liang, Aliya Aktau, Mariam Basajja, Francisca Onaolapo Oladipo, M. van Reisen
Abstract The field of health data management poses unique challenges in relation to data ownership, the privacy of data subjects, and the reusability of data. The FAIR Guidelines have been developed to address these challenges. The Virus Outbreak Data Network (VODAN) architecture builds on these principles, using the European Union's General Data Protection Regulation (GDPR) framework to ensure compliance with local data regulations, while using information knowledge management concepts to further improve data provenance and interoperability. In this article we provide an overview of the terminology used in the field of FAIR data management, with a specific focus on FAIR compliant health information management, as implemented in the VODAN architecture.
Data Intelligence, vol. 4, pp. 698-723
Citations: 2
FAIR Machine Learning Model Pipeline Implementation of COVID-19 Data
IF 3.9; Zone 3, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2022-08-18; DOI: 10.1162/dint_a_00182
Sakinat Folorunso, E. Ogundepo, Mariam Basajja, Joseph Awotunde, A. Kawu, Francisca Onaolapo Oladipo, Ibrahim Abdullahi
Abstract Research and development are gradually becoming data-driven and the implementation of the FAIR Guidelines (that data should be Findable, Accessible, Interoperable, and Reusable) for scientific data administration and stewardship has the potential to remarkably enhance the framework for the reuse of research data. In this way, FAIR is aiding digital transformation. The ‘FAIRification’ of data increases the interoperability and (re)usability of data, so that new and robust analytical tools, such as machine learning (ML) models, can access the data to deduce meaningful insights, extract actionable information, and identify hidden patterns. This article aims to build a FAIR ML model pipeline using the generic FAIRification workflow to make the whole ML analytics process FAIR. Accordingly, FAIR input data was modelled using a FAIR ML model. The output data from the FAIR ML model was also made FAIR. For this, a hybrid hierarchical k-means (HHK) clustering ML algorithm was applied to group the data into homogeneous subgroups and ascertain the underlying structure of the data using a Nigerian-based FAIR dataset that contains data on economic factors, healthcare facilities, and coronavirus occurrences in all the 36 states of Nigeria. The model showed that research data and the ML pipeline can be FAIRified, shared, and reused by following the proposed FAIRification workflow and implementing technical architecture.
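The abstract above names a hybrid hierarchical k-means (HHK) clustering algorithm without describing it. The sketch below shows one common formulation, assumed here rather than taken from the paper: run agglomerative clustering until k clusters remain, then use the centroids of those clusters as the starting centers for ordinary k-means. The function names, the centroid-linkage choice, and the toy points are illustrative assumptions and do not reproduce the paper's pipeline or data.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def agglomerate(points, k):
    """Merge the two clusters with the closest centroids until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda c: dist(point, centers[c]))


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the given centers."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group ends up empty.
        updated = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    labels = [nearest(p, centers) for p in points]
    return centers, labels


def hybrid_hierarchical_kmeans(points, k):
    """Seed k-means with the centroids found by agglomerative clustering."""
    seeds = [centroid(c) for c in agglomerate(points, k)]
    return kmeans(points, seeds)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = hybrid_hierarchical_kmeans(points, k=2)
print(centers, labels)
```

Seeding k-means from a hierarchical cut makes the result deterministic for a given input, which is the usual motivation for the hybrid approach over random initialization.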
Data Intelligence, vol. 4, pp. 971-990
Citations: 4
Curriculum Development for FAIR Data Stewardship
IF 3.9 Zone 3 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2022-08-18 DOI: 10.1162/dint_a_00183
Francisca Onaolapo Oladipo, Sakinat Folorunso, E. Ogundepo, Obinna Osigwe, A. Akindele
Abstract: The FAIR Guidelines attempt to make digital data Findable, Accessible, Interoperable, and Reusable (FAIR). To prepare FAIR data, a new data science discipline known as data stewardship is emerging and, as the FAIR Guidelines gain wider acceptance, demand for data stewards is expected to increase. Consequently, there is a need to develop curricula that foster professional skills in data stewardship through effective knowledge communication. A number of initiatives have aimed to bridge the gap in FAIR data management training through both formal and informal programmes. This article describes the experience of developing a digital initiative for FAIR data management training under the Digital Innovations and Skills Hub (DISH) project. The FAIR Data Management course offers six short on-demand certificate modules over 12 weeks. The modules are divided into two sets: FAIR data and data science. The core subjects cover elementary topics in data science, regulatory frameworks, FAIR data management, intermediate to advanced topics in FAIR Data Point installation, and FAIR data in the management of healthcare and semantic data. Each week, participants are required to devote 7–8 hours of self-study to the modules, based on the resources provided. Once they have satisfied all requirements, students are certified as FAIR data scientists and qualified to serve as both FAIR data stewards and analysts. It is expected that in-depth and focused curriculum development with diverse participants will build a core of FAIR data scientists for Data Competence Centres and encourage the rapid adoption of the FAIR Guidelines for research and development.
Data Intelligence, vol. 4, pp. 991-1012
Citations: 4