Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study.

Online journal of public health informatics (IF 1.1) | Pub Date: 2024-08-01 | DOI: 10.2196/56237

David Amadi, Sylvia Kiwuwa-Muyingo, Tathagata Bhattacharjee, Amelia Taylor, Agnes Kiragga, Michael Ochola, Chifundo Kanjala, Arofan Gregory, Keith Tomlin, Jim Todd, Jay Greenfield

Online journal of public health informatics, volume 16, e56237. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11327634/pdf/


Abstract

Background: Metadata describe and provide context for other data and are central to making data findable, accessible, interoperable, and reusable (FAIR). Comprehensive, machine-readable descriptions of digital resources let both machines and human users discover, access, integrate, and reuse data or content across diverse platforms and applications. However, existing metadata for population health data have limited accessibility and machine-interpretability, which hinders effective data discovery and reuse.

Objective: To address these challenges, we propose a comprehensive framework using standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing their FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications.

Methods: The framework implements a 3-stage approach. The first stage is Data Documentation Initiative (DDI) integration, which uses DDI Codebook metadata to document the data and associated assets in detail, supporting transparency and comprehensiveness. The second stage is Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardization, in which the data are harmonized and standardized into the OMOP CDM to facilitate unified analysis across heterogeneous data sets. The third stage is the integration of Schema.org and JavaScript Object Notation for Linked Data (JSON-LD): machine-readable metadata are generated using Schema.org entities and embedded within the data using JSON-LD, boosting discoverability and comprehension for both machines and human users. We demonstrated the implementation of these 3 stages using Integrated Disease Surveillance and Response (IDSR) data from Malawi and Kenya.
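The third stage can be pictured with a minimal Python sketch. It builds a Schema.org `Dataset` description and wraps it in the `application/ld+json` script tag used to embed JSON-LD in a web page. This is an illustration only, not the authors' code: the function names, the record contents, and the `example.org` URL are all assumptions made for the example.

```python
import json


def build_dataset_jsonld(name, description, countries, keywords, url):
    """Return a Schema.org Dataset description as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        # One Place entry per country covered by the data set.
        "spatialCoverage": [{"@type": "Place", "name": c} for c in countries],
    }


def embed_in_html(jsonld):
    """Wrap the JSON-LD in the script tag that metadata crawlers read."""
    body = json.dumps(jsonld, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical record: every value below is a placeholder, not real metadata.
record = build_dataset_jsonld(
    name="IDSR surveillance data (illustrative record)",
    description="Placeholder description of a harmonized IDSR data set.",
    countries=["Malawi", "Kenya"],
    keywords=["IDSR", "OMOP CDM", "DDI Codebook"],
    url="https://example.org/datasets/idsr",  # hypothetical landing page
)
snippet = embed_in_html(record)
print(snippet)
```

The printed snippet would be pasted into the `<head>` of a data set landing page. In practice the field values would come from the DDI Codebook documentation produced in the first stage rather than being typed by hand.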

Results: The implementation of our framework significantly enhanced the FAIRness of population health data, improving discoverability through seamless integration with platforms such as Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, machine-interpretable metadata enabled researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. The JSON-LD code is accessible via a GitHub repository, and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities website.
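To show why embedded JSON-LD aids discovery, the sketch below reads such metadata back out of an HTML page, roughly as a metadata harvester might. It uses only the Python standard library. The class name `JsonLdExtractor` and the sample page are assumptions made for the example, and the sketch does not describe how any particular search platform is implemented.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical page used only to exercise the parser.
page = """<html><head><title>Demo</title>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Demo dataset"}
</script></head><body></body></html>"""

parser = JsonLdExtractor()
parser.feed(page)
parser.close()

# Keep only top-level objects typed as Schema.org Dataset.
datasets = [
    r for r in parser.records
    if isinstance(r, dict) and r.get("@type") == "Dataset"
]
print(datasets[0]["name"])
```

Because the metadata sit in a typed script tag, a harvester can index the data set without parsing the page's visible layout. That is the property the framework relies on for discoverability.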

Conclusions: The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. By embracing these standards, organizations can enhance the visibility, accessibility, and utility of diverse resources, leading to a broader impact, particularly in low- and middle-income countries. Machine-readable metadata can accelerate research, improve health care decision-making, and ultimately promote better health outcomes for populations worldwide.

