A Systematic Approach to Reconciling Data Quality Failures: Investigation Using Spinal Cord Injury Data

ACI Open | Pub Date: 2021-07-01 | DOI: 10.1055/s-0041-1735975
Nandini Anantharama, Wray L. Buntine, Andrew Nunn
Citations: 1

Abstract

Background: Secondary use of electronic health record (EHR) data requires evaluation of data quality (DQ) for fitness for use. While multiple frameworks exist for quantifying DQ, there are no guidelines for evaluating the DQ failures identified through such frameworks.

Objectives: This study proposes a systematic approach to evaluating DQ failures through an understanding of data provenance, in support of exploratory modeling in machine learning.

Methods: Our study is based on the EHR of spinal cord injury inpatients at a state spinal care center in Australia who were admitted between 2011 and 2018 (inclusive) and were aged over 17 years. As a prerequisite step, DQ was measured by applying a DQ framework to the EHR data through rules that quantified DQ dimensions. DQ was expressed either as the percentage of values per field that met the criteria or as Krippendorff's α for agreement between variables. The resulting DQ failures were then assessed using semistructured interviews with purposively sampled domain experts.

Results: The measured DQ of the fields in our dataset ranged from 0% to 100% adherent. Understanding the data provenance of fields with DQ failures enabled us to ascertain whether each DQ failure was fatal, recoverable, or not relevant to the field's inclusion in our study. We also identify the themes of data provenance from a DQ perspective as systems, processes, and actors.

Conclusion: A systematic approach to understanding data provenance through the context of data generation helps in the reconciliation or repair of DQ failures and is a necessary step in the preparation of data for secondary use.
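The two DQ measures named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the age rule, and the sample values are hypothetical, and the nominal form of Krippendorff's α is an assumption, since the abstract does not say which variant was used.

```python
from collections import Counter
from itertools import permutations


def percent_adherent(values, rule):
    """Percentage of values in one field that satisfy a DQ rule."""
    if not values:
        raise ValueError("field has no values")
    return 100.0 * sum(1 for v in values if rule(v)) / len(values)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (assumed variant).

    `units` holds one tuple per record; each tuple lists the values that
    the compared variables take for that record (None marks a missing value).
    """
    # Build the coincidence matrix: every ordered pair of values within a
    # unit contributes 1 / (m - 1), where m is the number of present values.
    coincidences = Counter()
    for unit in units:
        present = [v for v in unit if v is not None]
        m = len(present)
        if m < 2:
            continue  # a unit with fewer than two values cannot be paired
        for a, b in permutations(present, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable values")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1.0 - observed / expected


# Hypothetical field and rule: admission age must be over 17.
ages = [18, 25, 16, 40]
adherence = percent_adherent(ages, lambda age: age > 17)

# Hypothetical pair of variables that should agree on each record.
paired = [(0, 1), (1, 1), (0, 1), (0, 0), (0, 0),
          (0, 1), (0, 0), (0, 0), (1, 0), (0, 0)]
alpha = krippendorff_alpha_nominal(paired)
```

A field scoring well below 100% adherence, or a pair of variables with α near 0, would be the kind of DQ failure that the study then takes to domain experts for interpretation.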