Use of an ultrasound picture archiving and communication system to answer research questions: Description of data cleaning methods

Matthew K Moore, Gillian Whalley, Gregory T Jones, Sean Coffey

Australasian Journal of Ultrasound in Medicine, volume 27, issue 1, pages 49-55. Published 13 January 2024. DOI: 10.1002/ajum.12374
Full text: https://onlinelibrary.wiley.com/doi/10.1002/ajum.12374

Abstract

Introduction/Purpose

Ultrasound picture archiving and communication system (PACS) databases are useful for quality improvement and clinical research but frequently contain free text that is not easily readable. Here, we present a method to extract and clean a semi-structured echocardiography (cardiac ultrasound) PACS database.

Methods

Echocardiography studies performed between 1 January 2010 and 31 December 2018 were extracted using a data mining tool. Numeric variables were recoded, with extreme values excluded. Free text, including descriptions of the heart valves and of right and left ventricular size and function, was analysed using a rule-based system. The levels of each free-text variable were initially identified from commonly used phrases and then refined iteratively. Randomly selected sets of 100 studies were compared with the electronic health record to validate the data cleaning process.
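
The two cleaning steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the software, the plausibility thresholds or the phrase rules, so the `LVEF_RANGE` window, the `SEVERITY_RULES` patterns and both function names are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical plausibility window for left ventricular ejection fraction (%);
# the abstract does not state the thresholds the authors used.
LVEF_RANGE = (5.0, 90.0)

# Hypothetical ordered rules: the first matching pattern wins, so combined
# phrases ("mild to moderate") are listed before the single words they contain.
SEVERITY_RULES = [
    (re.compile(r"\b(no|without|not)\b[^.]*\bstenosis\b"), "none"),
    (re.compile(r"\bmild\s*(to|-|/)\s*moderate\b"), "mild-moderate"),
    (re.compile(r"\bmoderate\s*(to|-|/)\s*severe\b"), "moderate-severe"),
    (re.compile(r"\bmild(ly)?\b"), "mild"),
    (re.compile(r"\bmoderate(ly)?\b"), "moderate"),
    (re.compile(r"\bsevere(ly)?\b"), "severe"),
]


def clean_numeric(raw: str, low: float, high: float) -> Optional[float]:
    """Parse the first number in a field; return None if absent or out of range."""
    match = re.search(r"-?\d+(?:\.\d+)?", raw)
    if match is None:
        return None
    value = float(match.group())
    return value if low <= value <= high else None


def classify_stenosis(text: str) -> Optional[str]:
    """Map a free-text valve description to a severity level, or None if no rule matches."""
    normalised = " ".join(text.lower().split())  # lower-case and collapse whitespace
    for pattern, level in SEVERITY_RULES:
        if pattern.search(normalised):
            return level
    return None
```

Reports for which `classify_stenosis` returns `None` are the natural input to the iterative step: unmatched phrases are reviewed, new rules are added, and the process is repeated. The negation rule here is deliberately crude and would need refinement on real reports.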

Results

Data validation was performed three times in total; in the final round, Cohen's kappa ranged from 0.88 to 1.00 across all measures.
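
For reference, Cohen's kappa compares the observed agreement p_o between two sets of labels with the agreement p_e expected by chance from their marginal frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for one measure, assuming the cleaned value and the electronic health record value of each validated study are available as two equal-length lists. The function name is illustrative, and the abstract does not say which software the authors used for this calculation.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: proportion of items given the same label by both sources.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both sources used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

In a validation round of 100 studies, `rater_a` would hold the cleaned PACS values and `rater_b` the values read from the electronic health record, with one call per measure.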

Conclusion

Free text cleaning of semi-structured PACS databases is possible using freely available open-source software. The accuracy of this method is high, and the resulting dataset can be linked to administrative data to answer research questions. We present a method that could be used to answer clinical questions or to develop quality improvement initiatives.

