A Method for Efficient De-identification of DICOM Metadata and Burned-in Pixel Text

Journal of Digital Imaging | Published: 2024-04-08 | DOI: 10.1007/s10278-024-01098-7
Impact factor: 2.9 | CAS Zone 2 (Engineering & Technology) | JCR Q2 (Radiology, Nuclear Medicine & Medical Imaging)

Jacob A. Macdonald, Katelyn R. Morgan, Brandon Konkel, Kulsoom Abdullah, Mark Martin, Cory Ennis, Joseph Y. Lo, Marissa Stroo, Denise C. Snyder, Mustafa R. Bashir

Citations: 0

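Abstract

De-identification of DICOM images is an essential component of medical image research. While many established methods exist for the safe removal of protected health information (PHI) in DICOM metadata, approaches for the removal of PHI “burned-in” to image pixel data are typically manual, and automated high-throughput approaches are not well validated. Emerging optical character recognition (OCR) models can potentially detect and remove PHI-bearing text from medical images but are very time-consuming to run on the high volume of images found in typical research studies. We present a data processing method that performs metadata de-identification for all images combined with a targeted approach to only apply OCR to images with a high likelihood of burned-in text. The method was validated on a dataset of 415,182 images across ten modalities representative of the de-identification requests submitted at our institution over a 20-year span. Of the 12,578 images in this dataset with burned-in text of any kind, only 10 passed undetected with the method. OCR was only required for 6050 images (1.5% of the dataset).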
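The two-stage idea in the abstract (metadata de-identification for every image, OCR only for images likely to carry burned-in text) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: headers are modelled as plain dicts keyed by DICOM attribute keywords, `PHI_KEYWORDS` is a small illustrative subset rather than a full confidentiality profile, the gate in `likely_has_burned_in_text` (Burned In Annotation = YES, a Secondary Capture or Ultrasound SOP class, or SECONDARY in Image Type) is a hypothetical heuristic, and `run_ocr` is a placeholder callback.

```python
"""Two-stage de-identification triage: a hypothetical sketch, not the authors' code."""

# Illustrative subset of PHI-bearing DICOM attribute keywords (assumption);
# a production pipeline would apply a full confidentiality profile instead.
PHI_KEYWORDS = {
    "PatientName", "PatientID", "PatientBirthDate",
    "AccessionNumber", "InstitutionName", "ReferringPhysicianName",
}

# Real DICOM SOP Class UIDs, chosen here as hypothetical "likely text" signals.
SECONDARY_CAPTURE_UID = "1.2.840.10008.5.1.4.1.1.7"  # Secondary Capture Image Storage
ULTRASOUND_UID = "1.2.840.10008.5.1.4.1.1.6.1"       # Ultrasound Image Storage
CT_IMAGE_UID = "1.2.840.10008.5.1.4.1.1.2"           # CT Image Storage (demo only)


def deidentify_metadata(header: dict) -> dict:
    """Stage 1, applied to every image: blank PHI keywords in a copy of the header."""
    return {key: ("" if key in PHI_KEYWORDS else value) for key, value in header.items()}


def likely_has_burned_in_text(header: dict) -> bool:
    """Stage 2 gate (hypothetical heuristic): does the metadata suggest burned-in text?"""
    if str(header.get("BurnedInAnnotation", "")).upper() == "YES":
        return True
    if header.get("SOPClassUID") in {SECONDARY_CAPTURE_UID, ULTRASOUND_UID}:
        return True
    image_type = [str(value).upper() for value in header.get("ImageType", [])]
    return "SECONDARY" in image_type


def process(headers, run_ocr):
    """De-identify all headers; call the placeholder OCR step only on flagged images."""
    cleaned, ocr_calls = [], 0
    for header in headers:
        clean = deidentify_metadata(header)
        if likely_has_burned_in_text(header):
            run_ocr(clean)  # a real OCR step would find and mask text in the pixel data
            ocr_calls += 1
        cleaned.append(clean)
    return cleaned, ocr_calls


if __name__ == "__main__":
    demo = [
        {"PatientName": "DOE^JANE", "SOPClassUID": CT_IMAGE_UID,
         "ImageType": ["ORIGINAL", "PRIMARY", "AXIAL"]},
        {"PatientName": "DOE^JOHN", "SOPClassUID": SECONDARY_CAPTURE_UID,
         "ImageType": ["DERIVED", "SECONDARY"]},
    ]
    cleaned, ocr_calls = process(demo, run_ocr=lambda header: None)
    print(ocr_calls)  # → 1
```

In this sketch the gate reads only cheap header fields, so the expensive OCR call stays rare while every image still gets metadata de-identification. For scale, the abstract's own figures imply OCR on 6050 of 415,182 images (about 1.46%, reported as 1.5%) and detection of 12,568 of the 12,578 images with burned-in text (about 99.92%).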

Source journal

Journal of Digital Imaging (Medicine / Nuclear Medicine)

CiteScore: 7.50
Self-citation rate: 6.80%
Articles published: 192
Review time: 6-12 weeks

About the journal: The Journal of Digital Imaging (JDI) is the official peer-reviewed journal of the Society for Imaging Informatics in Medicine (SIIM). JDI's goal is to enhance the exchange of knowledge across imaging informatics in medicine, including research and practice in clinical, engineering, and information technologies and techniques in all medical imaging environments. JDI topics are of interest to researchers, developers, educators, physicians, and imaging informatics professionals. Suggested topics: PACS and component systems; imaging informatics for the enterprise; image-enabled electronic medical records; RIS and HIS; digital image acquisition; image processing; image data compression; 3D, visualization, and multimedia; speech recognition; computer-aided diagnosis; facilities design; imaging vocabularies and ontologies; Transforming the Radiological Interpretation Process (TRIP™); DICOM and other standards; workflow and process modeling and simulation; quality assurance; archive integrity and security; teleradiology; digital mammography; and radiological informatics education.