Jacob A. Macdonald, Katelyn R. Morgan, Brandon Konkel, Kulsoom Abdullah, Mark Martin, Cory Ennis, Joseph Y. Lo, Marissa Stroo, Denise C. Snyder, Mustafa R. Bashir
Journal of Digital Imaging, published 2024-04-08. DOI: 10.1007/s10278-024-01098-7 (https://doi.org/10.1007/s10278-024-01098-7)
A Method for Efficient De-identification of DICOM Metadata and Burned-in Pixel Text
De-identification of DICOM images is an essential component of medical image research. While many established methods exist for the safe removal of protected health information (PHI) in DICOM metadata, approaches for the removal of PHI “burned-in” to image pixel data are typically manual, and automated high-throughput approaches are not well validated. Emerging optical character recognition (OCR) models can potentially detect and remove PHI-bearing text from medical images but are very time-consuming to run on the high volume of images found in typical research studies. We present a data processing method that performs metadata de-identification for all images, combined with a targeted approach that applies OCR only to images with a high likelihood of burned-in text. The method was validated on a dataset of 415,182 images across ten modalities, representative of the de-identification requests submitted at our institution over a 20-year span. Of the 12,578 images in this dataset with burned-in text of any kind, only 10 passed undetected with the method. OCR was required for only 6,050 images (1.5% of the dataset).
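The two-stage idea in the abstract (scrub metadata for every image, run OCR only where burned-in text is likely) can be illustrated with a minimal Python sketch. The abstract does not state which criteria the authors use to flag images, so the rule set here is an assumption made purely for illustration: the function names, the list of PHI attributes, and the set of text-prone modalities are all hypothetical. Headers are modeled as plain dictionaries keyed by standard DICOM attribute keywords, so no DICOM library is needed.

```python
from typing import Any, Dict, Tuple

# Hypothetical rule set: the abstract does not list the paper's actual criteria.
# A small, incomplete sample of PHI-bearing attribute keywords.
PHI_KEYWORDS = {"PatientName", "PatientID", "PatientBirthDate"}
# Assumed text-prone modalities: ultrasound, secondary capture, other.
TEXT_PRONE_MODALITIES = {"US", "SC", "OT"}


def scrub_metadata(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the header without the listed PHI attributes."""
    return {key: value for key, value in meta.items() if key not in PHI_KEYWORDS}


def likely_has_burned_in_text(meta: Dict[str, Any]) -> bool:
    """Cheap metadata-only check that decides whether OCR is worth running."""
    # BurnedInAnnotation is a standard DICOM attribute with values YES or NO.
    if str(meta.get("BurnedInAnnotation", "")).upper() == "YES":
        return True
    return meta.get("Modality") in TEXT_PRONE_MODALITIES


def triage(meta: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
    """Scrub every header; flag only the likely text-bearing images for OCR."""
    return scrub_metadata(meta), likely_has_burned_in_text(meta)


if __name__ == "__main__":
    headers = [
        {"Modality": "CT", "PatientName": "DOE^JANE", "BurnedInAnnotation": "NO"},
        {"Modality": "US", "PatientName": "DOE^JOHN"},
    ]
    for header in headers:
        cleaned, run_ocr = triage(header)
        print(cleaned, "-> OCR" if run_ocr else "-> skip OCR")
```

The point of such a filter is that the expensive OCR step runs on a small fraction of the data. For the figures reported above, 6,050 of 415,182 images is about 1.46%, and 12,568 of the 12,578 text-bearing images being caught corresponds to a detection rate of about 99.92%.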
About the journal:
The Journal of Digital Imaging (JDI) is the official peer-reviewed journal of the Society for Imaging Informatics in Medicine (SIIM). JDI’s goal is to enhance the exchange of knowledge encompassed by the general topic of Imaging Informatics in Medicine, such as research and practice in clinical, engineering, and information technologies and techniques in all medical imaging environments. JDI topics are of interest to researchers, developers, educators, physicians, and imaging informatics professionals.
Suggested Topics
PACS and component systems; imaging informatics for the enterprise; image-enabled electronic medical records; RIS and HIS; digital image acquisition; image processing; image data compression; 3D, visualization, and multimedia; speech recognition; computer-aided diagnosis; facilities design; imaging vocabularies and ontologies; Transforming the Radiological Interpretation Process (TRIP™); DICOM and other standards; workflow and process modeling and simulation; quality assurance; archive integrity and security; teleradiology; digital mammography; and radiological informatics education.