
Methods of Information in Medicine: Latest Articles

Information Technology Systems for Infection Control in German University Hospitals-Results of a Structured Survey a Year into the Severe Acute Respiratory Syndrome Coronavirus 2 Pandemic.
IF 1.7 | Zone 4, Medicine | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-06-01 | DOI: 10.1055/s-0042-1760222
Nicolás Reinoso Schiller, Martin Wiesenfeldt, Ulrike Loderstädt, Hani Kaba, Dagmar Krefting, Simone Scheithauer

Background: Digitalization is playing a major role in mastering the current coronavirus disease 2019 (COVID-19) pandemic. However, several outbreaks of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in German hospitals last year have shown that many of the surveillance and warning mechanisms related to infection control (IC) in hospitals need to be updated.

Objectives: The main objective of the following work was to assess the state of information technology (IT) systems supporting IC and surveillance in German university hospitals in March 2021, almost a year into the SARS-CoV-2 pandemic.

Methods: As part of the National Research Network for Applied Surveillance and Testing project within the Network University Medicine, a cross-sectional survey was conducted to assess the situation of IC IT systems in 36 university hospitals in Germany.

Results: Among the most prominent findings were the lack of standardization of IC IT systems and the predominant use of commercial IC IT systems, while the vast majority of hospitals reported inadequacies in the features their IC IT systems provide for their daily work. However, as the pandemic has shown that there is a need for systems that can help improve health care, several German university hospitals have already started this upgrade independently.

Conclusions: The deep challenges faced by the German health care sector regarding the integration and interoperability of IT systems designed for IC and surveillance are unlikely to be solved through isolated interventions and require collaboration between educational, medical, and administrative institutions.

Journal: Methods of Information in Medicine | Volume: 62 S 01 | Pages: e57-e62 | Type: Journal Article | Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/d0/e9/10-1055-s-0042-1760222.PMC10306444.pdf
Citations: 0
Data Quality in Health Care: Main Concepts and Assessment Methodologies.
IF 1.7 | Zone 4, Medicine | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-05-01 | DOI: 10.1055/s-0043-1761500
Mehrnaz Mashoufi, Haleh Ayatollahi, Davoud Khorasani-Zavareh, Tahere Talebi Azad Boni

Introduction: In the health care environment, a huge volume of data is produced on a daily basis. However, the processes of collecting, storing, sharing, analyzing, and reporting health data usually face numerous challenges that lead to incomplete, inaccurate, and untimely data. As a result, data quality issues have received more attention than before.

Objective: The purpose of this article is to provide an insight into the data quality definitions, dimensions, and assessment methodologies.

Methods: In this article, a scoping literature review approach was used to describe and summarize the main concepts related to data quality and data quality assessment methodologies. Search terms were selected to find the relevant articles published between January 1, 2012 and September 31, 2022. The retrieved articles were then reviewed and the results were reported narratively.

Results: In total, 23 papers were included in the study. According to the results, data quality dimensions varied, and different methodologies were used to assess them. Most studies used quantitative methods to measure data quality dimensions either in paper-based or computer-based medical records. Only two studies investigated respondents' opinions about data quality.
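
As an illustration of what a quantitative measurement of one data quality dimension can look like, the sketch below computes completeness as the share of non-missing values per field. It is a minimal example and not a method taken from the reviewed studies; the record fields, the sample records, and the rule that None and blank strings count as missing are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def field_completeness(records: List[Dict[str, Any]], fields: List[str]) -> Dict[str, float]:
    """Return, per field, the fraction of records holding a non-missing value."""

    def is_missing(value: Any) -> bool:
        # Assumption: None and blank strings are treated as missing.
        return value is None or (isinstance(value, str) and value.strip() == "")

    total = len(records)
    result = {}
    for field in fields:
        present = sum(1 for record in records if not is_missing(record.get(field)))
        result[field] = present / total if total else 0.0
    return result


# Hypothetical records, for illustration only.
records = [
    {"patient_id": "A1", "birth_date": "1980-02-11", "diagnosis": "I10"},
    {"patient_id": "A2", "birth_date": "", "diagnosis": "E11"},
    {"patient_id": "A3", "birth_date": None, "diagnosis": None},
    {"patient_id": "A4", "birth_date": "1975-07-30", "diagnosis": "J45"},
]
completeness = field_completeness(records, ["patient_id", "birth_date", "diagnosis"])
print(completeness)
```

Other dimensions named in the abstract, such as accuracy and timeliness, need a reference source or timestamps and are not covered by this sketch.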

Conclusion: In health care, high-quality data are not only important for patient care but also vital for improving the quality of health care services and for better decision making. Therefore, technical and nontechnical solutions, together with constant assessment and supervision, are suggested to improve data quality.

Journal: Methods of Information in Medicine | Volume: 62 1-02 | Pages: 5-18 | Type: Journal Article
Citations: 4
Automatic Identification of Self-Reported COVID-19 Vaccine Information from Vaccine Adverse Events Reporting System.
IF 1.7 | Zone 4, Medicine | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-05-01 | DOI: 10.1055/s-0042-1760248
Jay S Patel, Sonya Zhan, Zasim Siddiqui, Bari Dzomba, Huanmei Wu

Background: The short time frame between the coronavirus disease 2019 (COVID-19) pandemic declaration and the authorization of the vaccines led to concerns among the public regarding the safety and efficacy of the vaccines. The Food and Drug Administration uses the Vaccine Adverse Events Reporting System (VAERS), where the general population can report vaccine side effects in a text box. This information could be utilized to determine self-reported vaccine side effects.

Objective: To develop a supervised and unsupervised natural language processing (NLP) pipeline that extracts self-reported COVID-19 vaccination side effects, the location of the side effects, medications, and possible false information or misinformation requiring further investigation, in a structured format for analysis and reporting.

Methods: We utilized the VAERS dataset of COVID-19 vaccine reports from November 2020 to August 2022, covering 725,246 individuals. We first developed a gold-standard (GS) dataset of 1,500 randomly selected records. Second, the GS was split into training, testing, and validation sets. The training dataset was used to develop the NLP applications (supervised and unsupervised), and the testing and validation datasets were used to test the performance of the NLP application.
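
The split of the gold-standard set can be sketched as follows. The abstract does not state the split proportions, so the 70/15/15 ratio, the fixed seed, and the function name `split_gold_standard` are assumptions chosen for illustration rather than the authors' procedure.

```python
import random


def split_gold_standard(record_ids, train_frac=0.70, test_frac=0.15, seed=42):
    """Shuffle record IDs reproducibly and cut them into three disjoint sets."""
    ids = list(record_ids)
    random.Random(seed).shuffle(ids)  # local generator keeps the split repeatable
    n_train = round(len(ids) * train_frac)
    n_test = round(len(ids) * test_frac)
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    validation = ids[n_train + n_test:]  # whatever remains after train and test
    return train, test, validation


# 1,500 placeholder IDs standing in for the annotated gold-standard records.
train, test, validation = split_gold_standard(range(1500))
print(len(train), len(test), len(validation))
```

Keeping the three sets disjoint matters because any record seen during development would inflate the performance measured on the testing and validation sets.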

Results: The NLP application automatically extracted vaccine side effects, body locations of the side effects, medication, and possible misinformation with moderate to high accuracy (84% sensitivity, 82% specificity, and 83% F-1 measure). We found that 23% of people (386,270) experienced arm soreness, 31% body swelling (226,208), 23% fatigue/body weakness (168,160), and 22% (159,873) cold/flu-like symptoms. Most of the complications occurred in body locations such as the arm, back, chest, neck, face, and head. Over-the-counter pain medications such as Tylenol and ibuprofen and allergy medications like Benadryl were the most frequently self-reported medications. Death due to COVID-19, changes in the DNA, and infertility were possible false information or misinformation reported by people.
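
For readers less familiar with the three reported measures, the sketch below shows how sensitivity, specificity, and the F-1 measure are derived from a confusion matrix. The counts are hypothetical and were picked only so that the resulting values fall in the same range as the reported figures; they are not the study's actual confusion matrix.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true positives that were found
    specificity = tn / (tn + fp)   # share of true negatives correctly rejected
    precision = tp / (tp + fp)     # share of predicted positives that were correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for 100 positive and 100 negative annotated mentions.
metrics = classification_metrics(tp=84, fn=16, tn=82, fp=18)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the F-1 measure is the harmonic mean of precision and sensitivity, so it depends on the class balance and cannot be computed from sensitivity and specificity alone.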

Conclusion: Some self-reported side effects such as syncope, arthralgia, and blood clotting need further clinical investigations. Our NLP application may help in extracting information from big free-text electronic datasets to help policy makers and other researchers with decision making.

Journal: Methods of Information in Medicine | Volume: 62 1-02 | Pages: 49-59 | Type: Journal Article
Citations: 2
Workflow, Time Requirement, and Quality of Medication Documentation with or without a Computerized Physician Order Entry System-A Simulation-Based Lab Study.
IF 1.7 | Zone 4, Medicine | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-05-01 | DOI: 10.1055/s-0042-1758631
Viktoria Jungreithmayr, Walter E Haefeli, Hanna M Seidling

Background: The introduction of a computerized physician order entry (CPOE) system is changing workflows and redistributing tasks among health care professionals.

Objectives: The aim of this study is to describe exemplary changes in workflow, to objectify the time required for medication documentation, and to evaluate documentation quality with and without a CPOE system (Cerner® i.s.h.med).

Methods: Workflows were assessed either through direct observation and in-person interviews or through semistructured online interviews with clinical staff involved in medication documentation. Two case scenarios were developed consisting of exemplary medications (case 1 = 6 drugs and case 2 = 11 drugs). Physicians and nurses/documentation assistants were observed documenting the case scenarios according to workflows established prior to CPOE implementation and those newly established with CPOE implementation, measuring the time spent on each step in the documentation process. Subsequently, the documentation quality of the documented medication was assessed according to a previously established and published methodology.

Results: CPOE implementation simplified medication documentation. The overall time needed for medication documentation increased from a median of 12:12 min (range: 07:29-21:10 min) without to 14:40 min (09:18-25:18) with the CPOE system (p = 0.002). With CPOE, less time was spent documenting peroral prescriptions and more time documenting intravenous/subcutaneous prescriptions. For physicians, documentation time approximately doubled, while nurses achieved time savings. Overall, the documentation quality increased from a median fulfillment score of 66.7% without to 100.0% with the CPOE system (p < 0.001).

Conclusion: This study revealed that CPOE implementation simplified the medication documentation process but increased the time spent on medication documentation by 20% in two fictitious cases. This increased time resulted in higher documentation quality, occurred at the expense of physicians, and was primarily due to intravenous/subcutaneous prescriptions. Therefore, measures to support physicians with complex prescriptions in the CPOE system should be established.
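
The 20% figure can be checked against the median documentation times given in the Results (12:12 min without and 14:40 min with the CPOE system). The short calculation below is only a plausibility check on the ratio of the two medians; the helper name `mmss_to_seconds` is introduced here for illustration.

```python
def mmss_to_seconds(text):
    """Convert a 'mm:ss' duration string to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


without_cpoe = mmss_to_seconds("12:12")  # median without CPOE
with_cpoe = mmss_to_seconds("14:40")     # median with CPOE
increase = (with_cpoe - without_cpoe) / without_cpoe

print(f"{without_cpoe} s -> {with_cpoe} s, relative increase {increase:.1%}")
# prints "732 s -> 880 s, relative increase 20.2%"
```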

Journal: Methods of Information in Medicine | Volume: 62 1-02 | Pages: 40-48 | Type: Journal Article
Citations: 0
Evaluation of Feature Selection Methods for Preserving Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine.
IF 1.7 | Zone 4, Medicine | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-05-01 | DOI: 10.1055/s-0043-1762904
Joshua Lemmon, Lin Lawrence Guo, Jose Posada, Stephen R Pfohl, Jason Fries, Scott Lanyon Fleming, Catherine Aftandilian, Nigam Shah, Lillian Sung

Background: Temporal dataset shift can cause degradation in model performance as discrepancies between training and deployment data grow over time. The primary objective was to determine whether parsimonious models produced by specific feature selection methods are more robust to temporal dataset shift as measured by out-of-distribution (OOD) performance, while maintaining in-distribution (ID) performance.

Methods: Our dataset consisted of intensive care unit patients from MIMIC-IV categorized by year groups (2008-2010, 2011-2013, 2014-2016, and 2017-2019). We trained baseline models using L2-regularized logistic regression on 2008-2010 to predict in-hospital mortality, long length of stay (LOS), sepsis, and invasive ventilation in all year groups. We evaluated three feature selection methods: L1-regularized logistic regression (L1), Remove and Retrain (ROAR), and causal feature selection. We assessed whether a feature selection method could maintain ID performance (2008-2010) and improve OOD performance (2017-2019). We also assessed whether parsimonious models retrained on OOD data performed as well as oracle models trained on all features in the OOD year group.
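
To make the idea of L1-based feature selection concrete, the sketch below fits an L1-regularized logistic regression with proximal gradient descent and keeps only the features whose weights are not driven to exactly zero. It is a toy illustration and not the authors' pipeline: the six-row dataset, the penalty strength `lam`, the step size, and the absence of an intercept are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, lr=1.0, iterations=2000):
    """Minimize mean logistic loss + lam * ||w||_1 (no intercept) by proximal gradient."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iterations):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


# Toy data: feature 0 determines the label, feature 1 is only weakly related to it.
X = [[1, 1], [1, -1], [1, 1], [-1, -1], [-1, 1], [-1, -1]]
y = [1, 1, 1, 0, 0, 0]

weights = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

On this toy data the weakly related feature ends with a weight of exactly zero, which is the property that turns the penalty into a feature selector; a parsimonious model would then be retrained on the selected columns only.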

Results: The baseline model showed significantly worse OOD performance with the long LOS and sepsis tasks when compared with the ID performance. L1 and ROAR retained 3.7 to 12.6% of all features, whereas causal feature selection generally retained fewer features. Models produced by L1 and ROAR exhibited ID and OOD performance similar to that of the baseline models. The retraining of these models on 2017-2019 data using features selected from training on 2008-2010 data generally reached parity with oracle models trained directly on 2017-2019 data using all available features. Causal feature selection led to heterogeneous results with the superset maintaining ID performance while improving OOD calibration only on the long LOS task.

Conclusions: While model retraining can mitigate the impact of temporal dataset shift on parsimonious models produced by L1 and ROAR, new methods are required to proactively improve temporal robustness.

Journal: Methods of Information in Medicine | Volume: 62 1-02 | Pages: 60-70 | Type: Journal Article
Citations: 0
Evaluating the Impact of Health Care Data Completeness for Deep Generative Models.
IF 1.7 | Zone 4, Medicine | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-05-01 | DOI: 10.1055/a-2023-9181
Benjamin Smith, Senne Van Steelandt, Anahita Khojandi

Background: Deep generative models (DGMs) present a promising avenue for generating realistic, synthetic data to augment existing health care datasets. However, exactly how the completeness of the original dataset affects the quality of the generated synthetic data is unclear.

Objectives: In this paper, we investigate the effect of data completeness on samples generated by the most common DGM paradigms.

Methods: We create both cross-sectional and panel datasets with varying missingness and subset rates and train generative adversarial networks, variational autoencoders, and autoregressive models (Transformers) on these datasets. We then compare the distributions of generated data with original training data to measure similarity.
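
The two building blocks of such an experiment, degrading a dataset at a chosen missingness rate and scoring how far one sample's distribution lies from another's, can be sketched as follows. The abstract names neither the missingness mechanism nor the similarity measure, so the completely-at-random masking and the two-sample Kolmogorov-Smirnov statistic used here are assumptions, as are the function names.

```python
import random
from bisect import bisect_right


def inject_missingness(values, rate, seed=0):
    """Replace a fixed fraction of entries with None, chosen completely at random."""
    rng = random.Random(seed)
    n_missing = round(len(values) * rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


original = [float(i) for i in range(100)]
masked = inject_missingness(original, rate=0.3)
observed = [v for v in masked if v is not None]

print(sum(v is None for v in masked))      # number of masked entries
print(ks_statistic(original, observed))    # distance between full and observed data
```

In a full experiment the second argument to `ks_statistic` would be a sample drawn from a generative model trained on the masked data, and the statistic would be tracked across missingness rates.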

Results: We find that increased incompleteness is directly correlated with increased dissimilarity between original and generated samples produced through DGMs.

Conclusions: Care must be taken when using DGMs to generate synthetic data as data completeness issues can affect the quality of generated data in both panel and cross-sectional datasets.

Journal: Methods of Information in Medicine | Volume: 62 1-02 | Pages: 31-39 | Type: Journal Article
Citations: 0
Definition of a Practical Taxonomy for Referencing Data Quality Problems in Health Care Databases.
IF 1.7 Zone 4 Medicine Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-05-01 DOI: 10.1055/a-1976-2371
Paul Quindroit, Mathilde Fruchart, Samuel Degoul, Renaud Périchon, Julien Soula, Romaric Marcilly, Antoine Lamer

Introduction: Health care information systems can generate and/or record huge volumes of data, some of which may be reused for research, clinical trials, or teaching. However, these databases can be affected by data quality problems; hence, an important step in the data reuse process consists in detecting and rectifying these issues. With a view to facilitating the assessment of data quality, we developed a taxonomy of data quality problems in operational databases.

Material: We searched the literature for publications that mentioned "data quality problems," "data quality taxonomy," "data quality assessment," or "dirty data." The publications were then reviewed, compared, summarized, and structured using a bottom-up approach, to provide an operational taxonomy of data quality problems. The latter were illustrated with fictional examples (though based on reality) from clinical databases.

Results: Twelve publications were selected, and 286 instances of data quality problems were identified and were classified according to six distinct levels of granularity. We used the classification defined by Oliveira et al to structure our taxonomy. The extracted items were grouped into 53 data quality problems.

Discussion: This taxonomy facilitated the systematic assessment of data quality in databases by presenting the data's quality according to their granularity. The definition of this taxonomy is the first step in the data cleaning process. The subsequent steps include the definition of associated quality assessment methods and data cleaning methods.

Conclusion: Our new taxonomy enabled the classification and illustration of 53 data quality problems found in hospital databases.

Citations: 1
High-Quality Data for Health Care and Health Research.
IF 1.7 Zone 4 Medicine Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-05-01 DOI: 10.1055/a-2045-8287
Jürgen Stausberg, Sonja Harkener
In the 19th century, Florence Nightingale pointed to the importance of nursing documentation for the care of patients and the necessity of data-based statistics for quality improvement. In the same century, John Snow projected his observations about patients with cholera onto a street map, laying the groundwork for modern epidemiological science. These historical examples demonstrate that proper data are the foundation of relevant information about individuals and of new scientific evidence. In the ideal case of Ackoff's pyramid, information, knowledge, understanding, and wisdom arise from data.
Citations: 0
The Digital Analytic Patient Reviewer (DAPR) for COVID-19 Data Mart Validation.
IF 1.7 Zone 4 Medicine Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2022-12-01 DOI: 10.1055/a-1938-0436
Heekyong Park, Taowei David Wang, Nich Wattanasin, Victor M Castro, Vivian Gainer, Sergey Goryachev, Shawn Murphy

Objective: To provide high-quality data for coronavirus disease 2019 (COVID-19) research, we validated derived COVID-19 clinical indicators and 22 associated machine learning phenotypes, in the Mass General Brigham (MGB) COVID-19 Data Mart.

Methods: Fifteen reviewers performed a retrospective manual chart review for 150 COVID-19-positive patients in the data mart. To support rapid chart review for a wide range of target data, we offered a natural language processing (NLP)-based chart review tool, the Digital Analytic Patient Reviewer (DAPR). For this work, we designed a dedicated patient summary view and developed 127 new NLP logics to extract COVID-19-relevant medical concepts and target phenotypes. Moreover, we adapted DAPR for research use so that patient information is used only for approved research purposes, and we enabled fast access to the integrated patient information. Lastly, we conducted a survey to evaluate the validation difficulty and the usefulness of DAPR.

Results: The concepts for COVID-19-positive cohort, COVID-19 index date, COVID-19-related admission, and admission date showed high values in all evaluation metrics. However, three phenotypes showed notably lower performance than their positive predictive value in the prepandemic population. Based on these results, we removed the three phenotypes from our data mart. In the survey about using the tool, participants expressed positive attitudes toward using DAPR for chart review. They judged that the validation was easy and that DAPR helped find relevant information. Some validation difficulties were also discussed.

Conclusion: Use of NLP technology in the chart review helped to cope with the challenges of the COVID-19 data validation task and accelerated the process. As a result, we could provide more reliable research data promptly and respond to the COVID-19 crisis. DAPR's benefit can be expanded to other domains. We plan to operationalize it for wider research groups.

Citations: 0
Self-Service Registry Log Builder: A Case Study in National Trauma Registry of Iran.
IF 1.7 Zone 4 Medicine Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2022-12-01 DOI: 10.1055/a-1911-9088
Mansoureh Yari Eili, Safar Vafadar, Jalal Rezaeenour, Mahdi Sharif-Alhoseini

Background: Although the process-mining algorithms have evolved in the past decade, the lack of attention to extracting event logs from raw data of databases in an automatic manner is evident. These logs are available in a process-oriented manner in the process-aware information systems. Still, there are areas where their extraction is a challenge to address (e.g., trauma registries).

Objective: The registry data are recorded manually, follow an unstructured ad hoc pattern, and are prone to high levels of noise and error; consequently, registry logs are classified at maturity level one, and extracting process-centric information from them is not a trivial task. The experiences gained while building the event log from the trauma registry are the subject of this study.

Results: The result indicates that the three-phase self-service registry log builder tool can withstand the mentioned issues by filtering and enriching the raw data and making them ready for any level of process-mining analysis. This proposed tool is demonstrated through process discovery in the National Trauma Registry of Iran, and the encountered challenges and limitations are reported.

Conclusion: This tool is an interactive visual event log builder for trauma registry data and is freely available for studies involving other registries. In conclusion, future research directions derived from this case study are suggested.

Citations: 1