
Methods of Information in Medicine: Latest Articles

An Alternative Application of Natural Language Processing to Express a Characteristic Feature of Diseases in Japanese Medical Records.
IF 1.7; Zone 4 (Medicine); Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-09-01; DOI: 10.1055/a-2039-3773
Yoshinori Yamanouchi, Taishi Nakamura, Tokunori Ikeda, Koichiro Usuku

Background: Owing to the linguistic situation, Japanese natural language processing (NLP) requires morphological analyses for word segmentation using dictionary techniques.

Objective: We aimed to clarify whether dictionary-based morphological analysis can be replaced by an open-end discovery-based NLP (OD-NLP), which does not use any dictionary techniques.

Methods: Clinical texts at the first medical visit were collected for comparison of OD-NLP with word dictionary-based NLP (WD-NLP). Topics were generated in each document using a topic model and were later matched to the respective diseases as defined in the International Statistical Classification of Diseases and Related Health Problems, 10th Revision. The prediction accuracy and expressivity of each disease were examined in an equivalent number of entities/words after filtration with either term frequency-inverse document frequency (TF-IDF) or dominance value (DMV).
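The TF-IDF filtration step can be illustrated with a minimal sketch. The abstract does not state the exact weighting or thresholds used, so the formula below (relative term frequency times the natural log of inverse document frequency), the function name `tfidf_filter`, and the toy tokens are assumptions for illustration only.

```python
import math
from collections import Counter


def tfidf_filter(docs, threshold):
    """Keep, per document, only the terms whose TF-IDF score reaches `threshold`.

    `docs` is a list of token lists. TF is the relative frequency of a term
    within a document; IDF is log(N / document frequency). This is one common
    TF-IDF variant, assumed here rather than taken from the paper.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    filtered = []
    for doc in docs:
        kept = {}
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / df[term])
            if score >= threshold:
                kept[term] = score
        filtered.append(kept)
    return filtered


# Hypothetical, already-segmented toy documents.
docs = [
    ["fever", "cough", "fever"],
    ["cough", "headache"],
    ["cough", "rash", "rash"],
]
result = tfidf_filter(docs, threshold=0.3)
```

Here "cough" occurs in every document, so its IDF is zero and it is dropped everywhere. Raising the threshold shrinks the surviving vocabulary, which is the effect the abstract describes when it compares datasets at lower and higher thresholds.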

Results: In documents from 10,520 observed patients, 169,913 entities and 44,758 words were segmented using OD-NLP and WD-NLP, respectively. Without filtering, accuracy and recall levels were low, and there was no difference in the F-measure (the harmonic mean of precision and recall) between NLPs. However, physicians reported that OD-NLP contained more meaningful words than WD-NLP. When datasets were created with an equivalent number of entities/words using TF-IDF, the F-measure in OD-NLP was higher than that in WD-NLP at lower thresholds. When the threshold increased, the number of datasets created decreased, resulting in increased values of F-measure, although the differences disappeared. Two datasets near the maximum threshold showing differences in F-measure were examined to determine whether their topics were associated with diseases. The results showed that more diseases were found in OD-NLP at lower thresholds, indicating that the topics described characteristics of diseases. This superiority persisted to the same extent when filtration was changed from TF-IDF to DMV.
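For readers unfamiliar with the metric, the F-measure reported above is conventionally the harmonic mean of precision and recall. The sketch below uses that standard definition; the function name and the example counts are hypothetical and are not taken from the study.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return the F-measure (harmonic mean of precision and recall).

    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 8/10, recall = 8/16.
score = f_measure(true_positives=8, false_positives=2, false_negatives=8)
```

Because the harmonic mean is dominated by the smaller of the two values, a method with low recall cannot reach a high F-measure through precision alone.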

Conclusion: The current findings favor the use of OD-NLP to express characteristics of diseases from Japanese clinical texts and may help in the construction of document summaries and retrieval in clinical settings.

Methods of Information in Medicine, vol. 62, no. 3-04, pp. 110-118. Journal Article. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/2b/3b/10-1055-a-2039-3773.PMC10462427.pdf
Citations: 0
From Paper Files to Web-Based Application for Data-Driven Monitoring of HIV Programs: Nigeria's Journey to a National Data Repository for Decision-Making and Patient Care.
IF 1.7; Zone 4 (Medicine); Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-09-01; DOI: 10.1055/s-0043-1768711
Ibrahim Dalhatu, Chinedu Aniekwe, Adebobola Bashorun, Alhassan Abdulkadir, Emilio Dirlikov, Stephen Ohakanu, Oluwasanmi Adedokun, Ademola Oladipo, Ibrahim Jahun, Lisa Murie, Steven Yoon, Mubarak G Abdu-Aguye, Ahmed Sylvanus, Samuel Indyer, Isah Abbas, Mustapha Bello, Nannim Nalda, Matthias Alagi, Solomon Odafe, Sylvia Adebajo, Otse Ogorry, Murphy Akpu, Ifeanyi Okoye, Kunle Kakanfo, Amobi Andrew Onovo, Gregory Ashefor, Charles Nzelu, Akudo Ikpeazu, Gambo Aliyu, Tedd Ellerbrock, Mary Boyd, Kristen A Stafford, Mahesh Swaminathan

Background: Timely and reliable data are crucial for clinical, epidemiologic, and program management decision making. Electronic health information systems provide platforms for managing large longitudinal patient records. Nigeria implemented the National Data Repository (NDR) to create a central data warehouse of all people living with human immunodeficiency virus (PLHIV) while providing useful functionalities to aid decision making at different levels of program implementation.

Objective: We describe the Nigeria NDR and its development process, including its use for surveillance, research, and national HIV program monitoring toward achieving HIV epidemic control.

Methods: Stakeholder engagement meetings were held in 2013 to gather information on data elements and vocabulary standards for reporting patient-level information, technical infrastructure, human capacity requirements, and information flow. Findings from these meetings guided the development of the NDR. An implementation guide provided common terminologies and data reporting structures for data exchange between the NDR and the electronic medical record (EMR) systems. Data from the EMR were encoded in Extensible Markup Language (XML) and, after passing a series of validation processes, sent to the NDR over Hypertext Transfer Protocol Secure (HTTPS).
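The validate-then-encode flow described above can be sketched as follows. The abstract does not give the NDR message schema or its validation rules, so the element names, the required fields, and both function names are hypothetical; the HTTPS transmission step is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; the real NDR implementation guide defines its own.
REQUIRED_FIELDS = ("patient_id", "facility_id", "art_start_date")


def validate_record(record):
    """Return a list of validation problems; an empty list means the record passes."""
    return [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]


def encode_record(record):
    """Validate an EMR record (a dict) and encode it as an XML string."""
    problems = validate_record(record)
    if problems:
        raise ValueError("; ".join(problems))
    root = ET.Element("PatientRecord")  # hypothetical root element name
    for name in REQUIRED_FIELDS:
        ET.SubElement(root, name).text = str(record[name])
    return ET.tostring(root, encoding="unicode")


message = encode_record(
    {"patient_id": "P001", "facility_id": "F12", "art_start_date": "2020-01-15"}
)
```

Rejecting a record before encoding keeps malformed messages out of the central repository, which is presumably why validation precedes transmission in the described pipeline.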

Results: By June 30, 2021, the NDR had up-to-date records of 1,477,064 (94.4%) patients receiving HIV treatment across 1,985 health facilities, of which 1,266,512 (85.7%) patient records had fingerprint template data to support unique patient identification and record linkage to prevent registration of the same patient under different identities. Data from the NDR were used to support HIV program monitoring, case-based surveillance, and the production of outputs such as monthly lists of patients who have treatment interruptions and dashboards for monitoring HIV test and start.

Conclusion: The NDR made reliable and timely data available for surveillance, research, and HIV program monitoring to guide program improvements and accelerate progress toward epidemic control.

Methods of Information in Medicine, vol. 62, no. 3-04, pp. 130-139. Journal Article. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/b5/f9/10-1055-s-0043-1768711.PMC10462428.pdf
Citations: 0
Rare Diseases in Hospital Information Systems-An Interoperable Methodology for Distributed Data Quality Assessments.
IF 1.7; Zone 4 (Medicine); Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-09-01; DOI: 10.1055/a-2006-1018
Kais Tahar, Tamara Martin, Yongli Mou, Raphael Verbuecheln, Holm Graessner, Dagmar Krefting

Background: Multisite research networks such as the project "Collaboration on Rare Diseases" connect various hospitals to obtain sufficient data for clinical research. However, data quality (DQ) remains a challenge for the secondary use of data recorded in different health information systems. High levels of DQ as well as appropriate quality assessment methods are needed to support the reuse of such distributed data.

Objectives: The aim of this work is the development of an interoperable methodology for assessing the quality of data recorded in heterogeneous sources to improve the quality of rare disease (RD) documentation and support clinical research.

Methods: We first developed a conceptual framework for DQ assessment. Using this theoretical guidance, we implemented a software framework that provides appropriate tools for calculating DQ metrics and for generating local as well as cross-institutional reports. We further applied our methodology to synthetic data distributed across multiple hospitals using Personal Health Train. Finally, we used precision and recall as metrics to validate our implementation.
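The validation step can be pictured as comparing the set of issues a checker reports against the set of issues deliberately introduced into synthetic data. The sketch below assumes each issue is identified by a (record, issue type) pair; that representation, the function name, and the example values are hypothetical and not taken from the authors' software.

```python
def precision_recall(detected, injected):
    """Compare detected DQ issues with the issues known to have been injected.

    Precision: share of detected issues that were really injected.
    Recall: share of injected issues that were detected.
    """
    detected, injected = set(detected), set(injected)
    hits = detected & injected
    precision = len(hits) / len(detected) if detected else 0.0
    recall = len(hits) / len(injected) if injected else 0.0
    return precision, recall


# Hypothetical issue identifiers: (record id, issue type).
injected = {("rec1", "missing_code"), ("rec4", "implausible_code")}
detected = {("rec1", "missing_code"), ("rec4", "implausible_code"), ("rec7", "ambiguous_code")}
precision, recall = precision_recall(detected, injected)
```

In this toy case every injected issue is found (recall 1.0), while one extra report lowers precision to 2/3.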

Results: Four DQ dimensions were defined and represented as disjoint ontological categories. Based on these top dimensions, 9 DQ concepts, 10 DQ indicators, and 25 DQ parameters were developed and applied to different data sets. Randomly introduced DQ issues were all identified and reported automatically. The generated reports show the resulting DQ indicators and detected DQ issues.

Conclusion: We have shown that our approach yields promising results, which can be used for local and cross-institutional DQ assessments. The developed frameworks provide useful methods for interoperable and privacy-preserving assessments of DQ that meet the specified requirements. This study has demonstrated that our methodology is capable of detecting DQ issues such as ambiguity or implausibility of coded diagnoses. It can be used for DQ benchmarking to improve the quality of RD documentation and to support clinical research on distributed data.

Methods of Information in Medicine, vol. 62, no. 3-04, pp. 71-89. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10462432/pdf/
Citations: 2
A Simple-to-Use R Package for Mimicking Study Data by Simulations.
IF 1.7; Zone 4 (Medicine); Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-09-01; DOI: 10.1055/a-2048-7692
Giorgos Koliopanos, Francisco Ojeda, Andreas Ziegler

Background: Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome legal restrictions, simulated data can be transferred that mimic the structure but are different from the existing study data.

Objectives: The aim of this work is to introduce the simple-to-use R package Mock Data Generation (modgo) that may be used for simulating data from existing study data for continuous, ordinal categorical, and dichotomous variables.

Methods: The core is to combine rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal distribution and transferred back to the original scale of the variables. Unique features of modgo are that it allows changing the correlation between variables, performing perturbation analysis, handling multicenter data, and changing inclusion/exclusion criteria by selecting specific values of one or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo.
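The core idea can be sketched for two continuous variables. This is an illustrative Python sketch, not modgo's R API: the rank offset of 0.5, the handling of ties (ignored), the empirical-quantile back-transformation, and all function names are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rank_inverse_normal(values):
    """Map values to normal scores via ranks: z = Phi^-1((rank - 0.5) / n). Ties are not averaged."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        scores[index] = STD_NORMAL.inv_cdf((rank - 0.5) / n)
    return scores


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def back_transform(z, original_sorted):
    """Map a normal score back to the original scale via the empirical quantile."""
    n = len(original_sorted)
    position = min(int(STD_NORMAL.cdf(z) * n), n - 1)
    return original_sorted[position]


def mimic_two_variables(x, y, n_sim, seed=0):
    """Simulate n_sim (x, y) pairs that mimic the marginals and correlation of the input."""
    rng = random.Random(seed)
    rho = pearson(rank_inverse_normal(x), rank_inverse_normal(y))
    x_sorted, y_sorted = sorted(x), sorted(y)
    simulated = []
    for _ in range(n_sim):
        # Correlated standard-normal pair (2x2 Cholesky factor written out by hand).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(max(0.0, 1.0 - rho * rho)) * rng.gauss(0.0, 1.0)
        simulated.append((back_transform(z1, x_sorted), back_transform(z2, y_sorted)))
    return simulated


# Hypothetical study data.
x = [5.1, 6.3, 4.8, 7.0, 5.9, 6.1]
y = [50, 64, 47, 72, 58, 60]
simulated = mimic_two_variables(x, y, n_sim=200, seed=1)
```

Because the back-transformation here reuses observed values, simulated records never leave the observed range. A full implementation would generalize the hand-written Cholesky step to a correlation matrix over all variables.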

Results: modgo mimicked the structure of the original study data. Results of modgo were similar to those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated on several expansions.

Conclusion: The R package modgo is useful when existing study data may not be shared. Its perturbation expansion permits the simulation of truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.

Methods of Information in Medicine, vol. 62, no. 3-04, pp. 119-129. Journal Article. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/75/40/10-1055-a-2048-7692.PMC10462429.pdf
Citations: 0
Synthetic Tabular Data Evaluation in the Health Domain Covering Resemblance, Utility, and Privacy Dimensions.
IF 1.7; Zone 4 (Medicine); Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-06-01; DOI: 10.1055/s-0042-1760247
Mikel Hernadez, Gorka Epelde, Ane Alberdi, Rodrigo Cilla, Debbie Rankin

Background: Synthetic tabular data generation is a potentially valuable technology with great promise for data augmentation and privacy preservation. However, prior to adoption, an empirical assessment of generated synthetic tabular data is required across dimensions relevant to the target application to determine its efficacy. A lack of a standardized and objective evaluation and benchmarking strategy for synthetic tabular data in the health domain has been found in the literature.

Objective: The aim of this paper is to identify key dimensions, per-dimension metrics, and methods for evaluating synthetic tabular data generated with different techniques and configurations for health domain application development and to provide a strategy to orchestrate them.

Methods: Based on the literature, the resemblance, utility, and privacy dimensions have been prioritized, and a collection of metrics and methods for their evaluation are orchestrated into a complete evaluation pipeline. This way, a guided and comparative assessment of generated synthetic tabular data can be done, categorizing its quality into three categories ("Excellent," "Good," and "Poor"). Six health care-related datasets and four synthetic tabular data generation approaches have been chosen to conduct an analysis and evaluation to verify the utility of the proposed evaluation pipeline.
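The final categorization step of such a pipeline might look like the sketch below. The abstract names the three dimensions and the three quality labels but not the metrics or cut-offs, so the averaging rule, the thresholds 0.8 and 0.5, and the assumption that every metric is scaled to [0, 1] with higher meaning better are all hypothetical.

```python
from statistics import mean

# Hypothetical cut-offs, checked from best to worst; the paper defines its own rules.
THRESHOLDS = (("Excellent", 0.8), ("Good", 0.5))


def categorize(score):
    """Map a score in [0, 1] to one of the three quality labels."""
    for label, minimum in THRESHOLDS:
        if score >= minimum:
            return label
    return "Poor"


def evaluate(metrics_by_dimension):
    """Average the metric scores of each dimension and label the result."""
    return {
        dimension: categorize(mean(scores))
        for dimension, scores in metrics_by_dimension.items()
    }


# Hypothetical metric scores for one dataset and one generation approach.
report = evaluate(
    {
        "resemblance": [0.9, 0.85],
        "utility": [0.6, 0.7],
        "privacy": [0.3, 0.4],
    }
)
```

Reporting one label per dimension, rather than a single overall score, keeps a privacy weakness visible even when resemblance and utility are strong.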

Results: The synthetic tabular data generated with the four selected approaches maintained resemblance, utility, and privacy for most combinations of dataset and synthetic tabular data generation approach. In several datasets, some approaches outperformed others, while in other datasets, more than one approach yielded the same performance.

Conclusion: The results have shown that the proposed pipeline can effectively be used to evaluate and benchmark the synthetic tabular data generated by various synthetic tabular data generation approaches. Therefore, this pipeline can support the scientific community in selecting the most suitable synthetic tabular data generation approaches for their data and application of interest.

Methods of Information in Medicine, vol. 62, no. S 01, pp. e19-e38. Journal Article. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/31/67/10-1055-s-0042-1760247.PMC10306449.pdf
Citations: 6
Aligning Semantic Interoperability Frameworks with the FOXS Stack for FAIR Health Data.
IF 1.7; Zone 4 (Medicine); Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2023-06-01; DOI: 10.1055/a-1993-8036
John Meredith, Nicola Whitehead, Michael Dacey

Background: The FAIR Guiding Principles present a synergy with the use cases for digital health records, in that clinical data need to be findable and accessible within a range of environments, and must interoperate between systems and subsequently be reused. The use of HL7 FHIR, openEHR, IHE XDS, and SNOMED CT (FOXS) together represents a specification to create an open digital health platform for modern health care applications.

Objectives: To describe where logical FOXS components align with the European Open Science Cloud Interoperability Framework (EOSC-IF) reference architecture for semantic interoperability. This should provide a means of determining whether FOXS aligns with FAIR principles and of establishing that the data models and structures that support longitudinal care records are fit to underpin scientific research.

Methods: The EOSC-IF Semantic View is a representation of semantic interoperability where meaning is preserved between systems and users. This was analyzed and cross-referenced with FOXS architectural components, mapping concepts, and objects that describe content such as catalogues and semantic artifacts.

Results: The majority of conceptual Semantic View components were featured within the FOXS architecture. Semantic Business Objects are composed of a range of elements such as openEHR archetypes and templates, FHIR resources and profiles, SNOMED CT concepts, and XDS document identifiers. Semantic Functional Content comprises catalogues of metadata that were also supported by openEHR and FHIR tools.

Conclusions: Despite some elements of EOSC-IF being vague (e.g., FAIR Digital Object), there was a broad conformance to the framework concepts and the components of a FOXS platform. This work supports a health-domain-specific view of semantic interoperability and how this may be achieved to support FAIR data for health research via a standardized framework.

Methods of Information in Medicine, volume 62 S 01, pages e39-e46. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/a3/76/10-1055-a-1993-8036.PMC10306448.pdf
Citations: 0
We Know What You Agreed To, Don't We?-Evaluating the Quality of Paper-Based Consents Forms and Their Digitalized Equivalent Using the Example of the Baltic Fracture Competence Centre Project.
IF 1.7 Zone 4 Medicine Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-06-01 DOI: 10.1055/s-0042-1760249
Henriette Rau, Dana Stahl, Anna-Juliana Reichel, Martin Bialke, Thomas Bahls, Wolfgang Hoffmann

Introduction: Informed consent is the legal basis for research with human subjects. Therefore, the consent form (CF), as a legally binding document, must be valid; that is, it must be completely filled in, state the person's decision clearly, and be signed by the respective person. However, paper-based CFs in particular might have quality issues, and their transformation into machine-readable information could lower quality further. This paper evaluates the quality of paper-based CFs and the quality issues that arise, using the example of the Baltic Fracture Competence Centre (BFCC) fracture registry. It also evaluates the impact of quality assurance (QA) measures, including site-specific feedback. Finally, it answers the question of whether manual data entry of patients' decisions by clinical staff leads to a significant error rate in digitalized paper-based CFs.

Methods: Based on defined quality criteria, two individual reviewers have conducted monthly QA, including source data verification, since the start of recruitment in December 2017. The analyses are based on the CFs collected from December 2017 until February 2019 (first recruitment period).

Results: After QA was conducted internally, a sudden increase in quality issues in May 2018 led to site-specific feedback reports and follow-up training regarding the CFs' quality, starting in June 2018. Specific criteria and descriptions of how to correct the CFs helped increase quality in a timely manner. The most common issues concerned missing pages, decisions regarding optional modules, and signature(s). Since patients' datasets without valid CFs must be deleted, QA helped retain 65 datasets for research, so that the final data pool consisted of 840 (99.29%) patients.

Conclusion: All quality issues could be assigned to one predefined criterion. Using the example of the BFCC fracture registry, CF-QA proved to significantly increase CF quality and help retain the number of available datasets for research. Consequently, the described quality indicators, criteria, and QA processes can be seen as the best practice approach.

Methods of Information in Medicine, volume 62 S 01, pages e10-e18. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/05/82/10-1055-s-0042-1760249.PMC10306442.pdf
Citations: 1
Consistency as a Data Quality Measure for German Corona Consensus Items Mapped from National Pandemic Cohort Network Data Collections.
IF 1.7 Zone 4 Medicine Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-06-01 DOI: 10.1055/a-2006-1086
Khalid O Yusuf, Olga Miljukov, Anne Schoneberg, Sabine Hanß, Martin Wiesenfeldt, Melanie Stecher, Lazar Mitrov, Sina Marie Hopff, Sarah Steinbrecher, Florian Kurth, Thomas Bahmer, Stefan Schreiber, Daniel Pape, Anna-Lena Hofmann, Mirjam Kohls, Stefan Störk, Hans Christian Stubbe, Johannes J Tebbe, Johannes C Hellmuth, Johanna Erber, Lilian Krist, Siegbert Rieg, Lisa Pilgram, Jörg J Vehreschild, Jens-Peter Reese, Dagmar Krefting

Background: As a national effort to better understand the current pandemic, three cohorts collect sociodemographic and clinical data from coronavirus disease 2019 (COVID-19) patients from different target populations within the German National Pandemic Cohort Network (NAPKON). Furthermore, the German Corona Consensus Dataset (GECCO) was introduced as a harmonized basic information model for COVID-19 patients in clinical routine. To compare the cohort data with other GECCO-based studies, data items are mapped to GECCO. As mapping from one information model to another is complex, an additional consistency evaluation of the mapped items is recommended to detect possible mapping issues or source data inconsistencies.

Objectives: The goal of this work is to assure high consistency of research data mapped to the GECCO data model. In particular, it aims to identify contradictions within interdependent GECCO data items of the German national COVID-19 cohorts so that possible reasons for the identified contradictions can be investigated. We furthermore aim to enable other researchers to easily perform data quality evaluation on GECCO-based datasets and to adapt the evaluation to similar data models.

Methods: All suitable data items from each of the three NAPKON cohorts are mapped to the GECCO items. A consistency assessment tool (dqGecco) is implemented following the design of an existing quality assessment framework and retaining the consistency taxonomies that framework defines, including logical and empirical contradictions. Results of the assessment are verified independently on the primary data source.
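
The idea of checking interdependent items for logical contradictions can be illustrated with a minimal Python sketch. It is not the dqGecco implementation: the field names (`fever`, `body_temp_c`), the 37.5 °C threshold, and the function names are hypothetical and chosen only for illustration. The consistency rate is taken here as the share of records without a flagged contradiction.

```python
from typing import Callable, Optional

# Hypothetical record layout; these are NOT actual GECCO item names.
Record = dict[str, Optional[object]]


def fever_contradiction(record: Record) -> bool:
    """Flag a record that reports fever although the measured
    temperature is below an assumed threshold of 37.5 degrees C."""
    temp = record.get("body_temp_c")
    return record.get("fever") is True and temp is not None and temp < 37.5


def consistency_rate(records: list[Record],
                     is_contradictory: Callable[[Record], bool]) -> float:
    """Percentage of records for which the rule finds no contradiction."""
    if not records:
        raise ValueError("at least one record is required")
    contradictions = sum(1 for r in records if is_contradictory(r))
    return 100.0 * (1 - contradictions / len(records))


sample = [
    {"fever": True, "body_temp_c": 38.6},
    {"fever": True, "body_temp_c": 36.4},   # contradictory combination
    {"fever": False, "body_temp_c": 36.8},
    {"fever": True, "body_temp_c": None},   # missing value, not flagged
]

print(consistency_rate(sample, fever_contradiction))  # prints 75.0
```

Keeping each rule as a separate function makes it straightforward to report a rate per indicator and per cohort, and to swap in rules for a similar information model.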

Results: Our consistency assessment tool helped correct the mapping procedure and revealed remaining contradictory value combinations within COVID-19 symptoms, vital signs, and COVID-19 severity. Consistency rates differ between the indicators and cohorts, ranging from 95.84% up to 100%.

Conclusion: An efficient and portable tool capable of discovering inconsistencies in the COVID-19 domain has been developed and applied to three different cohorts. As the GECCO dataset is employed in different platforms and studies, the tool can be directly applied there or adapted to similar information models.

Methods of Information in Medicine, volume 62 S 01, pages e47-e56. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/4d/05/10-1055-a-2006-1086.PMC10306447.pdf
Citations: 2
Targeted Data Quality Analysis for a Clinical Decision Support System for SIRS Detection in Critically Ill Pediatric Patients.
IF 1.7 Zone 4 Medicine Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-06-01 DOI: 10.1055/s-0042-1760238
Erik Tute, Marcel Mast, Antje Wulff

Background: Data quality issues can cause false decisions of clinical decision support systems (CDSSs). Analyzing local data quality has the potential to prevent data quality-related failure of CDSS adoption.

Objectives: To define a shareable set of applicable measurement methods (MMs) for a targeted data quality assessment determining the suitability of local data for our CDSS.

Methods: We derived task-specific MMs using four approaches: (1) a GUI-based data quality analysis using the open source tool openCQA; (2) analysis of cases of known false CDSS decisions; (3) data-driven learning on MM results; and (4) a systematic check to find blind spots in our set of MMs based on the HIDQF data quality framework. We expressed the derived data quality-related knowledge about the CDSS using the 5-tuple formalization for MMs.

Results: We identified some task-specific dataset characteristics that a targeted data quality assessment for our use case should inspect. Altogether, we defined 394 MMs organized in 13 data quality knowledge bases.

Conclusions: We have created a set of shareable, applicable MMs that can support targeted data quality assessment for CDSS-based systemic inflammatory response syndrome (SIRS) detection in critically ill pediatric patients. With the demonstrated approaches for deriving and expressing task-specific MMs, we intend to help promote targeted data quality assessment as a commonly recognized, routine part of research on data-consuming application systems in health care.

Methods of Information in Medicine, volume 62 S 01, pages e1-e9. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/23/e5/10-1055-s-0042-1760238.PMC10306443.pdf
Citations: 1
Nurse Managers' Opinions of Information System Support for Performance Management: A Correlational Study.
IF 1.7 Zone 4 Medicine Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-06-01 DOI: 10.1055/a-1978-9727
Kaija Saranto, Samuli Koponen, Tuulikki Vehko, Eija Kivekäs

Background: Current information systems do not effectively support nurse managers' duties, such as reporting, resource management, and assessing clinical performance. Few performance management information systems are available and features in many are scattered.

Objectives: The purpose of the study was to determine nurse managers' opinions of information system support for performance management.

Methods: An online questionnaire was used to collect data from nurse managers (n = 419). Pearson's correlation coefficients and linear regression were used to examine the relationships between variables, which were nurse managers' ability to manage resources, to report and evaluate productivity, and to assess nursing performance and clinical procedures.
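
As an illustration of the correlation analysis named above, the following Python sketch computes Pearson's correlation coefficient between two questionnaire scores. The variable names and the five example scores are made up for demonstration and are not study data; the study's actual analysis software is not stated in the abstract.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (numerator of r).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations (denominator of r).
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)


# Hypothetical Likert-style scores from five respondents.
resource_management = [3, 4, 2, 5, 4]
productivity_reporting = [2, 4, 3, 5, 5]

print(round(pearson_r(resource_management, productivity_reporting), 3))
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear association; significance testing would be a separate step.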

Results: More than half of the managers used performance management systems daily. Sixty percent of managers felt that they could use information systems to follow the use of physical resources, and 63% felt that it was easy to perform searches with the systems used for following up activity. Nurse managers' abilities to manage resources, to report productivity, and to assess nursing care performance were significantly correlated with each other.

Conclusion: Currently, managers have to collect data from various systems for management purposes, as system integration does not support performance data collection. The availability of continuous in-service training had a positive effect on information system use.

Methods of Information in Medicine, volume 62 S 01, pages e63-e72. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/b2/85/10-1055-a-1978-9727.PMC10306445.pdf
Citations: 0