Background: Because written Japanese does not delimit words with spaces, Japanese natural language processing (NLP) requires morphological analysis for word segmentation, which conventionally relies on dictionary-based techniques.
Objective: We aimed to clarify whether dictionary-based morphological analysis can be replaced with open-end discovery-based NLP (OD-NLP), which uses no dictionary techniques.
Methods: Clinical texts recorded at the first medical visit were collected to compare OD-NLP with word dictionary-based NLP (WD-NLP). Topics were generated for each document using a topic model and later mapped to the corresponding diseases coded under the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10). Prediction accuracy and expressivity for each disease were examined on equivalent numbers of entities/words after filtering with either term frequency-inverse document frequency (TF-IDF) or dominance value (DMV).
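A minimal sketch of the TF-IDF filtering step described in the Methods, assuming scikit-learn; the pre-segmented documents and the threshold are illustrative, not the study's values:

```python
# Minimal sketch of TF-IDF-based term filtering, assuming scikit-learn.
# Documents and threshold are illustrative, not the study's values.
from sklearn.feature_extraction.text import TfidfVectorizer

# Pre-segmented documents (entities from OD-NLP or words from WD-NLP),
# joined with spaces so the vectorizer treats each segment as a token.
docs = [
    "fever cough headache",
    "chest pain dyspnea",
    "fever chest pain",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)      # document-term TF-IDF matrix
terms = vectorizer.get_feature_names_out()

# Keep only terms whose maximum TF-IDF score exceeds a threshold;
# raising the threshold shrinks the vocabulary used for topic modeling.
threshold = 0.5                              # illustrative value
max_scores = tfidf.max(axis=0).toarray().ravel()
kept = [t for t, s in zip(terms, max_scores) if s >= threshold]
print(kept)
```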
Results: In documents from 10,520 observed patients, 169,913 entities and 44,758 words were segmented by OD-NLP and WD-NLP, respectively. Without filtering, precision and recall were low, and the F-measure (their harmonic mean) did not differ between the two approaches. However, physicians reported that OD-NLP yielded more meaningful words than WD-NLP. When datasets with equivalent numbers of entities/words were created using TF-IDF filtering, the F-measure for OD-NLP was higher than for WD-NLP at lower thresholds. As the threshold increased, fewer datasets were created and F-measure values rose, but the difference between the approaches disappeared. Two datasets near the maximum threshold that still differed in F-measure were examined to determine whether their topics were associated with diseases; more diseases were found with OD-NLP at lower thresholds, indicating that the topics captured disease characteristics. This superiority persisted to a similar extent when TF-IDF filtering was replaced with DMV.
Conclusion: The current findings favor the use of OD-NLP for expressing disease characteristics in Japanese clinical texts and may aid the construction of document summaries and document retrieval in clinical settings.
{"title":"An Alternative Application of Natural Language Processing to Express a Characteristic Feature of Diseases in Japanese Medical Records.","authors":"Yoshinori Yamanouchi, Taishi Nakamura, Tokunori Ikeda, Koichiro Usuku","doi":"10.1055/a-2039-3773","DOIUrl":"https://doi.org/10.1055/a-2039-3773","url":null,"abstract":"<p><strong>Background: </strong>Owing to the linguistic situation, Japanese natural language processing (NLP) requires morphological analyses for word segmentation using dictionary techniques.</p><p><strong>Objective: </strong>We aimed to clarify whether it can be substituted with an open-end discovery-based NLP (OD-NLP), which does not use any dictionary techniques.</p><p><strong>Methods: </strong>Clinical texts at the first medical visit were collected for comparison of OD-NLP with word dictionary-based-NLP (WD-NLP). Topics were generated in each document using a topic model, which later corresponded to the respective diseases determined in International Statistical Classification of Diseases and Related Health Problems 10 revision. The prediction accuracy and expressivity of each disease were examined in equivalent number of entities/words after filtration with either term frequency and inverse document frequency (TF-IDF) or dominance value (DMV).</p><p><strong>Results: </strong>In documents from 10,520 observed patients, 169,913 entities and 44,758 words were segmented using OD-NLP and WD-NLP, simultaneously. Without filtering, accuracy and recall levels were low, and there was no difference in the harmonic mean of the F-measure between NLPs. However, physicians reported OD-NLP contained more meaningful words than WD-NLP. When datasets were created in an equivalent number of entities/words with TF-IDF, F-measure in OD-NLP was higher than WD-NLP at lower thresholds. When the threshold increased, the number of datasets created decreased, resulting in increased values of F-measure, although the differences disappeared. Two datasets near the maximum threshold showing differences in F-measure were examined whether their topics were associated with diseases. The results showed that more diseases were found in OD-NLP at lower thresholds, indicating that the topics described characteristics of diseases. The superiority remained as much as that of TF-IDF when filtration was changed to DMV.</p><p><strong>Conclusion: </strong>The current findings prefer the use of OD-NLP to express characteristics of diseases from Japanese clinical texts and may help in the construction of document summaries and retrieval in clinical settings.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 3-04","pages":"110-118"},"PeriodicalIF":1.7,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/2b/3b/10-1055-a-2039-3773.PMC10462427.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10141870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ibrahim Dalhatu, Chinedu Aniekwe, Adebobola Bashorun, Alhassan Abdulkadir, Emilio Dirlikov, Stephen Ohakanu, Oluwasanmi Adedokun, Ademola Oladipo, Ibrahim Jahun, Lisa Murie, Steven Yoon, Mubarak G Abdu-Aguye, Ahmed Sylvanus, Samuel Indyer, Isah Abbas, Mustapha Bello, Nannim Nalda, Matthias Alagi, Solomon Odafe, Sylvia Adebajo, Otse Ogorry, Murphy Akpu, Ifeanyi Okoye, Kunle Kakanfo, Amobi Andrew Onovo, Gregory Ashefor, Charles Nzelu, Akudo Ikpeazu, Gambo Aliyu, Tedd Ellerbrock, Mary Boyd, Kristen A Stafford, Mahesh Swaminathan
Background: Timely and reliable data are crucial for clinical, epidemiologic, and program management decision making. Electronic health information systems provide platforms for managing large longitudinal patient records. Nigeria implemented the National Data Repository (NDR) to create a central data warehouse of all people living with human immunodeficiency virus (PLHIV) while providing useful functionalities to aid decision making at different levels of program implementation.
Objective: We describe the Nigeria NDR and its development process, including its use for surveillance, research, and national HIV program monitoring toward achieving HIV epidemic control.
Methods: Stakeholder engagement meetings were held in 2013 to gather information on data elements and vocabulary standards for reporting patient-level information, technical infrastructure, human capacity requirements, and information flow. Findings from these meetings guided the development of the NDR. An implementation guide provided common terminologies and data reporting structures for data exchange between the NDR and the electronic medical record (EMR) systems. Data from the EMRs were encoded in Extensible Markup Language (XML) and sent to the NDR over Hypertext Transfer Protocol Secure (HTTPS) after passing a series of validation processes.
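A minimal sketch of the described EMR-to-NDR exchange pattern, assuming the third-party requests library; the endpoint URL and XML element names are hypothetical, as the actual message schema is defined in the implementation guide:

```python
# Sketch of the EMR-to-NDR exchange pattern: encode a patient record as XML
# and submit it over HTTPS. Endpoint URL and element names are hypothetical.
import xml.etree.ElementTree as ET
import requests

record = ET.Element("patientRecord")
ET.SubElement(record, "facilityId").text = "FAC-001"
ET.SubElement(record, "patientId").text = "P-12345"
ET.SubElement(record, "artStartDate").text = "2021-06-30"

payload = ET.tostring(record, encoding="utf-8", xml_declaration=True)

# Server-side validation (schema checks, deduplication via fingerprint
# templates) happens after submission, as described in the Methods.
resp = requests.post(
    "https://ndr.example.gov.ng/api/submit",   # hypothetical URL
    data=payload,
    headers={"Content-Type": "application/xml"},
    timeout=30,
)
resp.raise_for_status()
```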
Results: By June 30, 2021, the NDR had up-to-date records of 1,477,064 (94.4%) patients receiving HIV treatment across 1,985 health facilities, of which 1,266,512 (85.7%) patient records had fingerprint template data to support unique patient identification and record linkage, preventing registration of the same patient under different identities. Data from the NDR were used to support HIV program monitoring, case-based surveillance, and the production of outputs such as monthly lists of patients with treatment interruptions and dashboards for monitoring HIV "test and start."
Conclusion: The NDR enabled the availability of reliable and timely data for surveillance, research, and HIV program monitoring to guide program improvements to accelerate progress toward epidemic control.
{"title":"From Paper Files to Web-Based Application for Data-Driven Monitoring of HIV Programs: Nigeria's Journey to a National Data Repository for Decision-Making and Patient Care.","authors":"Ibrahim Dalhatu, Chinedu Aniekwe, Adebobola Bashorun, Alhassan Abdulkadir, Emilio Dirlikov, Stephen Ohakanu, Oluwasanmi Adedokun, Ademola Oladipo, Ibrahim Jahun, Lisa Murie, Steven Yoon, Mubarak G Abdu-Aguye, Ahmed Sylvanus, Samuel Indyer, Isah Abbas, Mustapha Bello, Nannim Nalda, Matthias Alagi, Solomon Odafe, Sylvia Adebajo, Otse Ogorry, Murphy Akpu, Ifeanyi Okoye, Kunle Kakanfo, Amobi Andrew Onovo, Gregory Ashefor, Charles Nzelu, Akudo Ikpeazu, Gambo Aliyu, Tedd Ellerbrock, Mary Boyd, Kristen A Stafford, Mahesh Swaminathan","doi":"10.1055/s-0043-1768711","DOIUrl":"https://doi.org/10.1055/s-0043-1768711","url":null,"abstract":"<p><strong>Background: </strong>Timely and reliable data are crucial for clinical, epidemiologic, and program management decision making. Electronic health information systems provide platforms for managing large longitudinal patient records. Nigeria implemented the National Data Repository (NDR) to create a central data warehouse of all people living with human immunodeficiency virus (PLHIV) while providing useful functionalities to aid decision making at different levels of program implementation.</p><p><strong>Objective: </strong>We describe the Nigeria NDR and its development process, including its use for surveillance, research, and national HIV program monitoring toward achieving HIV epidemic control.</p><p><strong>Methods: </strong>Stakeholder engagement meetings were held in 2013 to gather information on data elements and vocabulary standards for reporting patient-level information, technical infrastructure, human capacity requirements, and information flow. Findings from these meetings guided the development of the NDR. An implementation guide provided common terminologies and data reporting structures for data exchange between the NDR and the electronic medical record (EMR) systems. Data from the EMR were encoded in extensible markup language and sent to the NDR over secure hypertext transfer protocol after going through a series of validation processes.</p><p><strong>Results: </strong>By June 30, 2021, the NDR had up-to-date records of 1,477,064 (94.4%) patients receiving HIV treatment across 1,985 health facilities, of which 1,266,512 (85.7%) patient records had fingerprint template data to support unique patient identification and record linkage to prevent registration of the same patient under different identities. 
Data from the NDR was used to support HIV program monitoring, case-based surveillance and production of products like the monthly lists of patients who have treatment interruptions and dashboards for monitoring HIV test and start.</p><p><strong>Conclusion: </strong>The NDR enabled the availability of reliable and timely data for surveillance, research, and HIV program monitoring to guide program improvements to accelerate progress toward epidemic control.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 3-04","pages":"130-139"},"PeriodicalIF":1.7,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/b5/f9/10-1055-s-0043-1768711.PMC10462428.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10136836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Multisite research networks such as the project "Collaboration on Rare Diseases" connect various hospitals to obtain sufficient data for clinical research. However, data quality (DQ) remains a challenge for the secondary use of data recorded in different health information systems. High levels of DQ as well as appropriate quality assessment methods are needed to support the reuse of such distributed data.
Objectives: The aim of this work is the development of an interoperable methodology for assessing the quality of data recorded in heterogeneous sources to improve the quality of rare disease (RD) documentation and support clinical research.
Methods: We first developed a conceptual framework for DQ assessment. Using this theoretical guidance, we implemented a software framework that provides appropriate tools for calculating DQ metrics and for generating local as well as cross-institutional reports. We further applied our methodology to synthetic data distributed across multiple hospitals using Personal Health Train. Finally, we used precision and recall as metrics to validate our implementation.
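A minimal sketch of the validation step, comparing the DQ issues reported by the framework against those deliberately introduced into the synthetic data; the issue identifiers are illustrative:

```python
# Sketch of the validation step: compare DQ issues reported by the framework
# against the issues that were deliberately injected into the synthetic data.
injected = {"amb_dx_001", "impl_dx_007", "missing_sex_012"}   # ground truth
reported = {"amb_dx_001", "impl_dx_007", "missing_sex_012"}   # tool output

true_pos = len(injected & reported)
precision = true_pos / len(reported) if reported else 1.0
recall = true_pos / len(injected) if injected else 1.0
# Both reach 1.00 here, matching the abstract's report that all
# introduced issues were identified and reported automatically.
print(f"precision={precision:.2f}, recall={recall:.2f}")
```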
Results: Four DQ dimensions were defined and represented as disjoint ontological categories. Based on these top-level dimensions, 9 DQ concepts, 10 DQ indicators, and 25 DQ parameters were developed and applied to different data sets. All randomly introduced DQ issues were identified and reported automatically. The generated reports show the resulting DQ indicators and the detected DQ issues.
Conclusion: We have shown that our approach yields promising results, which can be used for local and cross-institutional DQ assessments. The developed frameworks provide useful methods for interoperable and privacy-preserving assessments of DQ that meet the specified requirements. This study has demonstrated that our methodology is capable of detecting DQ issues such as ambiguity or implausibility of coded diagnoses. It can be used for DQ benchmarking to improve the quality of RD documentation and to support clinical research on distributed data.
{"title":"Rare Diseases in Hospital Information Systems-An Interoperable Methodology for Distributed Data Quality Assessments.","authors":"Kais Tahar, Tamara Martin, Yongli Mou, Raphael Verbuecheln, Holm Graessner, Dagmar Krefting","doi":"10.1055/a-2006-1018","DOIUrl":"https://doi.org/10.1055/a-2006-1018","url":null,"abstract":"<p><strong>Background: </strong>Multisite research networks such as the project \"Collaboration on Rare Diseases\" connect various hospitals to obtain sufficient data for clinical research. However, data quality (DQ) remains a challenge for the secondary use of data recorded in different health information systems. High levels of DQ as well as appropriate quality assessment methods are needed to support the reuse of such distributed data.</p><p><strong>Objectives: </strong>The aim of this work is the development of an interoperable methodology for assessing the quality of data recorded in heterogeneous sources to improve the quality of rare disease (RD) documentation and support clinical research.</p><p><strong>Methods: </strong>We first developed a conceptual framework for DQ assessment. Using this theoretical guidance, we implemented a software framework that provides appropriate tools for calculating DQ metrics and for generating local as well as cross-institutional reports. We further applied our methodology on synthetic data distributed across multiple hospitals using Personal Health Train. Finally, we used precision and recall as metrics to validate our implementation.</p><p><strong>Results: </strong>Four DQ dimensions were defined and represented as disjunct ontological categories. Based on these top dimensions, 9 DQ concepts, 10 DQ indicators, and 25 DQ parameters were developed and applied to different data sets. Randomly introduced DQ issues were all identified and reported automatically. The generated reports show the resulting DQ indicators and detected DQ issues.</p><p><strong>Conclusion: </strong>We have shown that our approach yields promising results, which can be used for local and cross-institutional DQ assessments. The developed frameworks provide useful methods for interoperable and privacy-preserving assessments of DQ that meet the specified requirements. This study has demonstrated that our methodology is capable of detecting DQ issues such as ambiguity or implausibility of coded diagnoses. It can be used for DQ benchmarking to improve the quality of RD documentation and to support clinical research on distributed data.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 3-04","pages":"71-89"},"PeriodicalIF":1.7,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10462432/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10138370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giorgos Koliopanos, Francisco Ojeda, Andreas Ziegler
Background: Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome legal restrictions, simulated data can be transferred that mimic the structure but are different from the existing study data.
Objectives: The aim of this work is to introduce the simple-to-use R package Mock Data Generation (modgo) that may be used for simulating data from existing study data for continuous, ordinal categorical, and dichotomous variables.
Methods: The core is to combine rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal distribution and transformed back to the original scale of the variables. Unique features of modgo are that it allows users to change the correlation between variables, perform perturbation analysis, handle multicenter data, and change inclusion/exclusion criteria by selecting specific values of one or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo.
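modgo itself is an R package; the following Python sketch only illustrates the core mechanism described above (rank inverse normal transformation, correlation estimation, multivariate normal sampling, and back-transformation) with stand-in data, not the package's API:

```python
# Python sketch of the core mechanism (the package itself is R):
# rank inverse normal transform, correlation estimation, multivariate
# normal sampling, and back-transformation to the original scale.
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(42)
X = rng.gamma(2.0, 1.5, size=(500, 3))      # stand-in "study data"

# 1) Rank inverse normal transformation per variable (column).
n = X.shape[0]
Z = norm.ppf((np.apply_along_axis(rankdata, 0, X) - 0.5) / n)

# 2) Correlation matrix of the transformed data.
R = np.corrcoef(Z, rowvar=False)

# 3) Simulate from a multivariate normal with that correlation.
Z_sim = rng.multivariate_normal(np.zeros(X.shape[1]), R, size=500)

# 4) Back-transform each simulated variable to the original scale via
#    the empirical quantiles of the corresponding observed variable.
U = norm.cdf(Z_sim)
X_sim = np.column_stack(
    [np.quantile(X[:, j], U[:, j]) for j in range(X.shape[1])]
)
```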
Results: modgo mimicked the structure of the original study data. Results of modgo were similar to those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated on several expansions.
Conclusion: The R package modgo is useful when existing study data may not be shared. Its perturbation expansion makes it possible to simulate truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.
{"title":"A Simple-to-Use R Package for Mimicking Study Data by Simulations.","authors":"Giorgos Koliopanos, Francisco Ojeda, Andreas Ziegler","doi":"10.1055/a-2048-7692","DOIUrl":"https://doi.org/10.1055/a-2048-7692","url":null,"abstract":"<p><strong>Background: </strong>Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome legal restrictions, simulated data can be transferred that mimic the structure but are different from the existing study data.</p><p><strong>Objectives: </strong>The aim of this work is to introduce the simple-to-use R package Mock Data Generation (modgo) that may be used for simulating data from existing study data for continuous, ordinal categorical, and dichotomous variables.</p><p><strong>Methods: </strong>The core is to combine rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal and transferred back to the original scale of the variables. Unique features of modgo are that it allows to change the correlation between variables, to perform perturbation analysis, to handle multicenter data, and to change inclusion/exclusion criteria by selecting specific values of one or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo.</p><p><strong>Results: </strong>modgo mimicked the structure of the original study data. Results of modgo were similar with those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated on several expansions.</p><p><strong>Conclusion: </strong>The R package modgo is useful when existing study data may not be shared. Its perturbation expansion permits to simulate truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 3-04","pages":"119-129"},"PeriodicalIF":1.7,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/75/40/10-1055-a-2048-7692.PMC10462429.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10492948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mikel Hernadez, Gorka Epelde, Ane Alberdi, Rodrigo Cilla, Debbie Rankin
Background: Synthetic tabular data generation is a potentially valuable technology with great promise for data augmentation and privacy preservation. However, prior to adoption, generated synthetic tabular data must be empirically assessed across the dimensions relevant to the target application to determine its efficacy. The literature lacks a standardized and objective strategy for evaluating and benchmarking synthetic tabular data in the health domain.
Objective: The aim of this paper is to identify key dimensions, per-dimension metrics, and methods for evaluating synthetic tabular data generated with different techniques and configurations for health domain application development, and to provide a strategy to orchestrate them.
Methods: Based on the literature, the resemblance, utility, and privacy dimensions were prioritized, and a collection of metrics and methods for their evaluation was orchestrated into a complete evaluation pipeline. This way, a guided and comparative assessment of generated synthetic tabular data can be performed, categorizing its quality into three levels ("Excellent," "Good," and "Poor"). Six health care-related datasets and four synthetic tabular data generation approaches were chosen for an analysis and evaluation verifying the utility of the proposed evaluation pipeline.
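A minimal sketch of the categorization idea, using a single resemblance metric (the per-column Kolmogorov-Smirnov statistic) as a stand-in for the full metric collection; the thresholds are illustrative, not the paper's calibrated values:

```python
# Sketch of the pipeline's categorization idea: score one resemblance
# metric and map it to the three quality categories. Thresholds are
# illustrative, not the paper's calibrated values.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
real = rng.normal(0, 1, size=(1000, 2))
synthetic = rng.normal(0.05, 1.0, size=(1000, 2))

# Mean KS statistic across columns: 0 means identical distributions.
ks = np.mean([ks_2samp(real[:, j], synthetic[:, j]).statistic
              for j in range(real.shape[1])])

if ks < 0.05:          # illustrative thresholds
    category = "Excellent"
elif ks < 0.15:
    category = "Good"
else:
    category = "Poor"
print(f"mean KS = {ks:.3f} -> {category}")
```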
Results: The synthetic tabular data generated with the four selected approaches maintained resemblance, utility, and privacy for most dataset/generation-approach combinations. For several datasets, some approaches outperformed others, while for other datasets, more than one approach yielded the same performance.
Conclusion: The results have shown that the proposed pipeline can effectively be used to evaluate and benchmark the synthetic tabular data generated by various synthetic tabular data generation approaches. Therefore, this pipeline can support the scientific community in selecting the most suitable synthetic tabular data generation approaches for their data and application of interest.
{"title":"Synthetic Tabular Data Evaluation in the Health Domain Covering Resemblance, Utility, and Privacy Dimensions.","authors":"Mikel Hernadez, Gorka Epelde, Ane Alberdi, Rodrigo Cilla, Debbie Rankin","doi":"10.1055/s-0042-1760247","DOIUrl":"https://doi.org/10.1055/s-0042-1760247","url":null,"abstract":"<p><strong>Background: </strong>Synthetic tabular data generation is a potentially valuable technology with great promise for data augmentation and privacy preservation. However, prior to adoption, an empirical assessment of generated synthetic tabular data is required across dimensions relevant to the target application to determine its efficacy. A lack of standardized and objective evaluation and benchmarking strategy for synthetic tabular data in the health domain has been found in the literature.</p><p><strong>Objective: </strong>The aim of this paper is to identify key dimensions, per dimension metrics, and methods for evaluating synthetic tabular data generated with different techniques and configurations for health domain application development and to provide a strategy to orchestrate them.</p><p><strong>Methods: </strong>Based on the literature, the resemblance, utility, and privacy dimensions have been prioritized, and a collection of metrics and methods for their evaluation are orchestrated into a complete evaluation pipeline. This way, a guided and comparative assessment of generated synthetic tabular data can be done, categorizing its quality into three categories (\"<i>Excellent,</i>\" \"<i>Good,</i>\" and \"<i>Poor</i>\"). Six health care-related datasets and four synthetic tabular data generation approaches have been chosen to conduct an analysis and evaluation to verify the utility of the proposed evaluation pipeline.</p><p><strong>Results: </strong>The synthetic tabular data generated with the four selected approaches has maintained resemblance, utility, and privacy for most datasets and synthetic tabular data generation approach combination. In several datasets, some approaches have outperformed others, while in other datasets, more than one approach has yielded the same performance.</p><p><strong>Conclusion: </strong>The results have shown that the proposed pipeline can effectively be used to evaluate and benchmark the synthetic tabular data generated by various synthetic tabular data generation approaches. Therefore, this pipeline can support the scientific community in selecting the most suitable synthetic tabular data generation approaches for their data and application of interest.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e19-e38"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/31/67/10-1055-s-0042-1760247.PMC10306449.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9789348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: The FAIR Guiding Principles align with the use cases for digital health records, in that clinical data must be findable, accessible within a range of environments, interoperable between systems, and subsequently reusable. Together, HL7 FHIR, openEHR, IHE XDS, and SNOMED CT (FOXS) represent a specification for creating an open digital health platform for modern health care applications.
Objectives: To describe where logical FOXS components align with the European Open Science Cloud Interoperability Framework (EOSC-IF) reference architecture for semantic interoperability. This should provide a means of determining whether FOXS aligns with FAIR principles and of establishing that the data models and structures supporting longitudinal care records are fit to underpin scientific research.
Methods: The EOSC-IF Semantic View is a representation of semantic interoperability where meaning is preserved between systems and users. This was analyzed and cross-referenced with FOXS architectural components, mapping concepts, and objects that describe content such as catalogues and semantic artifacts.
Results: The majority of conceptual Semantic View components were featured within the FOXS architecture. Semantic Business Objects are composed of a range of elements such as openEHR archetypes and templates, FHIR resources and profiles, SNOMED CT concepts, and XDS document identifiers. Semantic Functional Content comprises catalogues of metadata, which were also supported by openEHR and FHIR tools.
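A minimal sketch of the cross-referencing exercise as a lookup table, populated only with the objects named above; the mapping entries are illustrative, not the paper's complete alignment:

```python
# Sketch of the cross-referencing exercise: map EOSC-IF Semantic View
# components to FOXS artifacts and report components without a counterpart.
# Entries are illustrative, drawn from the objects named in the Results.
semantic_view_components = [
    "Semantic Business Object",
    "Semantic Functional Content",
    "FAIR Digital Object",
]

foxs_mapping = {
    "Semantic Business Object": [
        "openEHR archetypes/templates",
        "FHIR resources/profiles",
        "SNOMED CT concepts",
        "XDS document identifiers",
    ],
    "Semantic Functional Content": ["openEHR/FHIR metadata catalogues"],
    # "FAIR Digital Object" left unmapped: the Conclusions note this
    # EOSC-IF element remains vague.
}

for component in semantic_view_components:
    targets = foxs_mapping.get(component)
    status = ", ".join(targets) if targets else "no FOXS counterpart found"
    print(f"{component}: {status}")
```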
Conclusions: Although some elements of EOSC-IF remain vague (e.g., the FAIR Digital Object), there was broad conformance between the framework concepts and the components of a FOXS platform. This work supports a health-domain-specific view of semantic interoperability and shows how it may be achieved to support FAIR data for health research via a standardized framework.
{"title":"Aligning Semantic Interoperability Frameworks with the FOXS Stack for FAIR Health Data.","authors":"John Meredith, Nicola Whitehead, Michael Dacey","doi":"10.1055/a-1993-8036","DOIUrl":"https://doi.org/10.1055/a-1993-8036","url":null,"abstract":"<p><strong>Background: </strong>FAIR Guiding Principles present a synergy with the use cases for digital health records, in that clinical data need to be found, accessible within a range of environments, and data must interoperate between systems and subsequently reused. The use of HL7 FHIR, openEHR, IHE XDS, and SNOMED CT (FOXS) together represents a specification to create an open digital health platform for modern health care applications.</p><p><strong>Objectives: </strong>To describe where logical FOXS components align to the European Open Science Cloud Interoperability Framework (EOSC-IF) reference architecture for semantic interoperability. This should provide a means of defining if FOXS aligns to FAIR principles and to establish the data models and structures that support longitudinal care records as being fit to underpin scientific research.</p><p><strong>Methods: </strong>The EOSC-IF Semantic View is a representation of semantic interoperability where meaning is preserved between systems and users. This was analyzed and cross-referenced with FOXS architectural components, mapping concepts, and objects that describe content such as catalogues and semantic artifacts.</p><p><strong>Results: </strong>Majority of conceptual Semantic View components were featured within the FOXS architecture. Semantic Business Objects are composed of a range of elements such as openEHR archetypes and templates, FHIR resources and profiles, SNOMED CT concepts, and XDS document identifiers. Semantic Functional Content comprises catalogues of metadata that were also supported by openEHR and FHIR tools.</p><p><strong>Conclusions: </strong>Despite some elements of EOSC-IF being vague (e.g., FAIR Digital Object), there was a broad conformance to the framework concepts and the components of a FOXS platform. This work supports a health-domain-specific view of semantic interoperability and how this may be achieved to support FAIR data for health research via a standardized framework.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e39-e46"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/a3/76/10-1055-a-1993-8036.PMC10306448.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9786736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Henriette Rau, Dana Stahl, Anna-Juliana Reichel, Martin Bialke, Thomas Bahls, Wolfgang Hoffmann
Introduction: Informed consent is the legal basis for research with human subjects. Therefore, the consent form (CF), as a legally binding document, must be valid, that is, completely filled in, stating the person's decision clearly, and signed by the respective person. However, paper-based CFs in particular may have quality issues, and their transformation into machine-readable information can introduce further quality problems. This paper evaluates the quality and arising quality issues of paper-based CFs using the example of the Baltic Fracture Competence Centre (BFCC) fracture registry. It also evaluates the impact of quality assurance (QA) measures, including site-specific feedback. Finally, it answers the question of whether manual data entry of patients' decisions by clinical staff leads to a significant error rate in digitized paper-based CFs.
Methods: Based on defined quality criteria, monthly QA, including source data verification, was conducted by two independent reviewers from the start of recruitment in December 2017. The analyses are based on the CFs collected from December 2017 until February 2019 (the first recruitment period).
Results: After QA was conducted internally, a sudden increase in quality issues in May 2018 led to site-specific feedback reports and follow-up training regarding CF quality starting in June 2018. Specific criteria and descriptions of how to correct the CFs helped increase quality in a timely manner. The most common issues were missing pages, missing decisions regarding optional modules, and missing signatures. Since patients' datasets without valid CFs must be deleted, QA helped retain 65 datasets for research, so that the final data pool consisted of 840 (99.29%) patients.
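A minimal sketch of a validity check against the common issues listed above (missing pages, undecided optional modules, missing signatures); the field names are hypothetical, and the registry's actual criteria set is richer:

```python
# Sketch of a CF validity check against the issues named above. Field
# names are hypothetical; the registry's actual criteria set is richer.
def cf_issues(cf: dict) -> list[str]:
    issues = []
    if len(cf.get("pages", [])) < cf.get("expected_pages", 0):
        issues.append("missing pages")
    if any(v is None for v in cf.get("module_decisions", {}).values()):
        issues.append("undecided optional module")
    if not cf.get("patient_signature") or not cf.get("date_signed"):
        issues.append("missing signature/date")
    return issues

cf = {
    "pages": [1, 2, 3],
    "expected_pages": 4,
    "module_decisions": {"biobanking": True, "recontact": None},
    "patient_signature": True,
    "date_signed": "2018-05-14",
}
print(cf_issues(cf))  # ['missing pages', 'undecided optional module']
```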
Conclusion: Every quality issue could be assigned to one of the predefined criteria. Using the example of the BFCC fracture registry, CF QA proved to significantly increase CF quality and helped retain datasets available for research. Consequently, the described quality indicators, criteria, and QA processes can be seen as a best-practice approach.
{"title":"We Know What You Agreed To, Don't We?-Evaluating the Quality of Paper-Based Consents Forms and Their Digitalized Equivalent Using the Example of the Baltic Fracture Competence Centre Project.","authors":"Henriette Rau, Dana Stahl, Anna-Juliana Reichel, Martin Bialke, Thomas Bahls, Wolfgang Hoffmann","doi":"10.1055/s-0042-1760249","DOIUrl":"https://doi.org/10.1055/s-0042-1760249","url":null,"abstract":"<p><strong>Introduction: </strong>The informed consent is the legal basis for research with human subjects. Therefore, the consent form (CF) as legally binding document must be valid, that is, be completely filled-in stating the person's decision clearly and signed by the respective person. However, especially paper-based CFs might have quality issues and the transformation into machine-readable information could add to low quality. This paper evaluates the quality and arising quality issues of paper-based CFs using the example of the Baltic Fracture Competence Centre (BFCC) fracture registry. It also evaluates the impact of quality assurance (QA) measures including giving site-specific feedback. Finally, it answers the question whether manual data entry of patients' decisions by clinical staff leads to a significant error rate in digitalized paper-based CFs.</p><p><strong>Methods: </strong>Based on defined quality criteria, monthly QA including source data verification was conducted by two individual reviewers since the start of recruitment in December 2017. Basis for the analyses are the CFs collected from December 2017 until February 2019 (first recruitment period).</p><p><strong>Results: </strong>After conducting QA internally, the sudden increase of quality issues in May 2018 led to site-specific feedback reports and follow-up training regarding the CFs' quality starting in June 2018. Specific criteria and descriptions on how to correct the CFs helped in increasing the quality in a timely matter. Most common issues were missing pages, decisions regarding optional modules, and signature(s). Since patients' datasets without valid CFs must be deleted, QA helped in retaining 65 datasets for research so that the final datapool consisted of 840 (99.29%) patients.</p><p><strong>Conclusion: </strong>All quality issues could be assigned to one predefined criterion. Using the example of the BFCC fracture registry, CF-QA proved to significantly increase CF quality and help retain the number of available datasets for research. Consequently, the described quality indicators, criteria, and QA processes can be seen as the best practice approach.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e10-e18"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/05/82/10-1055-s-0042-1760249.PMC10306442.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9789345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Khalid O Yusuf, Olga Miljukov, Anne Schoneberg, Sabine Hanß, Martin Wiesenfeldt, Melanie Stecher, Lazar Mitrov, Sina Marie Hopff, Sarah Steinbrecher, Florian Kurth, Thomas Bahmer, Stefan Schreiber, Daniel Pape, Anna-Lena Hofmann, Mirjam Kohls, Stefan Störk, Hans Christian Stubbe, Johannes J Tebbe, Johannes C Hellmuth, Johanna Erber, Lilian Krist, Siegbert Rieg, Lisa Pilgram, Jörg J Vehreschild, Jens-Peter Reese, Dagmar Krefting
Background: As a national effort to better understand the current pandemic, three cohorts collect sociodemographic and clinical data from coronavirus disease 2019 (COVID-19) patients from different target populations within the German National Pandemic Cohort Network (NAPKON). Furthermore, the German Corona Consensus Dataset (GECCO) was introduced as a harmonized basic information model for COVID-19 patients in clinical routine. To compare the cohort data with other GECCO-based studies, data items are mapped to GECCO. As mapping from one information model to another is complex, an additional consistency evaluation of the mapped items is recommended to detect possible mapping issues or source data inconsistencies.
Objectives: The goal of this work is to assure high consistency of research data mapped to the GECCO data model. In particular, it aims at identifying contradictions within interdependent GECCO data items of the German national COVID-19 cohorts to allow investigation of possible reasons for identified contradictions. We furthermore aim to enable other researchers to easily perform data quality evaluations on GECCO-based datasets and to adapt the approach to similar data models.
Methods: All suitable data items from each of the three NAPKON cohorts are mapped to the GECCO items. A consistency assessment tool (dqGecco) is implemented, following the design of an existing quality assessment framework and retaining its predefined consistency taxonomies, including logical and empirical contradictions. Results of the assessment are verified independently against the primary data source.
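A minimal sketch of rule-based contradiction checks in the spirit of dqGecco; the two rules below are illustrative examples of logical and empirical contradictions, not the tool's actual rule set:

```python
# Sketch of rule-based contradiction checks. The two rules below are
# illustrative examples, not dqGecco's actual rule set.
def contradictions(record: dict) -> list[str]:
    found = []
    # Logical contradiction: severity coded "asymptomatic" while
    # symptoms are documented.
    if record.get("severity") == "asymptomatic" and record.get("symptoms"):
        found.append("asymptomatic severity with documented symptoms")
    # Empirical contradiction: implausible vital sign value.
    hr = record.get("heart_rate")
    if hr is not None and not (20 <= hr <= 300):
        found.append(f"implausible heart rate: {hr}")
    return found

record = {"severity": "asymptomatic",
          "symptoms": ["cough"],
          "heart_rate": 350}
print(contradictions(record))  # reports both contradictions
```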
Results: Our consistency assessment tool helped correct the mapping procedure and revealed remaining contradictory value combinations within COVID-19 symptoms, vital signs, and COVID-19 severity. Consistency rates differed between indicators and cohorts, ranging from 95.84% to 100%.
Conclusion: An efficient and portable tool capable of discovering inconsistencies in the COVID-19 domain has been developed and applied to three different cohorts. As the GECCO dataset is employed in different platforms and studies, the tool can be directly applied there or adapted to similar information models.
{"title":"Consistency as a Data Quality Measure for German Corona Consensus Items Mapped from National Pandemic Cohort Network Data Collections.","authors":"Khalid O Yusuf, Olga Miljukov, Anne Schoneberg, Sabine Hanß, Martin Wiesenfeldt, Melanie Stecher, Lazar Mitrov, Sina Marie Hopff, Sarah Steinbrecher, Florian Kurth, Thomas Bahmer, Stefan Schreiber, Daniel Pape, Anna-Lena Hofmann, Mirjam Kohls, Stefan Störk, Hans Christian Stubbe, Johannes J Tebbe, Johannes C Hellmuth, Johanna Erber, Lilian Krist, Siegbert Rieg, Lisa Pilgram, Jörg J Vehreschild, Jens-Peter Reese, Dagmar Krefting","doi":"10.1055/a-2006-1086","DOIUrl":"https://doi.org/10.1055/a-2006-1086","url":null,"abstract":"<p><strong>Background: </strong>As a national effort to better understand the current pandemic, three cohorts collect sociodemographic and clinical data from coronavirus disease 2019 (COVID-19) patients from different target populations within the German National Pandemic Cohort Network (NAPKON). Furthermore, the German Corona Consensus Dataset (GECCO) was introduced as a harmonized basic information model for COVID-19 patients in clinical routine. To compare the cohort data with other GECCO-based studies, data items are mapped to GECCO. As mapping from one information model to another is complex, an additional consistency evaluation of the mapped items is recommended to detect possible mapping issues or source data inconsistencies.</p><p><strong>Objectives: </strong>The goal of this work is to assure high consistency of research data mapped to the GECCO data model. In particular, it aims at identifying contradictions within interdependent GECCO data items of the German national COVID-19 cohorts to allow investigation of possible reasons for identified contradictions. We furthermore aim at enabling other researchers to easily perform data quality evaluation on GECCO-based datasets and adapt to similar data models.</p><p><strong>Methods: </strong>All suitable data items from each of the three NAPKON cohorts are mapped to the GECCO items. A consistency assessment tool (dqGecco) is implemented, following the design of an existing quality assessment framework, retaining their<i>-</i>defined consistency taxonomies, including logical and empirical contradictions. Results of the assessment are verified independently on the primary data source.</p><p><strong>Results: </strong>Our consistency assessment tool helped in correcting the mapping procedure and reveals remaining contradictory value combinations within COVID-19 symptoms, vital signs, and COVID-19 severity. Consistency rates differ between the different indicators and cohorts ranging from 95.84% up to 100%.</p><p><strong>Conclusion: </strong>An efficient and portable tool capable of discovering inconsistencies in the COVID-19 domain has been developed and applied to three different cohorts. 
As the GECCO dataset is employed in different platforms and studies, the tool can be directly applied there or adapted to similar information models.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e47-e56"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/4d/05/10-1055-a-2006-1086.PMC10306447.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9842097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Data quality issues can cause false decisions of clinical decision support systems (CDSSs). Analyzing local data quality has the potential to prevent data quality-related failure of CDSS adoption.
Objectives: To define a shareable set of applicable measurement methods (MMs) for a targeted data quality assessment determining the suitability of local data for our CDSS.
Methods: We derived task-specific MMs using four approaches: (1) a GUI-based data quality analysis using the open-source tool openCQA; (2) analysis of cases of known false CDSS decisions; (3) data-driven learning on MM results; and (4) a systematic check for blind spots in our set of MMs based on the HIDQF data quality framework. We expressed the derived data quality-related knowledge about the CDSS using the 5-tuple formalization for MMs.
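A minimal sketch of an MM as a 5-tuple; the abstract does not enumerate the tuple's elements, so all five fields below are hypothetical:

```python
# Sketch of a measurement method (MM) as a 5-tuple. The abstract does not
# enumerate the tuple's elements, so these five fields are hypothetical.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class MeasurementMethod:
    identifier: str                 # unique MM id
    data_item: str                  # inspected dataset element
    check: Callable[[Any], bool]    # the measurement itself
    reference: str                  # expected result / gold standard
    knowledge_base: str             # owning DQ knowledge base

mm = MeasurementMethod(
    identifier="MM-042",
    data_item="leukocyte_count",
    check=lambda v: v is not None and 0 < v < 100,  # plausibility bounds
    reference="value present and within plausible range",
    knowledge_base="SIRS-vitals",
)
print(mm.check(12.3))  # True
```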
Results: We identified some task-specific dataset characteristics that a targeted data quality assessment for our use case should inspect. Altogether, we defined 394 MMs organized in 13 data quality knowledge bases.
Conclusions: We have created a set of shareable, applicable MMs that can support targeted data quality assessment for CDSS-based systemic inflammatory response syndrome (SIRS) detection in critically ill pediatric patients. With the demonstrated approaches for deriving and expressing task-specific MMs, we intend to help promote targeted data quality assessment as a commonly recognized part of research on data-consuming application systems in health care.
{"title":"Targeted Data Quality Analysis for a Clinical Decision Support System for SIRS Detection in Critically Ill Pediatric Patients.","authors":"Erik Tute, Marcel Mast, Antje Wulff","doi":"10.1055/s-0042-1760238","DOIUrl":"https://doi.org/10.1055/s-0042-1760238","url":null,"abstract":"<p><strong>Background: </strong>Data quality issues can cause false decisions of clinical decision support systems (CDSSs). Analyzing local data quality has the potential to prevent data quality-related failure of CDSS adoption.</p><p><strong>Objectives: </strong>To define a shareable set of applicable measurement methods (MMs) for a targeted data quality assessment determining the suitability of local data for our CDSS.</p><p><strong>Methods: </strong>We derived task-specific MMs using four approaches: (1) a GUI-based data quality analysis using the open source tool <i>openCQA</i>. (2) Analyzing cases of known false CDSS decisions. (3) Data-driven learning on MM-results. (4) A systematic check to find blind spots in our set of MMs based on the <i>HIDQF</i> data quality framework. We expressed the derived data quality-related knowledge about the CDSS using the 5-tuple-formalization for MMs.</p><p><strong>Results: </strong>We identified some task-specific dataset characteristics that a targeted data quality assessment for our use case should inspect. Altogether, we defined 394 MMs organized in 13 data quality knowledge bases.</p><p><strong>Conclusions: </strong>We have created a set of shareable, applicable MMs that can support targeted data quality assessment for CDSS-based systemic inflammatory response syndrome (SIRS) detection in critically ill, pediatric patients. With the demonstrated approaches for deriving and expressing task-specific MMs, we intend to help promoting targeted data quality assessment as a commonly recognized usual part of research on data-consuming application systems in health care.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e1-e9"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/23/e5/10-1055-s-0042-1760238.PMC10306443.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10163000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Current information systems do not effectively support nurse managers' duties, such as reporting, resource management, and assessing clinical performance. Few performance management information systems are available, and in many of them, features are scattered.
Objectives: The purpose of the study was to determine nurse managers' opinions of information system support for performance management.
Methods: An online questionnaire was used to collect data from nurse managers (n = 419). Pearson's correlation coefficients and linear regression were used to examine the relationships between variables, which were nurse managers' ability to manage resources, to report and evaluate productivity, and to assess nursing performance and clinical procedures.
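A minimal sketch of the statistical analysis, assuming SciPy; the score values are illustrative, not study data:

```python
# Sketch of the statistical analysis: Pearson correlation and simple linear
# regression between two questionnaire scores. Values are illustrative.
import numpy as np
from scipy.stats import linregress, pearsonr

resource_mgmt = np.array([3.2, 4.1, 2.8, 3.9, 4.5, 3.0])   # illustrative
productivity = np.array([3.0, 4.3, 2.5, 4.0, 4.6, 3.1])

r, p = pearsonr(resource_mgmt, productivity)
fit = linregress(resource_mgmt, productivity)
print(f"r={r:.2f} (p={p:.3f}), slope={fit.slope:.2f}, "
      f"intercept={fit.intercept:.2f}")
```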
Results: More than half of the managers used performance management systems daily. Most managers (60%) felt that they could use information systems to follow the use of physical resources, and in general (63%), they felt that it was easy to perform searches with the systems used for following up activity. Nurse managers' abilities to manage resources, to report productivity, and to assess nursing care performance were significantly correlated with each other.
Conclusion: Currently, managers have to collect data from various systems for management purposes, as system integration does not support performance data collection. The availability of continuous in-service training had a positive effect on information system use.
{"title":"Nurse Managers' Opinions of Information System Support for Performance Management: A Correlational Study.","authors":"Kaija Saranto, Samuli Koponen, Tuulikki Vehko, Eija Kivekäs","doi":"10.1055/a-1978-9727","DOIUrl":"https://doi.org/10.1055/a-1978-9727","url":null,"abstract":"<p><strong>Background: </strong>Current information systems do not effectively support nurse managers' duties, such as reporting, resource management, and assessing clinical performance. Few performance management information systems are available and features in many are scattered.</p><p><strong>Objectives: </strong>The purpose of the study was to determine nurse managers' opinions of information system support for performance management.</p><p><strong>Methods: </strong>An online questionnaire was used to collect data from nurse managers (<i>n</i> = 419). Pearson's correlation coefficients and linear regression were used to examine the relationships between variables, which were nurse managers' ability to manage resources, to report and evaluate productivity, and to assess nursing performance and clinical procedures.</p><p><strong>Results: </strong>More than half of the managers used performance management systems daily. Managers (60%) felt that they can use information systems to follow the use of physical resources, and in general (63%), they felt that it is easy to perform searches with the systems used for following up activity. Nurse managers' ability to manage resources, to report productivity, and to assess nursing care performance were correlated significantly with each other.</p><p><strong>Conclusion: </strong>Currently, managers have to collect data from various systems for management purposes, as system integration does not support performance data collection. The availability of continuous in-service training had a positive effect on information system use.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e63-e72"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/b2/85/10-1055-a-1978-9727.PMC10306445.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9786706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}