JMIR Medical Informatics: Latest Articles, Page 3

Smart Contracts and Shared Platforms in Sustainable Health Care: Systematic Review.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-31 DOI: 10.2196/58575

Carlos Antonio Marino, Claudia Diaz Paz

Background: The benefits of smart contracts (SCs) for sustainable health care are a relatively recent topic that has gathered attention given its relationship with trust and the advantages of decentralization, immutability, and traceability introduced in health care. Nevertheless, more studies need to explore the role of SCs in this sector based on the frameworks propounded in the literature that reflect business logic that has been customized, automatized, and prioritized, as well as system trust. This study addressed this lacuna.

Objective: This study aimed to provide a comprehensive understanding of SCs in health care based on reviewing the frameworks propounded in the literature.

Methods: A structured literature review was performed based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) principles. One database, Web of Science (WoS), was selected to avoid bias generated by database differences and data wrangling. A quantitative assessment of the studies based on machine learning and data reduction methodologies was complemented with a qualitative, in-depth, detailed review of the frameworks propounded in the literature.

Results: A total of 70 studies, which constituted 18.7% (70/374) of the studies on this subject, met the selection criteria and were analyzed. A multiple correspondence analysis, with 74.44% of the inertia, produced 3 factors describing the advances in the topic. Two of them referred to the leading roles of SCs: (1) health care process enhancement and (2) assurance of patients' privacy protection. The first role included 6 themes, and the second one included 3 themes. The third factor encompassed the technical features that improve system efficiency. The in-depth review of these 3 factors and the identification of stakeholders allowed us to characterize the system trust in health care SCs. We assessed the risk of coverage bias, and good percentages of overlap were obtained: 66% (49/74) of PubMed articles were also in WoS, and 88.3% (181/205) of WoS articles also appeared in Scopus.

Conclusions: This comprehensive review allows us to understand the relevance of SCs and the potentiality of their use in patient-centric health care that considers more than technical aspects. It also provides insights for further research based on specific stakeholders, locations, and behaviors.
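The percentages quoted in the Results of this abstract can be reproduced from the raw counts. The short Python sketch below is only an arithmetic check; the helper name `overlap_percent` is a hypothetical label introduced here, not something taken from the paper.

```python
def overlap_percent(shared: int, total: int) -> float:
    """Share of `total` items that also appear in the other set, in percent."""
    return 100.0 * shared / total


# Counts as printed in the abstract: (numerator, denominator, description).
checks = [
    (70, 374, "studies meeting the selection criteria"),
    (49, 74, "PubMed articles also indexed in WoS"),
    (181, 205, "WoS articles also indexed in Scopus"),
]

for shared, total, label in checks:
    print(f"{label}: {overlap_percent(shared, total):.1f}%")
```

The second ratio evaluates to about 66.2%, which the abstract reports rounded to 66%.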


Citations: 0

Towards interoperable digital medication records on FHIR: development and technical validation of a minimal core dataset.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-31 DOI: 10.2196/64099

Eduardo Salgado-Baez, Raphael Heidepriem, Renate Delucchi Danhier, Eugenia Rinaldi, Vishnu Ravi, Akira-Sebastian Poncette, Iris Dahlhaus, Daniel Fürstenau, Felix Balzer, Sylvia Thun, Julian Sass

Background: Medication errors represent a widespread, hazardous, and costly challenge in healthcare settings. The lack of interoperable medication data within and across hospitals not only creates administrative burden through redundant data entry but also increases the risk of errors due to human mistakes, imprecise data transformations, and misinterpretations. While digital solutions exist, fragmented systems and non-standardized data continue to hinder effective medication management.

Objective: This study aimed to assess medication data available across the multiple systems of a large university hospital, identify a minimum data set with the most relevant information and propose a standard interoperable FHIR-based solution that can import and transfer information from a standardized drug master database to various target systems.

Methods: Medication data from all relevant departments of a large German hospital were thoroughly analyzed. To ensure interoperability, data elements for developing a minimum dataset were defined based on relevant medication identifiers, the Health Level 7 Fast Healthcare Interoperability Resources (HL7® FHIR®) standard, and the German Medical Informatics Initiative (MII) specifications. The dataset was further enriched with information from Germany's most comprehensive drug database and European Standard Drug Terms (EDQM) to enhance medication identification accuracy. Finally, data on 60 frequently used medications within the institution were systematically extracted from multiple medication systems and integrated into a newly structured, dedicated database.

Results: The analysis of all the available medication datasets within the institution identified 7,964 drugs. However, limited interoperability was observed due to a fragmented local IT infrastructure and challenges in medication data standardization. The data integrated into the new structured medication dataset, which contains key elements to ensure identification accuracy and interoperability, successfully enabled the generation of medication order messages, supporting medication interoperability and standardized data exchange.

Conclusions: Our approach addresses the lack of interoperability in medication data and the need for standardized data exchange. We propose a minimum set of data elements aligned with German and international coding systems, to be used in combination with the FHIR standard for processes such as the digital transfer of discharge medication prescriptions from intensive care units to general wards, which can help to reduce medication errors and enhance patient safety.
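To make the idea of a FHIR-based medication order message concrete, here is a minimal Python sketch that assembles a FHIR R4 MedicationRequest as a plain dictionary. It is an illustration under stated assumptions, not the authors' dataset or the MII profile: the abstract does not list the actual data elements, so the helper `build_medication_request`, the patient identifier, the choice of an ATC code, and the dosage text are all hypothetical. Only the resource type, the `status`, `intent`, `medicationCodeableConcept`, `subject`, and `dosageInstruction` elements, and the ATC system URI come from the FHIR R4 specification.

```python
import json


def build_medication_request(patient_id: str, atc_code: str,
                             display: str, dose_text: str) -> dict:
    """Return a minimal FHIR R4 MedicationRequest as a plain dict.

    Hypothetical helper: a real minimal core dataset would carry further
    identifiers and profile metadata that the abstract does not specify.
    """
    return {
        "resourceType": "MedicationRequest",
        "status": "active",   # required element in FHIR R4
        "intent": "order",    # required element in FHIR R4
        "medicationCodeableConcept": {
            "coding": [{
                "system": "http://www.whocc.no/atc",  # ATC code system URI
                "code": atc_code,
                "display": display,
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "dosageInstruction": [{"text": dose_text}],
    }


# Example values are invented for illustration.
order = build_medication_request(
    patient_id="example-123",
    atc_code="N02BE01",
    display="Paracetamol",
    dose_text="500 mg orally every 6 hours",
)
print(json.dumps(order, indent=2))
```

A receiving system would typically validate such a message against the relevant profile before accepting it; that step is omitted here.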


Citations: 0

Diagnostic Decision-Making Variability Between Novice and Expert Optometrists for Glaucoma: Comparative Analysis to Inform AI System Design.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-29 DOI: 10.2196/63109

Faisal Ghaffar, Nadine M Furtado, Imad Ali, Catherine Burns

Background: While expert optometrists tend to rely on a deep understanding of the disease and intuitive pattern recognition, those with less experience may depend more on extensive data, comparisons, and external guidance. Understanding these variations is important for developing artificial intelligence (AI) systems that can effectively support optometrists with varying degrees of experience and minimize decision inconsistencies.

Objective: The main objective of this study is to identify and analyze the variations in diagnostic decision-making approaches between novice and expert optometrists. By understanding these variations, we aim to provide guidelines for the development of AI systems that can support optometrists with varying levels of expertise. These guidelines will assist in developing AI systems for glaucoma diagnosis, ultimately enhancing the diagnostic accuracy of optometrists and minimizing inconsistencies in their decisions.

Methods: We conducted in-depth interviews with 14 optometrists using within-subject design, including both novices and experts, focusing on their approaches to glaucoma diagnosis. The responses were coded and analyzed using a mixed method approach incorporating both qualitative and quantitative analysis. Statistical tests such as Mann-Whitney U and chi-square tests were used to find significance in intergroup variations. These findings were further supported by themes extracted through qualitative analysis, which helped to identify decision-making patterns and understand variations in their approaches.

Results: Both groups showed lower concordance rates with clinical diagnosis, with experts showing almost double (7/35, 20%) concordance rates with limited data in comparison to novices (7/69, 10%), highlighting the impact of experience and data availability on clinical judgment; this rate increased to nearly 40% for both groups (experts: 5/12, 42% and novices: 8/21, 42%) when they had access to complete historical data of the patient. We also found statistically significant intergroup differences between the first visits and subsequent visits with a P value of less than .05 on the Mann-Whitney U test in many assessments. Furthermore, approaches to the exam assessment and decision differed significantly: experts emphasized comprehensive risk assessments and progression analysis, demonstrating cognitive efficiency and intuitive decision-making, while novices relied more on structured, analytical methods and external references. Additionally, significant variations in patient follow-up times were observed, with a P value of <.001 on the chi-square test, showing a stronger influence of experience on follow-up time decisions.

Conclusions: The study highlights significant variations in the decision-making process of novice and expert optometrists in glaucoma diagnosis, with experience playing a key role in ac
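The concordance figures in the Results can be recomputed from the counts printed there. The Python sketch below does that and, purely as an illustration of the kind of test the Methods mention, computes a Pearson chi-square statistic for a 2x2 table built from the limited-data counts. This is an assumption-laden example, not a reproduction of the study's analysis: the authors report their chi-square test on follow-up times, and the function names `rate` and `chi_square_2x2` are hypothetical. Note that 8/21 evaluates to roughly 38%, not the 42% printed in the abstract; the sketch reports the computed value without guessing which figure is in error.

```python
def rate(concordant: int, total: int) -> float:
    """Concordance rate as a percentage."""
    return 100.0 * concordant / total


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts as printed in the Results: (concordant, total).
limited = {"experts": (7, 35), "novices": (7, 69)}
complete = {"experts": (5, 12), "novices": (8, 21)}

for label, groups in (("limited data", limited), ("complete history", complete)):
    for group, (k, n) in groups.items():
        print(f"{label:16s} {group:8s} {rate(k, n):5.1f}%")

# Illustration only: experts vs novices, concordant vs not, limited data.
(ke, ne), (kn, nn) = limited["experts"], limited["novices"]
stat = chi_square_2x2(ke, ne - ke, kn, nn - kn)
print(f"chi-square statistic = {stat:.2f}")
```

With counts this small, an exact test may be a more appropriate choice than the chi-square approximation.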


Citations: 0

Digital Representation of Patients as Medical Digital Twins: Data-Centric Viewpoint.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-28 DOI: 10.2196/53542

Stanislas Demuth, Jérôme De Sèze, Gilles Edan, Tjalf Ziemssen, Françoise Simon, Pierre-Antoine Gourraud

Precision medicine involves a paradigm shift toward personalized data-driven clinical decisions. The concept of a medical "digital twin" has recently become popular to designate digital representations of patients as a support for a wide range of data science applications. However, the concept is ambiguous when it comes to practical implementations. Here, we propose a medical digital twin framework with a data-centric approach. We argue that a single digital representation of patients cannot support all the data uses of digital twins for technical and regulatory reasons. Instead, we propose a data architecture leveraging three main families of digital representations: (1) multimodal dashboards integrating various raw health records at points of care to assist with perception and documentation, (2) virtual patients, which provide nonsensitive data for collective secondary uses, and (3) individual predictions that support clinical decisions. For a given patient, multiple digital representations may be generated according to the different clinical pathways the patient goes through, each tailored to balance the trade-offs associated with the respective intended uses. Therefore, our proposed framework conceives the medical digital twin as a data architecture leveraging several digital representations of patients along clinical pathways.


Citations: 0

Preclinical Cognitive Markers of Alzheimer Disease and Early Diagnosis Using Virtual Reality and Artificial Intelligence: Literature Review.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-28 DOI: 10.2196/62914

María de la Paz Scribano Parada, Fátima González Palau, Sonia Valladares Rodríguez, Mariano Rincon, Maria José Rico Barroeta, Marta García Rodriguez, Yolanda Bueno Aguado, Ana Herrero Blanco, Estela Díaz-López, Margarita Bachiller Mayoral, Raquel Losada Durán

Background: This review explores the potential of virtual reality (VR) and artificial intelligence (AI) to identify preclinical cognitive markers of Alzheimer disease (AD). By synthesizing recent studies, it aims to advance early diagnostic methods to detect AD before significant symptoms occur.

Objective: Research emphasizes the significance of early detection in AD during the preclinical phase, which does not involve cognitive impairment but nevertheless requires reliable biomarkers. Current biomarkers face challenges, prompting the exploration of cognitive behavior indicators beyond episodic memory.

Methods: Using PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we searched Scopus, PubMed, and Google Scholar for studies on neuropsychiatric disorders utilizing conversational data.

Results: Following an analysis of 38 selected articles, we highlight verbal episodic memory as a sensitive preclinical AD marker, with supporting evidence from neuroimaging and genetic profiling. Executive functions precede memory decline, while processing speed is a significant correlate. The potential of VR remains underexplored, and AI algorithms offer a multidimensional approach to early neurocognitive disorder diagnosis.

Conclusions: Emerging technologies like VR and AI show promise for preclinical diagnostics, but thorough validation and regulation for clinical safety and efficacy are necessary. Continued technological advancements are expected to enhance early detection and management of AD.


Citations: 0

The Social Construction of Categorical Data: Mixed Methods Approach to Assessing Data Features in Publicly Available Datasets.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-28 DOI: 10.2196/59452

Theresa Willem, Alessandro Wollek, Theodor Cheslerean-Boghiu, Martha Kenney, Alena Buyx

Background: In data-sparse areas such as health care, computer scientists aim to leverage as much available information as possible to increase the accuracy of their machine learning models' outputs. As a standard, categorical data, such as patients' gender, socioeconomic status, or skin color, are used to train models in fusion with other data types, such as medical images and text-based medical information. However, the effects of including categorical data features for model training in such data-scarce areas are underexamined, particularly regarding models intended to serve individuals equitably in a diverse population.

Objective: This study aimed to explore categorical data's effects on machine learning model outputs, root the effects in the data collection and dataset publication processes, and propose a mixed methods approach to examining datasets' data categories before using them for machine learning training.

Methods: Against the theoretical background of the social construction of categories, we suggest a mixed methods approach to assess categorical data's utility for machine learning model training. As an example, we applied our approach to a Brazilian dermatological dataset (Dermatological and Surgical Assistance Program at the Federal University of Espírito Santo [PAD-UFES] 20). We first present an exploratory, quantitative study that assesses the effects when including or excluding each of the unique categorical data features of the PAD-UFES 20 dataset for training a transformer-based model using a data fusion algorithm. We then pair our quantitative analysis with a qualitative examination of the data categories based on interviews with the dataset authors.

Results: Our quantitative study suggests scattered effects of including categorical data for machine learning model training across predictive classes. Our qualitative analysis gives insights into how the categorical data were collected and why they were published, explaining some of the quantitative effects that we observed. Our findings highlight the social constructedness of categorical data in publicly available datasets, meaning that the data in a category heavily depend on both how these categories are defined by the dataset creators and the sociomedical context in which the data are collected. This reveals relevant limitations of using publicly available datasets in contexts different from those of the collection of their data.

Conclusions: We caution against using data features of publicly available datasets without reflection on the social construction and context dependency of their categorical data features, particularly in data-sparse areas. We conclude that social scientific, context-dependent analysis of available data features using both quantitative and qualitative methods is helpful in judging the utility of categorical data for the population for wh


Citations: 0

Use of the FHTHWA Index as a Novel Approach for Predicting the Incidence of Diabetes in a Japanese Population Without Diabetes: Data Analysis Study.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-28 DOI: 10.2196/64992

Jiao Wang, Jianrong Chen, Ying Liu, Jixiong Xu

Background: Many tools have been developed to predict the risk of diabetes in a population without diabetes; however, these tools have shortcomings that include the omission of race, inclusion of variables that are not readily available to patients, and low sensitivity or specificity.

Objective: We aimed to develop and validate an easy, systematic index for predicting diabetes risk in the Asian population.

Methods: We collected the data from the NAGALA (NAfld [nonalcoholic fatty liver disease] in the Gifu Area, Longitudinal Analysis) database. The least absolute shrinkage and selection operator model was used to select potentially relevant features. Multiple Cox proportional hazard analysis was used to develop a model based on the training set.

Results: The final study population of 15464 participants had a mean age of 42 (range 18-79) years; 54.5% (8430) were men. The mean follow-up duration was 6.05 (SD 3.78) years. A total of 373 (2.41%) participants showed progression to diabetes during the follow-up period. We then established a novel parameter, the FHTHWA index, which comprises 6 parameters derived from the training set, to evaluate the incidence of diabetes in a population without diabetes. After multivariable adjustment, individuals in tertile 3 had a significantly higher rate of diabetes compared with those in tertile 1 (hazard ratio 32.141, 95% CI 11.545-89.476). Time receiver operating characteristic curve analyses showed that the FHTHWA index had high accuracy, with an area under the curve of around 0.9 over more than 12 years of follow-up.
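The tertile comparison reported above depends on splitting participants into three equal-sized groups by index value. The following is a minimal sketch of that binning step only, assuming a plain list of made-up scores; the abstract does not give the FHTHWA formula or the authors' cut points, so `hypothetical_scores` and `assign_tertiles` are illustrative names, and the hazard ratio itself would come from a Cox model that is not shown here.

```python
import statistics


def assign_tertiles(scores):
    """Map each score to tertile 1, 2, or 3 using the sample's own cut points."""
    # statistics.quantiles with n=3 returns the two cut points that
    # divide the sample into three groups of roughly equal size.
    low_cut, high_cut = statistics.quantiles(scores, n=3)
    tertiles = []
    for score in scores:
        if score <= low_cut:
            tertiles.append(1)
        elif score <= high_cut:
            tertiles.append(2)
        else:
            tertiles.append(3)
    return tertiles


# Hypothetical index values, not data from the study.
hypothetical_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
tertile_labels = assign_tertiles(hypothetical_scores)
```

Deriving the cut points from the sample itself keeps the groups balanced, which is what makes a tertile 3 versus tertile 1 contrast interpretable.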

Conclusions: This research successfully developed a diabetes risk assessment index tailored for the Japanese population by utilizing an extensive dataset and a wide range of indices. By categorizing the diabetes risk levels among Japanese individuals, this study offers a novel predictive tool for identifying potential patients, while also delivering valuable insights into diabetes prevention strategies for the healthy Japanese populace.


Citations: 0

Robust Automated Harmonization of Heterogeneous Data Through Ensemble Machine Learning: Algorithm Development and Validation Study.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-22 DOI: 10.2196/54133

Doris Yang, Doudou Zhou, Steven Cai, Ziming Gan, Michael Pencina, Paul Avillach, Tianxi Cai, Chuan Hong

Background: Cohort studies contain rich clinical data across large and diverse patient populations and are a common source of observational data for clinical research. Because large-scale cohort studies are both time and resource intensive, one alternative is to harmonize data from existing cohorts through multicohort studies. However, given differences in variable encoding, accurate variable harmonization is difficult.

Objective: We propose SONAR (Semantic and Distribution-Based Harmonization) as a method for harmonizing variables across cohort studies to facilitate multicohort studies.

Methods: SONAR used semantic learning from variable descriptions and distribution learning from study participant data. Our method learned an embedding vector for each variable and used pairwise cosine similarity to score the similarity between variables. This approach was built off 3 National Institutes of Health cohorts, including the Cardiovascular Health Study, the Multi-Ethnic Study of Atherosclerosis, and the Women's Health Initiative. We also used gold standard labels to further refine the embeddings in a supervised manner.
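The scoring step described above, one embedding vector per variable compared by pairwise cosine similarity, can be sketched as follows. This is an illustration under stated assumptions, not the SONAR implementation: the three-dimensional vectors and the variable names (`cohort_b_sbp` and so on) are invented for the example, and how the embeddings are learned is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates):
    """Return (name, score) pairs sorted by descending similarity to query."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical embeddings for variables in a second cohort.
embeddings = {
    "cohort_b_sbp": [0.9, 0.1, 0.0],
    "cohort_b_height": [0.0, 1.0, 0.2],
    "cohort_b_smoker": [0.1, 0.0, 1.0],
}
# Hypothetical embedding of a systolic blood pressure variable in a first cohort.
query = [1.0, 0.2, 0.0]
ranking = rank_candidates(query, embeddings)
```

Because cosine similarity ignores vector length, two variables can score as close matches even when their embeddings differ in magnitude, as long as they point in a similar direction.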

Results: The method was evaluated using manually curated gold standard labels from the 3 National Institutes of Health cohorts. We evaluated both the intracohort and intercohort variable harmonization performance. The supervised SONAR method outperformed existing benchmark methods for almost all intracohort and intercohort comparisons using area under the curve and top-k accuracy metrics. Notably, SONAR was able to significantly improve harmonization of concepts that were difficult for existing semantic methods to harmonize.
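Top-k accuracy, one of the metrics named above, counts a query as correct when its gold standard match appears among the k highest-ranked candidates. The sketch below assumes, for simplicity, exactly one gold match per query variable; the abstract does not say whether the authors' labels allow several matches, and all variable names here are hypothetical.

```python
def top_k_accuracy(ranked_predictions, gold_labels, k):
    """Fraction of queries whose gold match is within the top k candidates."""
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = 0
    for query, ranked in ranked_predictions.items():
        if gold_labels[query] in ranked[:k]:
            hits += 1
    return hits / len(ranked_predictions)


# Hypothetical rankings: each list is ordered from best to worst candidate.
ranked_predictions = {
    "sbp": ["b_sbp", "b_dbp", "b_height"],
    "height": ["b_weight", "b_height", "b_sbp"],
    "smoker": ["b_smoker", "b_alcohol", "b_sbp"],
    "weight": ["b_sbp", "b_height", "b_weight"],
}
gold_labels = {
    "sbp": "b_sbp",
    "height": "b_height",
    "smoker": "b_smoker",
    "weight": "b_weight",
}
top1 = top_k_accuracy(ranked_predictions, gold_labels, k=1)
top2 = top_k_accuracy(ranked_predictions, gold_labels, k=2)
top3 = top_k_accuracy(ranked_predictions, gold_labels, k=3)
```

In this toy example the accuracy rises from 0.5 at k=1 to 1.0 at k=3, which shows why top-k is a more forgiving measure than exact-match accuracy when a curator reviews a short list of suggestions.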

Conclusions: SONAR achieves accurate variable harmonization within and between cohort studies by harnessing the complementary strengths of semantic learning and variable distribution learning.


Citations: 0

The Impact of Data Control and Delayed Discounting on the Public's Willingness to Share Different Types of Health Care Data: Empirical Study.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-22 DOI: 10.2196/66444

Dongle Wei, Pan Gao, Yunkai Zhai

Background: Health data typically include patient-generated data and clinical medical data. Different types of data contribute to disease prevention, precision medicine, and the overall improvement of health care. With the introduction of regulations such as the Health Insurance Portability and Accountability Act (HIPAA), individuals play a key role in the sharing and application of personal health data.

Objective: This study aims to explore the impact of different types of health data on users' willingness to share. Additionally, it analyzes the effect of data control and delay discounting rate on this process.

Methods: The results of a web-based survey were analyzed to examine individuals' perceptions of sharing different types of health data and how data control and delay discounting rates influenced their decisions. We recruited participants for our study through the web-based platform "Wenjuanxing." After screening, we obtained 257 valid responses. Regression analysis was used to investigate the impact of data control, delayed discounting, and mental accounting on the public's willingness to share different types of health care data.

Results: Our findings indicate that the type of health data does not significantly affect the perceived benefits of data sharing. Instead, it negatively influences willingness to share by indirectly affecting data acquisition costs and perceived risks. Our results also show that data control reduces the perceived risks associated with sharing, while higher delay discounting rates lead to an overestimation of data acquisition costs and perceived risks.

Conclusions: Individuals' willingness to share data is primarily influenced by costs. To promote the acquisition and development of personal health data, stakeholders should strengthen individuals' control over their data or provide direct short-term incentives.


Citations: 0

Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study.

IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS

JMIR Medical Informatics

Pub Date : 2025-01-21 DOI: 10.2196/65454

Nicholas C Cardamone, Mark Olfson, Timothy Schmutte, Lyle Ungar, Tony Liu, Sara W Cullen, Nathaniel J Williams, Steven C Marcus

Background: Prediction models have demonstrated a range of applications across medicine, including using electronic health record (EHR) data to identify hospital readmission and mortality risk. Large language models (LLMs) can transform unstructured EHR text into structured features, which can then be integrated into statistical prediction models, ensuring that the results are both clinically meaningful and interpretable.

Objective: This study aims to compare the classification decisions made by clinical experts with those generated by a state-of-the-art LLM, using terms extracted from a large EHR dataset of individuals with mental health disorders seen in emergency departments (EDs).

Methods: Using a dataset from the EHR systems of more than 50 health care provider organizations in the United States from 2016 to 2021, we extracted all clinical terms that appeared in at least 1000 records of individuals admitted to the ED for a mental health-related problem from a source population of over 6 million ED episodes. Two experienced mental health clinicians (one medically trained psychiatrist and one clinical psychologist) reached consensus on the classification of EHR terms and diagnostic codes into categories. We evaluated an LLM's agreement with clinical judgment across three classification tasks as follows: (1) classify terms into "mental health" or "physical health", (2) classify mental health terms into 1 of 42 prespecified categories, and (3) classify physical health terms into 1 of 19 prespecified broad categories.

Results: There was high agreement between the LLM and clinical experts when categorizing 4553 terms as "mental health" or "physical health" (κ=0.77, 95% CI 0.75-0.80). However, there was still considerable variability in LLM-clinician agreement on the classification of mental health terms (κ=0.62, 95% CI 0.59-0.66) and physical health terms (κ=0.69, 95% CI 0.67-0.70).
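The κ values above measure agreement corrected for chance. The abstract does not state which kappa variant was used; the sketch below assumes Cohen's kappa for two raters, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected from each rater's label frequencies. The ten labels are made up for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap implied by each rater's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for 10 terms.
clinician = ["mental"] * 4 + ["physical"] * 6
llm = ["mental"] * 3 + ["physical"] * 6 + ["mental"]
kappa = cohens_kappa(clinician, llm)
```

Here the raters agree on 8 of 10 terms (p_o = 0.8), but with a 4-to-6 label split chance alone would yield p_e = 0.52, so kappa comes out near 0.58 rather than 0.8.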

Conclusions: The LLM displayed high agreement with clinical experts when classifying EHR terms into certain mental health or physical health term categories. However, agreement with clinical experts varied considerably within both sets of mental and physical health term categories. Importantly, LLMs offer an alternative to manual human coding, with great potential to create interpretable features for prediction models.


Citations: 0