Background: The World Health Organization has set a global strategy to eliminate cervical cancer, emphasizing the need for cervical cancer screening coverage to reach 70%. In response, China has developed an action plan to accelerate the elimination of cervical cancer, with Hubei province implementing China's first provincial full-coverage screening program using an artificial intelligence (AI) and cloud-based diagnostic system.
Objective: This study aimed to evaluate the performance of AI technology in this full-coverage screening program. The evaluation indicators included accessibility, screening efficiency, diagnostic quality, and program cost.
Methods: Characteristics of 1,704,461 individuals screened from July 2022 to January 2023 were used to analyze accessibility and AI screening efficiency. A random sample of 220 individuals was used for external diagnostic quality control. The costs of different participating screening institutions were assessed.
Results: Cervical cancer screening services were extended to all administrative districts, especially in rural areas. Rural women made up the largest proportion of those screened, at 67.54% (1,147,839/1,699,591). Approximately 1.7 million individuals were screened, achieving a cumulative coverage of 13.45% in about 6 months. A full-coverage program could be completed with AI technology in approximately 1 year, 87.5 times more efficiently than with manual reading of slides. The sample compliance rate was as high as 99.1%, and compliance rates for positive, negative, and pathology biopsy reviews exceeded 96%. The cost of the program was CN ¥49 per person (based on the average 2022 exchange rate of US $1=CN ¥6.7261), with the primary screening institution and the third-party testing institute receiving CN ¥19 and CN ¥27, respectively.
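To make the coverage figure concrete, the back-of-the-envelope arithmetic below recovers the implied eligible population and the average monthly screening throughput from the numbers reported above; this is illustrative only, since the program's actual target population is not stated in the abstract.

```python
# Back-of-the-envelope arithmetic from the reported figures (illustrative only;
# the program's actual target population is not stated in this abstract).
screened = 1_704_461      # individuals screened from July 2022 to January 2023
coverage = 0.1345         # cumulative coverage achieved over roughly 6 months
months = 6

eligible = screened / coverage          # implied eligible population (~12.7 million)
monthly_throughput = screened / months  # roughly 284,000 screenings per month

print(f"Implied eligible population: {eligible:,.0f}")
print(f"Average monthly throughput: {monthly_throughput:,.0f}")
```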
Conclusions: AI-assisted diagnosis has proven to be accessible, efficient, reliable, and low cost, which could support the implementation of full-coverage screening programs, especially in areas with insufficient health resources. AI technology served as a crucial tool for rapidly and effectively increasing screening coverage, which would accelerate the achievement of the World Health Organization's goal of eliminating cervical cancer.
Background: Recent studies offer conflicting conclusions about the effectiveness of digital health interventions in changing physical activity behaviors. In addition, research focusing on digital health interventions for college students remains relatively scarce.
Objective: This study aims to examine the impact of digital health interventions on physical activity behaviors among college students, using objective measures as outcome indicators.
Methods: In accordance with the 2020 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a comprehensive literature search was conducted across several databases, including MEDLINE (PubMed), Web of Science, Cochrane Library, and EBSCO (CINAHL Plus with full text), to identify relevant intervention studies published up to June 6, 2023. The inclusion criteria specified studies that examined the quantitative relationships between digital health interventions and physical activity among adults aged 18 years to 29 years, focusing on light physical activity (LPA), moderate to vigorous physical activity (MVPA), sedentary time (ST), or steps. Non-randomized controlled trials were excluded. The quality of the studies was assessed using the Cochrane Risk of Bias tool. Results were synthesized both narratively and quantitatively, where applicable. When sufficient homogeneity was found among studies, a random-effects meta-analysis was performed to account for between-study variability.
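The sketch below illustrates random-effects pooling of standardized mean differences with the DerSimonian-Laird estimator, using hypothetical per-study values; the review does not state which estimator or software was used, so this is an assumption made for illustration.

```python
import numpy as np

# Hypothetical per-study standardized mean differences (SMDs) and standard errors;
# these are NOT values from the included studies.
smd = np.array([0.55, 0.70, 0.62])
se = np.array([0.18, 0.22, 0.15])

# Fixed-effect weights, weighted mean, and Cochran's Q
w = 1 / se**2
fixed_mean = np.sum(w * smd) / np.sum(w)
q = np.sum(w * (smd - fixed_mean) ** 2)

# DerSimonian-Laird between-study variance (tau^2)
df = len(smd) - 1
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights, pooled SMD, and 95% CI
w_re = 1 / (se**2 + tau2)
pooled = np.sum(w_re * smd) / np.sum(w_re)
se_pooled = np.sqrt(1 / np.sum(w_re))
lo, hi = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
print(f"Pooled SMD {pooled:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```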
Results: In total, 8 studies, encompassing 569 participants, were included in the analysis. The primary outcomes measured were LPA, MVPA, ST, and steps. Among these studies, 3 reported on LPA, 5 on MVPA, 5 on ST, and 3 on steps. The meta-analysis revealed a significant increase in steps for the intervention group compared with the control group (standardized mean difference [SMD] 0.64, 95% CI 0.37-0.92; P<.001). However, no significant differences were observed between the intervention and control groups regarding LPA (SMD -0.08, 95% CI -0.32 to 0.16; P=.51), MVPA (SMD 0.02, 95% CI -0.19 to 0.22; P=.88), and ST (SMD 0.03, 95% CI -0.18 to 0.24; P=.78).
Conclusions: Digital health interventions are effective in increasing steps among college students; however, their effects on LPA, MVPA, and sedentary behavior are limited.
Trial registration: PROSPERO CRD42024533180; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=533180.
Background: The advancement of large language models (LLMs) offers significant opportunities for health care, particularly in the generation of medical documentation. However, challenges related to ensuring the accuracy and reliability of LLM outputs, coupled with the absence of established quality standards, have raised concerns about their clinical application.
Objective: This study aimed to develop and validate an evaluation framework for assessing the accuracy and clinical applicability of LLM-generated emergency department (ED) records, with the goal of supporting the integration of artificial intelligence into health care documentation.
Methods: We organized the Healthcare Prompt-a-thon, a competitive event designed to explore the capabilities of LLMs in generating accurate medical records. The event involved 52 participants who generated 33 initial ED records using HyperCLOVA X, a Korean-specialized LLM. We applied a dual evaluation approach. First, clinical evaluation: 4 medical professionals evaluated the records using a 5-point Likert scale across 5 criteria: appropriateness, accuracy, structure/format, conciseness, and clinical validity. Second, quantitative evaluation: We developed a framework to categorize and count errors in the LLM outputs, identifying 7 key error types. Statistical methods, including Pearson correlation and intraclass correlation coefficients (ICC), were used to assess consistency and agreement among evaluators.
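A minimal sketch of the two reliability statistics named above, computed on hypothetical scores; SciPy and the pingouin package are assumptions here, as the study does not specify its analysis software.

```python
import pandas as pd
import pingouin as pg
from scipy.stats import pearsonr

# Hypothetical 5-point Likert scores from 4 evaluators on 3 records;
# the actual study scored 33 ED records across 5 criteria.
scores = pd.DataFrame({
    "record":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "evaluator": ["A", "B", "C", "D"] * 3,
    "score":     [4, 5, 4, 4, 3, 3, 2, 3, 5, 4, 5, 5],
})

# Interrater reliability: intraclass correlation coefficients across evaluators
icc = pg.intraclass_corr(data=scores, targets="record",
                         raters="evaluator", ratings="score")
print(icc[["Type", "ICC", "pval"]])

# Test-retest reliability: Pearson correlation between two scoring rounds
round1 = [4, 3, 5, 4, 2, 5]
round2 = [4, 3, 4, 5, 2, 5]
r, p = pearsonr(round1, round2)
print(f"Test-retest Pearson r = {r:.3f} (P = {p:.3f})")
```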
Results: The clinical evaluation demonstrated strong interrater reliability, with ICC values ranging from 0.653 to 0.887 (P<.001), and a test-retest reliability Pearson correlation coefficient of 0.776 (P<.001). Quantitative analysis revealed that invalid generation errors were the most common, constituting 35.38% of total errors, while structural malformation errors had the most significant negative impact on the clinical evaluation score (Pearson r=-0.654; P<.001). A strong negative correlation was found between the number of quantitative errors and clinical evaluation scores (Pearson r=-0.633; P<.001), indicating that higher error rates corresponded to lower clinical acceptability.
Conclusions: Our research provides robust support for the reliability and clinical acceptability of the proposed evaluation framework. It underscores the framework's potential to mitigate clinical burdens and foster the responsible integration of artificial intelligence technologies in health care, suggesting a promising direction for future research and practical applications in the field.
The COVID-19-Curated and Open Analysis and Research Platform (CO-CONNECT) project worked with 22 organizations across the United Kingdom to build a federated platform, enabling researchers to instantaneously and dynamically query federated datasets to find relevant data for their study. Finding relevant data takes time and effort, reducing the efficiency of research. Although data controllers could understand the value of such a system, there were significant challenges and delays in setting up the platform in response to COVID-19. This paper aims to present the challenges and lessons learned from the CO-CONNECT project to support other similar initiatives in the future. The project encountered many challenges, including the impacts of lockdowns on collaboration, understanding the new architecture, competing demands on people's time during a pandemic, data governance approvals, different levels of technical capabilities, data transformation to a common data model, access to granular-level laboratory data, and how to engage public and patient representatives meaningfully on a highly technical project. To overcome these challenges, we developed a range of methods to support data partners, such as explainer videos; regular, short, "touch base" videoconference calls; drop-in workshops; live demos; and a standardized technical onboarding documentation pack. A 4-stage data governance process emerged. The patient and public representatives were fully integrated team members. Persistence, patience, and understanding were key. We make 8 recommendations to change the landscape for future similar initiatives. The new architecture and processes developed are being built upon for non-COVID-19-related data, providing an infrastructural legacy.
Background: Health information technologies, including electronic health records (EHRs), have revolutionized health care delivery. These technologies promise to enhance the efficiency and quality of care through improved patient health information management. Despite the transformative potential of EHRs, the extent to which patient access contributes to increased engagement with health care services within different clinical settings remains a distinct and underexplored facet.
Objective: This systematic review aims to investigate the impact of patient access to EHRs on health care engagement. Specifically, we seek to determine whether providing patients with access to their EHRs contributes to improved engagement with health care services.
Methods: A comprehensive systematic review search was conducted across various international databases, including Ovid MEDLINE, Embase, PsycINFO, and CINAHL, to identify relevant studies published from January 1, 2010, to November 15, 2023. These databases were searched using a combination of keywords and Medical Subject Heading terms related to patient access to electronic health records, patient engagement, and health care services. Studies were included if they assessed the impact of patient access to EHRs on health care engagement and provided quantitative or qualitative evidence of this impact. The guidelines of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 statement were followed for study selection, data extraction, and quality assessment. The included studies were assessed for quality using the Mixed Methods Appraisal Tool, and the results were reported using a narrative synthesis.
Results: The initial database search yielded 1737 studies; after scanning their reference lists, we added 10 more. Of these 1747 studies, 18 (1.03%) met the inclusion criteria for the final review. The synthesized evidence from these studies revealed a positive relationship between patient access to EHRs and health care engagement across 6 categories of health care engagement dimensions and outcomes: treatment adherence and self-management, patient involvement and empowerment, health care communication and relationship, patient satisfaction and health outcomes, use of health care resources, and usability concerns and barriers.
Conclusions: The findings suggested a positive association between patient access to EHRs and health care engagement. The implications of these findings for health care providers, policy makers, and patients should be considered, highlighting the potential benefits and challenges associated with implementing and promoting patient access to EHRs. Further research directions have been proposed to deepen our understanding of this dynamic relationship.
Background: Social determinants of health (SDoH) such as housing insecurity are known to be intricately linked to patients' health status. More efficient methods for abstracting structured data on SDoH can help accelerate the inclusion of exposome variables in biomedical research and support health care systems in identifying patients who could benefit from proactive outreach. Large language models (LLMs) developed from Generative Pre-trained Transformers (GPTs) have shown potential for performing complex abstraction tasks on unstructured clinical notes.
Objective: Here, we assess the performance of GPTs in identifying temporal aspects of housing insecurity and compare results between original and deidentified notes.
Methods: We compared the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, from 25,217 notes from 795 pregnant women. Results were compared with manual abstraction, a named entity recognition model, and regular expressions.
Results: Compared with GPT-3.5 and the named entity recognition model, GPT-4 had the highest performance, with much higher recall (0.924) than human abstractors (0.702) in identifying patients experiencing current or past housing instability, although its precision (0.850) was lower than that of human abstractors (0.971). On deidentified versions of the same notes, GPT-4's precision improved slightly (0.936 original vs 0.939 deidentified), while recall dropped (0.781 original vs 0.704 deidentified).
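For readers who want to reproduce this kind of comparison, the snippet below shows how per-note precision and recall against a reference abstraction can be computed; the labels are hypothetical and scikit-learn is an assumption (the study compared GPT outputs with manual abstraction, a named entity recognition model, and regular expressions).

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical binary labels per note: 1 = evidence of current or past housing
# instability, 0 = none. "reference" stands in for the manual abstraction.
reference = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
llm_calls = [1, 0, 1, 1, 1, 0, 1, 0, 0, 1]

precision = precision_score(reference, llm_calls)
recall = recall_score(reference, llm_calls)
print(f"Precision {precision:.3f}, recall {recall:.3f}")
```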
Conclusions: This work demonstrates that while manual abstraction is likely to yield slightly more accurate results overall, LLMs can provide a scalable, cost-effective solution with the advantage of greater recall. This could support semiautomated abstraction, but given the potential risk for harm, human review would be essential before using results for any patient engagement or care decisions. Furthermore, recall was lower when notes were deidentified prior to LLM abstraction.
Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field.
Objective: This study aimed to explore the role of large language models (LLMs) in mitigating these biases through the use of a multi-agent framework. We simulated clinical decision-making processes through multi-agent conversations and evaluated the framework's efficacy in improving diagnostic accuracy compared with humans.
Methods: A total of 16 published and unpublished case reports in which cognitive biases had resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 (OpenAI) to facilitate interactions among different simulated agents to replicate clinical team dynamics. Each agent was assigned a distinct role: (1) making the final diagnosis after considering the discussions, (2) acting as a devil's advocate to correct confirmation and anchoring biases, (3) serving as a field expert in the required medical subspecialty, (4) facilitating discussions to mitigate premature closure bias, and (5) recording and summarizing findings. We tested varying combinations of these agents within the framework to determine which configuration yielded the highest rate of correct final diagnoses. Each scenario was repeated 5 times for consistency. The accuracy of the initial diagnoses and the final differential diagnoses was evaluated, and comparisons with human-generated answers were made using the Fisher exact test.
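The sketch below shows one way such a role-based, multi-agent discussion loop could be wired up with the OpenAI Python SDK; the role prompts and the single-pass orchestration are simplified assumptions for illustration and do not reproduce the authors' actual implementation.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK (v1+) and an API key are configured

client = OpenAI()

# Simplified role prompts mirroring the agent roles described above;
# the wording is illustrative, not the authors' actual prompts.
ROLES = {
    "devil's advocate": "Challenge the leading diagnosis to counter confirmation and anchoring bias.",
    "field expert": "Provide subspecialty expertise relevant to this case.",
    "facilitator": "Keep the differential broad to avoid premature closure.",
    "recorder": "Summarize the discussion so far.",
    "final diagnostician": "State the top 2 differential diagnoses after considering the discussion.",
}

def run_case(case_text: str) -> str:
    """Run one round of multi-agent discussion over a single case vignette."""
    transcript = f"Case: {case_text}"
    for role, instruction in ROLES.items():
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": f"You are the {role}. {instruction}"},
                {"role": "user", "content": transcript},
            ],
        ).choices[0].message.content
        transcript += f"\n\n[{role}] {reply}"
    return transcript  # the final diagnostician's answer is the last entry
```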
Results: A total of 240 responses were evaluated across 3 different multi-agent framework configurations. The initial diagnosis had an accuracy of 0% (0/80). However, following multi-agent discussions, the accuracy of the top 2 differential diagnoses increased to 76% (61/80) for the best-performing configuration (Framework 4-C). This was significantly higher than the accuracy achieved by human evaluators (odds ratio 3.49; P=.002).
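As a worked illustration of the statistical comparison, the snippet below applies the Fisher exact test to a 2×2 table of correct versus incorrect diagnoses; only the framework's 61/80 correct answers come from the reported results, the human counts are invented for illustration, and so the output will not match the reported odds ratio.

```python
from scipy.stats import fisher_exact

# 2x2 table of correct vs incorrect final diagnoses.
# The framework row uses the reported 61/80; the human row is hypothetical.
table = [[61, 19],   # multi-agent framework: correct, incorrect
         [15, 25]]   # human evaluators:      correct, incorrect (illustrative)

odds_ratio, p_value = fisher_exact(table)
print(f"Odds ratio {odds_ratio:.2f}, P = {p_value:.4f}")
```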
Conclusions: The multi-agent framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. In addition, the LLM-driven, multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios.