Background: Evaluating the accuracy and educational utility of artificial intelligence-generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored.
Objective: This study aimed to assess the educational utility of ChatGPT-4-generated clinical vignettes and their applicability in educational settings.
Methods: Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. The survey used 6 main question items to evaluate the quality of the generated clinical vignettes and their educational utility: information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians' experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases.
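The case-level comparisons described above could, for example, be run as in the following minimal Python sketch using pandas and SciPy. It assumes a hypothetical per-rating data frame; the column names, case identifiers, and experience variable are illustrative and are not the study's actual analysis code.

```python
# Hypothetical sketch: case-level comparisons with SciPy, assuming a pandas
# DataFrame `df` with one row per rating and columns "case_id",
# "info_quality" (0/1), "edu_usefulness" (1-5 Likert), and "years_experience".
import pandas as pd
from scipy import stats

def compare_cases(df: pd.DataFrame) -> dict:
    # Chi-square test: does the binary information-quality rating differ by case?
    contingency = pd.crosstab(df["case_id"], df["info_quality"])
    chi2, p_chi, dof, _ = stats.chi2_contingency(contingency)

    # Mann-Whitney U test: compare Likert scores between two example cases.
    a = df.loc[df["case_id"] == 1, "edu_usefulness"]
    b = df.loc[df["case_id"] == 2, "edu_usefulness"]
    u_stat, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")

    # Bonferroni correction over all pairwise case comparisons.
    n_cases = df["case_id"].nunique()
    n_comparisons = n_cases * (n_cases - 1) // 2
    p_u_adjusted = min(p_u * n_comparisons, 1.0)

    # Linear regression: trend of usefulness ratings against years of experience.
    slope, intercept, r, p_lr, se = stats.linregress(
        df["years_experience"], df["edu_usefulness"]
    )
    return {"chi2_p": p_chi, "mannwhitney_p_bonferroni": p_u_adjusted, "trend_p": p_lr}
```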
Results: Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory; both items were rated as binary responses. On a 5-point Likert scale, the average scores were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty. Statistical analysis showed significant variability in content quality and relevance across the cases (P<.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations.
Conclusions: ChatGPT-4-generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4's value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application.
Background: Learning and teaching interdisciplinary health data science (HDS) is highly challenging, and despite the growing interest in HDS education, little is known about the learning experiences and preferences of HDS students.
Objective: We conducted a systematic review to identify learning preferences and strategies in the HDS discipline.
Methods: We searched 10 bibliographic databases (PubMed, ACM Digital Library, Web of Science, Cochrane Library, Wiley Online Library, ScienceDirect, SpringerLink, EBSCOhost, ERIC, and IEEE Xplore) from the date of inception until June 2023. We followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines and included primary studies written in English that investigated the learning preferences or strategies of students in HDS-related disciplines, such as bioinformatics, at any academic level. Risk of bias was independently assessed by 2 screeners using the Mixed Methods Appraisal Tool, and we used narrative data synthesis to present the study results.
Results: After abstract screening and full-text reviewing of the 849 papers retrieved from the databases, 8 (0.9%) studies, published between 2009 and 2021, were selected for narrative synthesis. The majority of these papers (7/8, 88%) investigated learning preferences, while only 1 (12%) paper studied learning strategies in HDS courses. The systematic review revealed that most HDS learners prefer visual presentations as their primary learning input. In terms of learning process and organization, they mostly tend to follow logical, linear, and sequential steps. Moreover, they focus more on abstract information, rather than detailed and concrete information. Regarding collaboration, HDS students sometimes prefer teamwork, and sometimes they prefer to work alone.
Conclusions: The studies' quality, assessed using the Mixed Methods Appraisal Tool, ranged between 73% and 100%, indicating excellent quality overall. However, the number of studies in this area is small, and the results of all studies are based on self-reported data. Therefore, more research needs to be conducted to provide insight into HDS education. We provide some suggestions, such as using learning analytics and educational data mining methods, for conducting future research to address gaps in the literature. We also discuss implications for HDS educators, and we make recommendations for HDS course design; for example, we recommend including visual materials, such as diagrams and videos, and offering step-by-step instructions for students.
Virtual care appointments expanded rapidly during COVID-19 out of necessity and to enable access and continuity of care for many patients. While previous work has explored health care providers' experiences with telehealth usage on small-scale projects, the broad-level adoption of virtual care during the pandemic has opened up opportunities to better understand how to enhance the integration of telehealth as a regular mode of health care service delivery. Training and education for health care providers on the effective use of virtual care technologies are factors that can help facilitate improved adoption and use. We describe our approach to designing and developing an accredited continuing professional development (CPD) program using e-learning technologies to foster better knowledge of and comfort with virtual care technologies among health care providers. First, we discuss our approach to undertaking a systematic needs assessment study using a survey questionnaire of providers, key informant interviews, and a patient focus group. Next, we describe our steps in consulting with key stakeholder groups in the health system and arranging committees to inform the design of the program and address accreditation requirements. The instructional design features and aspects of the e-learning module are then described in depth, and our plan for evaluating the program is shared. As a CPD modality, e-learning offers the opportunity to enhance access to timely continuing professional education for health care providers who may be geographically dispersed across rural and remote communities.
Background: ChatGPT showcases exceptional conversational capabilities and extensive cross-disciplinary knowledge. In addition, it can perform multiple roles in a single chat session. This unique multirole-playing feature positions ChatGPT as a promising tool for exploring interdisciplinary subjects.
Objective: The aim of this study was to evaluate ChatGPT's competency in addressing interdisciplinary inquiries based on a case study exploring the opportunities and challenges of chatbot uses in sports rehabilitation.
Methods: We developed a model termed PanelGPT to assess ChatGPT's competency in addressing interdisciplinary topics through simulated panel discussions. Taking chatbot uses in sports rehabilitation as an example of an interdisciplinary topic, we prompted ChatGPT through PanelGPT to role-play a physiotherapist, psychologist, nutritionist, artificial intelligence expert, and athlete in a simulated panel discussion. During the simulation, we posed questions to the panel while ChatGPT acted as both the panelists for responses and the moderator for steering the discussion. We performed the simulation using ChatGPT-4 and evaluated the responses by referring to the literature and our human expertise.
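At the prompt level, a PanelGPT-style simulation can be set up with a single system message that assigns all panelist roles plus a moderator role to one chat session. The following is a minimal sketch using the OpenAI Python client; the prompt wording, function names, and session handling are our own illustration under assumed settings, not the exact PanelGPT prompt used in the study.

```python
# Minimal sketch of a PanelGPT-style simulated panel (assumed reconstruction;
# the paper describes the prompting strategy, not this exact code).
# Requires the openai package and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

PANEL_SYSTEM_PROMPT = (
    "You are moderating a panel discussion on chatbot uses in sports rehabilitation. "
    "Role-play all of the following panelists in turn: a physiotherapist, a psychologist, "
    "a nutritionist, an AI expert, and an athlete. For each question, have every panelist "
    "answer from their own expertise, then summarize as the moderator."
)

def ask_panel(question: str, history: list) -> str:
    """Send one question to the simulated panel and return the full discussion turn."""
    messages = [{"role": "system", "content": PANEL_SYSTEM_PROMPT}] + history
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = response.choices[0].message.content
    # Keep the exchange so follow-up questions stay within one simulated session.
    history.extend([{"role": "user", "content": question},
                    {"role": "assistant", "content": answer}])
    return answer

history: list = []
print(ask_panel(
    "What are the benefits of chatbots for patient education in rehabilitation?",
    history,
))
```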
Results: By tackling questions related to chatbot uses in sports rehabilitation with respect to patient education, physiotherapy, physiology, nutrition, and ethical considerations, responses from the ChatGPT-simulated panel discussion reasonably pointed to various benefits such as 24/7 support, personalized advice, automated tracking, and reminders. ChatGPT also correctly emphasized the importance of patient education, and identified challenges such as limited interaction modes, inaccuracies in emotion-related advice, assurance of data privacy and security, transparency in data handling, and fairness in model training. It also stressed that chatbots are to assist as a copilot, not to replace human health care professionals in the rehabilitation process.
Conclusions: ChatGPT exhibits strong competency in addressing interdisciplinary inquiries by simulating multiple experts from complementary backgrounds, with significant implications for medical education.
Background: Clinician educators are experts in procedural skills that students need to learn. Some clinician educators are interested in creating their own procedural videos but are typically not experts in video production, and there is limited information on this topic in the clinical education literature. Therefore, we present a tutorial for clinician educators to develop a procedural video.
Objective: We describe the steps needed to develop a medical procedural video from the perspective of a clinician educator new to creating videos, informed by best practices as evidenced by the literature. We also produce a checklist of elements that ensure a quality video. Finally, we identify the barriers and facilitators to making such a video.
Methods: We used the example of processing a piece of skeletal muscle in a pathology laboratory to make a video. We developed the video by dividing it into 3 phases: preproduction, production, and postproduction. After writing the learning outcomes, we created a storyboard and script, which were validated by subject matter and audiovisual experts. Photos and videos were captured on a digital camera mounted on a monopod. Video editing software was used to sequence the video clips and photos, insert text and audio narration, and generate closed captions. The finished video was uploaded to YouTube (Google) and then inserted into open-source authoring software to enable an interactive quiz.
Results: The final video was 4 minutes and 4 seconds long and took 70 hours to create. The final video included audio narration, closed captioning, bookmarks, and an interactive quiz. We identified 6 key factors of an effective video: (1) clear learning outcomes, (2) being engaging, (3) being learner-centric, (4) incorporating principles of multimedia learning, (5) incorporating adult learning theories, and (6) being of high audiovisual quality. To ensure educational quality, we developed a checklist of elements that educators can use to develop a video. For a clinician educator new to making videos, one barrier to creating procedural videos is the significant time commitment required to build videography and editing skills. The facilitators for developing an online video include creating a community of practice and repeated skill-building rehearsals using simulations.
Conclusions: We outlined the steps in procedural video production and developed a checklist of quality elements. These steps and the checklist can guide a clinician educator in creating a quality video while recognizing the time, technical, and cognitive requirements.
Background: The COVID-19 pandemic underscored the necessity for innovative educational methods in nursing. Our study takes a unique approach using a multidisciplinary simulation design, which offers a systematic and comprehensive strategy for developing virtual reality (VR) simulations in nursing education.
Objective: The aim of this study is to develop VR simulation content for a pediatric nursing module based on a multidisciplinary simulation design and to evaluate its feasibility for nursing education.
Methods: This study used a 1-group, posttest-only design. VR content for pediatric nursing practice was developed by integrating the technological characteristics of a multimodal VR system with the learning elements of traditional nursing simulation, combining various disciplines, including education, engineering, and nursing. A user test was conducted with 12 nursing graduates (preservice nurses) followed by post hoc surveys (assessing presence, VR systems, VR sickness, and simulation satisfaction) and in-depth, one-on-one interviews.
Results: User tests showed mean scores of 4.01 (SD 1.43) for presence, 4.91 (SD 0.81) for the VR system, 0.64 (SD 0.35) for VR sickness, and 5.00 (SD 1.00) for simulation satisfaction. In-depth interviews revealed that the main strengths of the immersive VR simulation for pediatric pneumonia nursing were effective visualization and direct experience through hands-on manipulation; the drawback was keyword-based voice interaction. To improve VR simulation quality, participants suggested increasing the number of nursing techniques and refining them in more detail.
Conclusions: This VR simulation content for a pediatric nursing practice using a multidisciplinary educational design model was confirmed to have positive educational potential. Further research is needed to confirm the specific learning effects of immersive nursing content based on multidisciplinary design models.
Background: Teaching medical students the skills required to acquire, interpret, apply, and communicate clinical information is an integral part of medical education. A crucial aspect of this process involves providing students with feedback regarding the quality of their free-text clinical notes.
Objective: The goal of this study was to assess the ability of ChatGPT 3.5, a large language model, to score medical students' free-text history and physical notes.
Methods: This is a single-institution, retrospective study. Standardized patients learned a prespecified clinical case and, acting as the patient, interacted with medical students. Each student wrote a free-text history and physical note of their interaction. The students' notes were scored independently by the standardized patients and ChatGPT using a prespecified scoring rubric that consisted of 85 case elements. The measure of accuracy was percent correct.
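To illustrate the accuracy measure described above, the sketch below computes percent correct as element-level agreement with a gold-standard scoring key across all notes; the data structures, element names, and example values are hypothetical and do not reproduce the study's actual 85-element rubric or code.

```python
# Hypothetical sketch of the percent-correct calculation: each note is scored on
# rubric elements, and accuracy is the share of element-level scores that match
# a gold-standard key. Names and example data are assumptions.
from typing import Dict, List

def percent_correct(rater_scores: List[Dict[str, bool]],
                    gold_scores: List[Dict[str, bool]]) -> float:
    """Percentage of element-level scores that agree with the gold standard."""
    total = 0
    correct = 0
    for rater_note, gold_note in zip(rater_scores, gold_scores):
        for element, gold_value in gold_note.items():
            total += 1
            if rater_note.get(element) == gold_value:
                correct += 1
    return 100.0 * correct / total if total else 0.0

# Example with 2 notes and 3 illustrative rubric elements (not the real rubric).
gold = [{"chief_complaint": True, "med_history": True, "allergies": False}] * 2
chatgpt = [{"chief_complaint": True, "med_history": True, "allergies": False},
           {"chief_complaint": True, "med_history": False, "allergies": False}]
print(f"ChatGPT percent correct: {percent_correct(chatgpt, gold):.1f}%")
```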
Results: The study population consisted of 168 first-year medical students. There was a total of 14,280 scores. The ChatGPT incorrect scoring rate was 1.0%, and the standardized patient incorrect scoring rate was 7.2%; the ChatGPT error rate was thus 86% lower than the standardized patient error rate. The ChatGPT mean incorrect scoring rate of 12 (SD 11) was significantly lower than the standardized patient mean incorrect scoring rate of 85 (SD 74; P=.002).
Conclusions: ChatGPT demonstrated a significantly lower error rate compared to standardized patients. This is the first study to assess the ability of a generative pretrained transformer (GPT) program to score medical students' standardized patient-based free-text clinical notes. It is expected that, in the near future, large language models will provide real-time feedback to practicing physicians regarding their free-text notes. GPT artificial intelligence programs represent an important advance in medical education and medical practice.