{"title":"ChatGPT 3.5 在急诊放射学中的作用,重点是心胸急诊:实例证明","authors":"Arosh S. Perera Molligoda Arachchige","doi":"10.1002/ird3.65","DOIUrl":null,"url":null,"abstract":"<p>As the authors of this commentary, we would like to clarify that the figures presented originated from ChatGPT 3.5. Unless specified otherwise, in all figures, questions were provided as input through its user interface and the responses generated have been illustrated in a distinct font. The human authors subsequently undertook the editing process where we edited the ChatGPT 3.5 generated responses for better clarity (in terms of text organization) [<span>1-3</span>].</p><p>ChatGPT 3.5, created by OpenAI in San Francisco, is an advanced artificial intelligence conversational tool. Operating as a large language model (LLM), it can engage in conversations across more than 90 languages. Developed through deep-learning techniques utilizing multilayered recurrent feedforward neural networks, the model has undergone training on an extensive dataset with over 175 billion parameters. This dataset comprises information from diverse internet sources, including websites, articles, fiction, and books, collected until September 2021. The architecture of ChatGPT 3.5 is based on transformers, allowing it to simultaneously process a vast amount of data. This design enables the model to grasp the context and relationships between words in input sequences, facilitating the generation of coherent and relevant responses. Notably, ChatGPT 3.5 can comprehend questions and furnish persuasive, grammatically correct answers. Moreover, it has the capability to generate code, stories, poetry, scientific abstracts, and various other types of content in different styles. It is crucial to emphasize that ChatGPT 3.5 does not merely replicate stored information. Instead, it generates the most probable next word based on probabilities acquired through reinforcement learning during its training process [<span>4-6</span>].</p><p>ChatGPT 3.5 has the potential to greatly assist radiologists in image analysis and interpretation, leveraging its deep-learning capabilities to scrutinize extensive imaging data. By presenting alternative perspectives and highlighting potential areas of concern, ChatGPT 3.5 can enhance diagnostic accuracy and efficiency [<span>4, 5</span>]. Furthermore, the tool can optimize workflow in radiology departments by automating repetitive tasks, such as report generation, leading to time savings for radiologists, being crucial in particular for emergency radiologists.</p><p>Indeed, recently a course called “The Radiological Society of North America (RSNA) Emergency Imaging AI Certificate” has been introduced by the RSNA, which signifies the importance of AI technologies including LLMs in emergency settings. Thus, we decided to explore the role that ChatGPT 3.5 can play in a specific setting of radiological emergencies, in particular in the setting of imaging of cardiothoracic emergencies [<span>7, 8</span>].</p><p>As shown in Figure 1, we first inquired ChatGPT 3.5 regarding the radiation dose in a diagnostic coronary angiogram providing also patient specific data including weight, age, and sex. To our knowledge, the typical effective dose value for a coronary angiogram is between 5 and 10 mSv [<span>9</span>]. This is one instance, which highlights that the results given by the model must be considered with caution. However, it acknowledges that the value may differ due to modern techniques available for reducing radiation exposure. 
Furthermore, ChatGPT 3.5 also informs the technician or radiologist about the requirement for additional data, such as number of projections, fluoroscopy time, and other technical parameters, to enhance the accuracy of the dose estimate. Future versions may incorporate weight-based data as well and up-to-date data potentially making it relatively accurate.</p><p>ChatGPT 3.5 can serve as a valuable resource by rapidly offering age-specific normal values for emergency radiologists and trainees. These resources can be conveniently accessed on the go, facilitating image interpretation and professional development.</p><p>Therefore, we inquired from ChatGPT 3.5 the normal dimensions of the great vessels of the heart; see Figure 2. We noted that the values reported by the Chun et al. [<span>10</span>] accurately fit into these ranges. Age and weight-based normal values in emergency radiology are crucial, considering that not all of these values can be easily memorized. ChatGPT 3.5 provides a convenient and efficient means of accessing this information, offering a more streamlined alternative to internet searches. However, it is important to note that the reliability of its reference sources needs clarification, a feature currently unavailable in the current version of ChatGPT 3.5. In particular, we do not know whether ChatGPT 3.5 preferentially generates its texts from peer-reviewed sources or not.</p><p>Another valuable application of ChatGPT 3.5 is in the creation of pre-procedure checklists, aiming to prevent mishaps during emergent radiological procedures; see Figure 3. This tool proves particularly beneficial for emergency radiology trainees or fellows [<span>7</span>], contributing to enhanced procedural accuracy and safety.</p><p>It is evident that ChatGPT 3.5 offers a comprehensive step-by-step to-do list for each procedure, serving as a preventive measure against negligence. However, it is crucial that these checklists undergo verification for accuracy by a trained emergency radiologist before being implemented for clinical use [<span>7</span>].</p><p>Currently, ChatGPT 3.5 serves the additional function of aiding in report generation. This involves streamlining the input process to a few \"keywords of the diagnosis\" and patient clinical information. Subsequently, ChatGPT 3.5 can compose an entire report, which the radiologist can then review for accuracy and finalize after necessary editing; see Figure 4. We observe that while the report may not be flawless, it serves well as a preliminary draft that can be refined by the radiologist, potentially reducing the time required to verbally dictate or manually type the entire report. This streamlined process is anticipated to reduce turnaround time and alleviate workload-related burnout, both of which are crucial aspects for emergency radiology [<span>8, 11</span>].</p><p>Despite the advantages, we acknowledge that there are debates about the usefulness of the generated reports, especially considering the existence of current radiology dictation software that already uses voice commands for rapid template-based reporting. Additionally, caution is advised as ChatGPT 3.5 has been shown to invent findings not present in the user prompt, introducing a potential risk [<span>11</span>]. Another concern lies in the possible incompleteness of ChatGPT 3.5's report generation, as it may omit essential details. 
Enhancing completeness and readability can be achieved by employing a refined prompt with explicit instructions on preserving key information, based on the observed outcomes.</p><p>Furthermore, one of the major limitations of ChatGPT 3.5 is that it is not able to read medical images. At the time of the current writing, its upgraded paid version GPT 4 is acknowledged to possess the capability to convert images to text. Looking ahead, the future holds the possibility of training GPT 4 on extensive datasets comprising radiological images and associated clinical data. We believe that this approach could further benefit from narrowing down of the datasets needed by focusing on a subspecialty like emergency radiology. This training could enhance ChatGPT 3.5's ability to assist emergency radiologists in formulating more precise and well-informed diagnoses. By integrating patient information and medical history, ChatGPT 3.5 contributes valuable insights, aiding in the formulation of comprehensive and precise differential diagnoses. As a second reader, ChatGPT 3.5 enhances quality assurance by detecting errors or oversights in radiology reports with real-time feedback during the imaging process. Its ability to cross-reference vast amounts of data helps identify inconsistencies, minimizing the risk of misdiagnosis and contributing to improved patient safety. What could be more helpful than this for emergency radiologists?</p><p>Nevertheless, to gain regulatory approval, demonstrating the safety and effectiveness of such algorithms will hinge on their intended use, considering associated risks and benefits. For example, there is potential for overreliance on this technology, which may diminish the interpretive abilities of experienced radiologists. To avoid this, it is imperative that the ChatGPT 3.5 generated results are made available only after a first level confirmation by the emergency radiologist [<span>12, 13</span>].</p><p>We see that in Figure 5, ChatGPT 3.5 was able to accurately mention all five imaging goals in aortic dissection, which include identification of the site of intimal tear, extent of dissection (for classification), cardiac involvement (pericardial, myocardial, and valvular), aortic rupture, and major branch vessel involvement [<span>10</span>]. The same is true for Figure 6, and in addition, it demonstrates its ability to rapidly provide the risks versus benefits of diagnostic procedures as well as what to expect after. This may indicate the potential for future residents and emergency radiology fellows to be educated using the tool.</p><p>Next, ChatGPT 3.5 has the capability to aid emergency radiologists and trainees in determining the optimal MRI techniques for each diagnosis, potentially minimizing unnecessary anesthesia time, reducing patient noncompliance associated to lengthy acquisitions, and ultimately enhancing the overall workflow. This assistance encompasses selecting the suitable pulse sequence, optimizing the field of view, and deciding on the necessity of contrast for obtaining essential information; see Figure 7.</p><p>We observe that ChatGPT 3.5 not only provides the necessary sequences for imaging of pericardial effusion but also emphasizes that protocols may differ based on individual hospital guidelines. Over time, each hospital has the potential to develop its own customized version of ChatGPT 3.5, tailored to provide information specific to its radiology department. 
A recent study confirmed our opinion by showing that GPT 4 was indeed able to offer decision support in selecting imaging examinations and generating radiology referrals in the emergency department and demonstrated robust alignment with the American College of Radiology (ACR) guidelines, with 95% concordance with experts in imaging recommendations [<span>4</span>]. Additionally, it aids emergency radiologists and trainees by offering insights into specific radiological features to observe in particular medical conditions. This dual functionality contributes significantly to both procedural accuracy and diagnostic proficiency.</p><p>As shown in Figures 8 and 9, while the information provided by ChatGPT 3.5 may not be as precise as that found in established emergency radiology textbooks, there is potential for improvement in future versions by incorporating quoted references.</p><p>Beyond diagnostics, ChatGPT 3.5 can assist radiologists in making evidence-based treatment decisions by analyzing medical literature and clinical guidelines. Although limitations exist, including the generation of fake references/sources, future updates could potentially overcome these challenges [<span>11, 12</span>].</p><p>ChatGPT 3.5's integration of machine-learning algorithms allows it to adapt to emerging radiology research, keeping emergency radiologists informed about the latest advancements in the field.</p><p>Having demonstrated success in passing radiology board exams, ChatGPT 3.5 can also serve as an educational resource. It provides access to up-to-date information, case studies, and reference materials, fostering collaborative learning and discussions among radiologists. Thus, we decided to provide ChatGPT 3.5 some emergency radiology multiple-choice questions (MCQs); see Figure 10.</p><p>In our case, a notable limitation was the absence of image-based questions, hindering the real-world applicability and supporting the view that passing an exam without image analysis may not be sufficient for actual radiology practice. To address this, future investigations should employ a multimodal approach, combining image analysis with text interpretation for a more comprehensive evaluation, and the tool could be tested on subspecialty board exams like the European Diploma in Emergency Radiology. Already, new LLMs are underway, which will have access to this feature (image analysis). For example, Google's Med-PaLM 2 is designed to enhance radiology image analysis capabilities by leveraging a Language Model for Medical Imaging (LLM). Unlike conventional AI systems that analyze images and provide conclusions without transparent reasoning, Med-PaLM 2 combines image analysis with question answering capabilities. The key feature is that this model enables a two-way interaction between physicians and the AI. In a typical scenario, a doctor can provide an image to the LLM, and the model generates a detailed report. The unique aspect is that the physician can then engage in follow-up questions, effectively turning the AI system into an ongoing dialog rather than a black box. For example, a doctor can question the model about its initial assessment, seek clarifications, or ask for additional insights. This feature enhances the interpretability of the AI system, allowing healthcare professionals to better understand the reasoning behind the model's conclusions. The potential impact of this capability is significant in the field of radiology. 
Physicians can potentially use the LLM not just for its diagnostic capabilities but also as a collaborative tool in analyzing complex medical images. This real-time interaction could lead to more accurate diagnoses, better-informed treatment decisions, and overall improvements in the efficiency of healthcare workflows [<span>14</span>].</p><p>Another important concern is that ChatGPT 3.5 consistently uses confident language, even when providing incorrect answers, posing a potential risk, especially in health care. This was the case in 2 out of 5 MCQs we provided (correct answer for question 3 is C while for 4 is B) [<span>15, 16</span>]. Adopting more nuanced language that reflects the degree of confidence would be safer. Additionally, considering that ChatGPT 3.5's confidence may change with follow-up questions like \"are you sure?\" undermines its initial confidence [<span>17</span>]. Examining ChatGPT 3.5's contextual understanding, especially in questions involving patient history and complex symptoms, is crucial for assessing its practical utility in real-world radiology practice. Addressing these concerns is essential for a better understanding of ChatGPT 3.5's capabilities and limitations in emergency radiology.</p><p>We believe that continuous learning from new data and feedback allows ChatGPT 3.5 to improve its performance and expand its knowledge base over time. Nevertheless, we acknowledge the potential for ChatGPT 3.5 to transform radiology training with improvements, in general, not being limited to emergency radiology. As more radiological cases are analyzed and incorporated into its training, its integration into teaching curriculums and assistance to residents hinges on adequate enhancement of the tool. This ongoing evolution would prompt emergency radiology training and fellowship programs to rethink their education approaches for residents in the future.</p><p>Though we could not focus on all types of emergencies in this commentary, we believe that these examples gave an overall idea of the usefulness of AI chatbots in emergency radiology. However, there are limitations to its adoption in real-world clinical practice that need careful consideration and need to be addressed.</p><p>At the moment, there exist few studies comparing LLMs at subspecialty level, in particular in emergency radiology. According to a recent study by Barash et al., OpenAI's ChatGPT 4 showcased promise in the realm of imaging referrals by providing imaging recommendations and generating radiology referrals based on clinical notes from emergency department cases. The chatbot demonstrated a 95% accuracy rate in responding appropriately, evaluated against the ACR Appropriateness Criteria. Experts believe that, with proper training, large language models like ChatGPT could help address inappropriate imaging referrals, a common issue in emergency departments. The study highlighted the importance of high-quality referral notes in influencing the accuracy of interpretation, clinical relevance of radiology reports, and radiologists' confidence. While ChatGPT 4 aligned well with ACR AC in imaging recommendations and excelled in suggesting protocols, it faced challenges in excluding time frames in its recommendations in 35% of cases, despite access to such information. 
The authors expressed optimism about the chatbot's potential to enhance clinical efficiency but underscored the need to be mindful of challenges and risks associated with adopting such technology in radiology practice [<span>18</span>].</p><p>In another study by Infante et al., the performances of different language models (LLMs) in emergency radiology was compared. The study showed that ChatGPT consistently outperforms others across subspecialties, while Bard exhibits lower performance except in negative predictive value. Perplexity falls in between. In future, if LLMs prove to have real-word utility in the imaging department, more of such comparative studies may be expected [23].</p><p>One primary concern is the potential excessive dependence on technology, revolving around the diminishing clinical expertise and interpretative skills of experienced radiologists. If AI becomes the sole decision-maker in diagnosis and treatment, there is a risk of jeopardizing essential human oversight in patient care, potentially leading to troublesome consequences, in particular more probable in emergency radiology units where high-risk patients are very common. It is important to acknowledge that ChatGPT 3.5 is an AI tool and should not be considered a substitute for the expertise of a trained emergency radiologist. While ChatGPT 3.5 can provide valuable insights and assistance, the final interpretation and diagnosis should always be made by the emergency radiologist. The accuracy of ChatGPT 3.5 is heavily dependent on the quality and quantity of the data used to train the model, posing a risk of bias and inaccuracy in the generated responses. Ensuring that the training data is representative of the population being studied and consistently updating and refining the model is essential to enhance its accuracy. Another limitation of ChatGPT 3.5 is its potential for misinterpreting natural language queries. Referring emergency room physicians may not always use medical terminology accurately or provide all relevant information, leading to incorrect or incomplete responses from the tool. Additionally, concerns related to privacy and data security, as seen with the temporary ban imposed in Italy, underscore the need to establish robust measures for securely storing and protecting patient data when utilizing ChatGPT 3.5 or any other AI tool in radiology [<span>19, 20</span>].</p><p>In conclusion, the role of ChatGPT 3.5 in emergency radiology is an evolving area with significant potential to positively impact patient care and the radiology profession itself. However, careful consideration of its limitations and ethical considerations is essential for responsible and effective integration into emergency units.</p><p>This is a single author manuscript.</p><p>The author declares no conflict of interest.</p><p>Not applicable.</p><p>Not applicable.</p>","PeriodicalId":73508,"journal":{"name":"iRadiology","volume":"2 5","pages":"510-521"},"PeriodicalIF":0.0000,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ird3.65","citationCount":"0","resultStr":"{\"title\":\"Role of ChatGPT 3.5 in emergency radiology, with a focus on cardiothoracic emergencies: Proof with examples\",\"authors\":\"Arosh S. 
Perera Molligoda Arachchige\",\"doi\":\"10.1002/ird3.65\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>As the authors of this commentary, we would like to clarify that the figures presented originated from ChatGPT 3.5. Unless specified otherwise, in all figures, questions were provided as input through its user interface and the responses generated have been illustrated in a distinct font. The human authors subsequently undertook the editing process where we edited the ChatGPT 3.5 generated responses for better clarity (in terms of text organization) [<span>1-3</span>].</p><p>ChatGPT 3.5, created by OpenAI in San Francisco, is an advanced artificial intelligence conversational tool. Operating as a large language model (LLM), it can engage in conversations across more than 90 languages. Developed through deep-learning techniques utilizing multilayered recurrent feedforward neural networks, the model has undergone training on an extensive dataset with over 175 billion parameters. This dataset comprises information from diverse internet sources, including websites, articles, fiction, and books, collected until September 2021. The architecture of ChatGPT 3.5 is based on transformers, allowing it to simultaneously process a vast amount of data. This design enables the model to grasp the context and relationships between words in input sequences, facilitating the generation of coherent and relevant responses. Notably, ChatGPT 3.5 can comprehend questions and furnish persuasive, grammatically correct answers. Moreover, it has the capability to generate code, stories, poetry, scientific abstracts, and various other types of content in different styles. It is crucial to emphasize that ChatGPT 3.5 does not merely replicate stored information. Instead, it generates the most probable next word based on probabilities acquired through reinforcement learning during its training process [<span>4-6</span>].</p><p>ChatGPT 3.5 has the potential to greatly assist radiologists in image analysis and interpretation, leveraging its deep-learning capabilities to scrutinize extensive imaging data. By presenting alternative perspectives and highlighting potential areas of concern, ChatGPT 3.5 can enhance diagnostic accuracy and efficiency [<span>4, 5</span>]. Furthermore, the tool can optimize workflow in radiology departments by automating repetitive tasks, such as report generation, leading to time savings for radiologists, being crucial in particular for emergency radiologists.</p><p>Indeed, recently a course called “The Radiological Society of North America (RSNA) Emergency Imaging AI Certificate” has been introduced by the RSNA, which signifies the importance of AI technologies including LLMs in emergency settings. Thus, we decided to explore the role that ChatGPT 3.5 can play in a specific setting of radiological emergencies, in particular in the setting of imaging of cardiothoracic emergencies [<span>7, 8</span>].</p><p>As shown in Figure 1, we first inquired ChatGPT 3.5 regarding the radiation dose in a diagnostic coronary angiogram providing also patient specific data including weight, age, and sex. To our knowledge, the typical effective dose value for a coronary angiogram is between 5 and 10 mSv [<span>9</span>]. This is one instance, which highlights that the results given by the model must be considered with caution. However, it acknowledges that the value may differ due to modern techniques available for reducing radiation exposure. 
Furthermore, ChatGPT 3.5 also informs the technician or radiologist about the requirement for additional data, such as number of projections, fluoroscopy time, and other technical parameters, to enhance the accuracy of the dose estimate. Future versions may incorporate weight-based data as well and up-to-date data potentially making it relatively accurate.</p><p>ChatGPT 3.5 can serve as a valuable resource by rapidly offering age-specific normal values for emergency radiologists and trainees. These resources can be conveniently accessed on the go, facilitating image interpretation and professional development.</p><p>Therefore, we inquired from ChatGPT 3.5 the normal dimensions of the great vessels of the heart; see Figure 2. We noted that the values reported by the Chun et al. [<span>10</span>] accurately fit into these ranges. Age and weight-based normal values in emergency radiology are crucial, considering that not all of these values can be easily memorized. ChatGPT 3.5 provides a convenient and efficient means of accessing this information, offering a more streamlined alternative to internet searches. However, it is important to note that the reliability of its reference sources needs clarification, a feature currently unavailable in the current version of ChatGPT 3.5. In particular, we do not know whether ChatGPT 3.5 preferentially generates its texts from peer-reviewed sources or not.</p><p>Another valuable application of ChatGPT 3.5 is in the creation of pre-procedure checklists, aiming to prevent mishaps during emergent radiological procedures; see Figure 3. This tool proves particularly beneficial for emergency radiology trainees or fellows [<span>7</span>], contributing to enhanced procedural accuracy and safety.</p><p>It is evident that ChatGPT 3.5 offers a comprehensive step-by-step to-do list for each procedure, serving as a preventive measure against negligence. However, it is crucial that these checklists undergo verification for accuracy by a trained emergency radiologist before being implemented for clinical use [<span>7</span>].</p><p>Currently, ChatGPT 3.5 serves the additional function of aiding in report generation. This involves streamlining the input process to a few \\\"keywords of the diagnosis\\\" and patient clinical information. Subsequently, ChatGPT 3.5 can compose an entire report, which the radiologist can then review for accuracy and finalize after necessary editing; see Figure 4. We observe that while the report may not be flawless, it serves well as a preliminary draft that can be refined by the radiologist, potentially reducing the time required to verbally dictate or manually type the entire report. This streamlined process is anticipated to reduce turnaround time and alleviate workload-related burnout, both of which are crucial aspects for emergency radiology [<span>8, 11</span>].</p><p>Despite the advantages, we acknowledge that there are debates about the usefulness of the generated reports, especially considering the existence of current radiology dictation software that already uses voice commands for rapid template-based reporting. Additionally, caution is advised as ChatGPT 3.5 has been shown to invent findings not present in the user prompt, introducing a potential risk [<span>11</span>]. Another concern lies in the possible incompleteness of ChatGPT 3.5's report generation, as it may omit essential details. 
Enhancing completeness and readability can be achieved by employing a refined prompt with explicit instructions on preserving key information, based on the observed outcomes.</p><p>Furthermore, one of the major limitations of ChatGPT 3.5 is that it is not able to read medical images. At the time of the current writing, its upgraded paid version GPT 4 is acknowledged to possess the capability to convert images to text. Looking ahead, the future holds the possibility of training GPT 4 on extensive datasets comprising radiological images and associated clinical data. We believe that this approach could further benefit from narrowing down of the datasets needed by focusing on a subspecialty like emergency radiology. This training could enhance ChatGPT 3.5's ability to assist emergency radiologists in formulating more precise and well-informed diagnoses. By integrating patient information and medical history, ChatGPT 3.5 contributes valuable insights, aiding in the formulation of comprehensive and precise differential diagnoses. As a second reader, ChatGPT 3.5 enhances quality assurance by detecting errors or oversights in radiology reports with real-time feedback during the imaging process. Its ability to cross-reference vast amounts of data helps identify inconsistencies, minimizing the risk of misdiagnosis and contributing to improved patient safety. What could be more helpful than this for emergency radiologists?</p><p>Nevertheless, to gain regulatory approval, demonstrating the safety and effectiveness of such algorithms will hinge on their intended use, considering associated risks and benefits. For example, there is potential for overreliance on this technology, which may diminish the interpretive abilities of experienced radiologists. To avoid this, it is imperative that the ChatGPT 3.5 generated results are made available only after a first level confirmation by the emergency radiologist [<span>12, 13</span>].</p><p>We see that in Figure 5, ChatGPT 3.5 was able to accurately mention all five imaging goals in aortic dissection, which include identification of the site of intimal tear, extent of dissection (for classification), cardiac involvement (pericardial, myocardial, and valvular), aortic rupture, and major branch vessel involvement [<span>10</span>]. The same is true for Figure 6, and in addition, it demonstrates its ability to rapidly provide the risks versus benefits of diagnostic procedures as well as what to expect after. This may indicate the potential for future residents and emergency radiology fellows to be educated using the tool.</p><p>Next, ChatGPT 3.5 has the capability to aid emergency radiologists and trainees in determining the optimal MRI techniques for each diagnosis, potentially minimizing unnecessary anesthesia time, reducing patient noncompliance associated to lengthy acquisitions, and ultimately enhancing the overall workflow. This assistance encompasses selecting the suitable pulse sequence, optimizing the field of view, and deciding on the necessity of contrast for obtaining essential information; see Figure 7.</p><p>We observe that ChatGPT 3.5 not only provides the necessary sequences for imaging of pericardial effusion but also emphasizes that protocols may differ based on individual hospital guidelines. Over time, each hospital has the potential to develop its own customized version of ChatGPT 3.5, tailored to provide information specific to its radiology department. 
A recent study confirmed our opinion by showing that GPT 4 was indeed able to offer decision support in selecting imaging examinations and generating radiology referrals in the emergency department and demonstrated robust alignment with the American College of Radiology (ACR) guidelines, with 95% concordance with experts in imaging recommendations [<span>4</span>]. Additionally, it aids emergency radiologists and trainees by offering insights into specific radiological features to observe in particular medical conditions. This dual functionality contributes significantly to both procedural accuracy and diagnostic proficiency.</p><p>As shown in Figures 8 and 9, while the information provided by ChatGPT 3.5 may not be as precise as that found in established emergency radiology textbooks, there is potential for improvement in future versions by incorporating quoted references.</p><p>Beyond diagnostics, ChatGPT 3.5 can assist radiologists in making evidence-based treatment decisions by analyzing medical literature and clinical guidelines. Although limitations exist, including the generation of fake references/sources, future updates could potentially overcome these challenges [<span>11, 12</span>].</p><p>ChatGPT 3.5's integration of machine-learning algorithms allows it to adapt to emerging radiology research, keeping emergency radiologists informed about the latest advancements in the field.</p><p>Having demonstrated success in passing radiology board exams, ChatGPT 3.5 can also serve as an educational resource. It provides access to up-to-date information, case studies, and reference materials, fostering collaborative learning and discussions among radiologists. Thus, we decided to provide ChatGPT 3.5 some emergency radiology multiple-choice questions (MCQs); see Figure 10.</p><p>In our case, a notable limitation was the absence of image-based questions, hindering the real-world applicability and supporting the view that passing an exam without image analysis may not be sufficient for actual radiology practice. To address this, future investigations should employ a multimodal approach, combining image analysis with text interpretation for a more comprehensive evaluation, and the tool could be tested on subspecialty board exams like the European Diploma in Emergency Radiology. Already, new LLMs are underway, which will have access to this feature (image analysis). For example, Google's Med-PaLM 2 is designed to enhance radiology image analysis capabilities by leveraging a Language Model for Medical Imaging (LLM). Unlike conventional AI systems that analyze images and provide conclusions without transparent reasoning, Med-PaLM 2 combines image analysis with question answering capabilities. The key feature is that this model enables a two-way interaction between physicians and the AI. In a typical scenario, a doctor can provide an image to the LLM, and the model generates a detailed report. The unique aspect is that the physician can then engage in follow-up questions, effectively turning the AI system into an ongoing dialog rather than a black box. For example, a doctor can question the model about its initial assessment, seek clarifications, or ask for additional insights. This feature enhances the interpretability of the AI system, allowing healthcare professionals to better understand the reasoning behind the model's conclusions. The potential impact of this capability is significant in the field of radiology. 
Physicians can potentially use the LLM not just for its diagnostic capabilities but also as a collaborative tool in analyzing complex medical images. This real-time interaction could lead to more accurate diagnoses, better-informed treatment decisions, and overall improvements in the efficiency of healthcare workflows [<span>14</span>].</p><p>Another important concern is that ChatGPT 3.5 consistently uses confident language, even when providing incorrect answers, posing a potential risk, especially in health care. This was the case in 2 out of 5 MCQs we provided (correct answer for question 3 is C while for 4 is B) [<span>15, 16</span>]. Adopting more nuanced language that reflects the degree of confidence would be safer. Additionally, considering that ChatGPT 3.5's confidence may change with follow-up questions like \\\"are you sure?\\\" undermines its initial confidence [<span>17</span>]. Examining ChatGPT 3.5's contextual understanding, especially in questions involving patient history and complex symptoms, is crucial for assessing its practical utility in real-world radiology practice. Addressing these concerns is essential for a better understanding of ChatGPT 3.5's capabilities and limitations in emergency radiology.</p><p>We believe that continuous learning from new data and feedback allows ChatGPT 3.5 to improve its performance and expand its knowledge base over time. Nevertheless, we acknowledge the potential for ChatGPT 3.5 to transform radiology training with improvements, in general, not being limited to emergency radiology. As more radiological cases are analyzed and incorporated into its training, its integration into teaching curriculums and assistance to residents hinges on adequate enhancement of the tool. This ongoing evolution would prompt emergency radiology training and fellowship programs to rethink their education approaches for residents in the future.</p><p>Though we could not focus on all types of emergencies in this commentary, we believe that these examples gave an overall idea of the usefulness of AI chatbots in emergency radiology. However, there are limitations to its adoption in real-world clinical practice that need careful consideration and need to be addressed.</p><p>At the moment, there exist few studies comparing LLMs at subspecialty level, in particular in emergency radiology. According to a recent study by Barash et al., OpenAI's ChatGPT 4 showcased promise in the realm of imaging referrals by providing imaging recommendations and generating radiology referrals based on clinical notes from emergency department cases. The chatbot demonstrated a 95% accuracy rate in responding appropriately, evaluated against the ACR Appropriateness Criteria. Experts believe that, with proper training, large language models like ChatGPT could help address inappropriate imaging referrals, a common issue in emergency departments. The study highlighted the importance of high-quality referral notes in influencing the accuracy of interpretation, clinical relevance of radiology reports, and radiologists' confidence. While ChatGPT 4 aligned well with ACR AC in imaging recommendations and excelled in suggesting protocols, it faced challenges in excluding time frames in its recommendations in 35% of cases, despite access to such information. 
The authors expressed optimism about the chatbot's potential to enhance clinical efficiency but underscored the need to be mindful of challenges and risks associated with adopting such technology in radiology practice [<span>18</span>].</p><p>In another study by Infante et al., the performances of different language models (LLMs) in emergency radiology was compared. The study showed that ChatGPT consistently outperforms others across subspecialties, while Bard exhibits lower performance except in negative predictive value. Perplexity falls in between. In future, if LLMs prove to have real-word utility in the imaging department, more of such comparative studies may be expected [23].</p><p>One primary concern is the potential excessive dependence on technology, revolving around the diminishing clinical expertise and interpretative skills of experienced radiologists. If AI becomes the sole decision-maker in diagnosis and treatment, there is a risk of jeopardizing essential human oversight in patient care, potentially leading to troublesome consequences, in particular more probable in emergency radiology units where high-risk patients are very common. It is important to acknowledge that ChatGPT 3.5 is an AI tool and should not be considered a substitute for the expertise of a trained emergency radiologist. While ChatGPT 3.5 can provide valuable insights and assistance, the final interpretation and diagnosis should always be made by the emergency radiologist. The accuracy of ChatGPT 3.5 is heavily dependent on the quality and quantity of the data used to train the model, posing a risk of bias and inaccuracy in the generated responses. Ensuring that the training data is representative of the population being studied and consistently updating and refining the model is essential to enhance its accuracy. Another limitation of ChatGPT 3.5 is its potential for misinterpreting natural language queries. Referring emergency room physicians may not always use medical terminology accurately or provide all relevant information, leading to incorrect or incomplete responses from the tool. Additionally, concerns related to privacy and data security, as seen with the temporary ban imposed in Italy, underscore the need to establish robust measures for securely storing and protecting patient data when utilizing ChatGPT 3.5 or any other AI tool in radiology [<span>19, 20</span>].</p><p>In conclusion, the role of ChatGPT 3.5 in emergency radiology is an evolving area with significant potential to positively impact patient care and the radiology profession itself. 
However, careful consideration of its limitations and ethical considerations is essential for responsible and effective integration into emergency units.</p><p>This is a single author manuscript.</p><p>The author declares no conflict of interest.</p><p>Not applicable.</p><p>Not applicable.</p>\",\"PeriodicalId\":73508,\"journal\":{\"name\":\"iRadiology\",\"volume\":\"2 5\",\"pages\":\"510-521\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-04-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ird3.65\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"iRadiology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/ird3.65\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"iRadiology","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ird3.65","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Role of ChatGPT 3.5 in emergency radiology, with a focus on cardiothoracic emergencies: Proof with examples
As the author of this commentary, I would like to clarify that the figures presented originated from ChatGPT 3.5. Unless specified otherwise, in all figures, questions were provided as input through its user interface, and the generated responses are shown in a distinct font. The human author subsequently edited the ChatGPT 3.5-generated responses for better clarity (in terms of text organization) [1-3].
ChatGPT 3.5, created by OpenAI in San Francisco, is an advanced artificial intelligence conversational tool. Operating as a large language model (LLM), it can engage in conversations across more than 90 languages. Developed through deep-learning techniques, the model contains over 175 billion parameters and has been trained on an extensive dataset comprising information from diverse internet sources, including websites, articles, fiction, and books, collected until September 2021. The architecture of ChatGPT 3.5 is based on transformers, allowing it to process a vast amount of data in parallel. This design enables the model to grasp the context and relationships between words in input sequences, facilitating the generation of coherent and relevant responses. Notably, ChatGPT 3.5 can comprehend questions and furnish persuasive, grammatically correct answers. Moreover, it can generate code, stories, poetry, scientific abstracts, and various other types of content in different styles. It is crucial to emphasize that ChatGPT 3.5 does not merely replicate stored information. Instead, it generates the most probable next word based on probabilities learned during training, with its conversational behavior further refined through reinforcement learning from human feedback [4-6].
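To illustrate the core next-word mechanism described above, the following toy sketch samples a next token from a softmax over model scores (logits). The vocabulary and scores are invented for illustration and do not come from ChatGPT 3.5 itself.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng(0)):
    """Turn raw model scores into probabilities and sample the next token index."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Toy continuation of a prompt ending in "...moderate pericardial"
vocab = ["effusion", "thickening", "calcification", "fat"]
logits = [3.1, 1.0, 0.2, -0.6]  # illustrative scores only
print(vocab[sample_next_token(logits)])
```

Lower temperatures concentrate probability on the highest-scoring word, which is why the same prompt can yield more deterministic or more varied answers.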
ChatGPT 3.5 has the potential to greatly assist radiologists in image analysis and interpretation, leveraging its deep-learning capabilities to scrutinize extensive imaging data. By presenting alternative perspectives and highlighting potential areas of concern, ChatGPT 3.5 can enhance diagnostic accuracy and efficiency [4, 5]. Furthermore, the tool can optimize workflow in radiology departments by automating repetitive tasks, such as report generation, leading to time savings for radiologists, which is particularly crucial for emergency radiologists.
Indeed, the RSNA recently introduced a course called “The Radiological Society of North America (RSNA) Emergency Imaging AI Certificate,” which underscores the importance of AI technologies, including LLMs, in emergency settings. Thus, we decided to explore the role that ChatGPT 3.5 can play in a specific setting of radiological emergencies, in particular the imaging of cardiothoracic emergencies [7, 8].
As shown in Figure 1, we first asked ChatGPT 3.5 about the radiation dose of a diagnostic coronary angiogram, also providing patient-specific data including weight, age, and sex. To our knowledge, the typical effective dose for a coronary angiogram is between 5 and 10 mSv [9]. This is one instance that highlights that the results given by the model must be considered with caution. However, it acknowledges that the value may differ owing to modern techniques available for reducing radiation exposure. Furthermore, ChatGPT 3.5 also informs the technician or radiologist about the requirement for additional data, such as the number of projections, fluoroscopy time, and other technical parameters, to enhance the accuracy of the dose estimate. Future versions may incorporate weight-based and more up-to-date data, potentially making the estimate more accurate.
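For readers who want to script such queries rather than use the chat interface, the sketch below shows how a dose question with patient-specific data might be sent programmatically. It assumes the OpenAI Python SDK (v1.x) with an API key in the environment; the model name, prompt wording, and patient values are illustrative, not the exact inputs used for Figure 1.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

patient = {"age": 58, "sex": "male", "weight_kg": 82}  # hypothetical example values
question = (
    "Estimate the typical effective radiation dose (in mSv) of a diagnostic coronary "
    f"angiogram for a {patient['age']}-year-old {patient['sex']} weighing "
    f"{patient['weight_kg']} kg. List the additional technical parameters "
    "(projections, fluoroscopy time, etc.) that would refine the estimate."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are assisting an emergency radiologist."},
        {"role": "user", "content": question},
    ],
    temperature=0,  # favour reproducible answers for reference-value queries
)
print(response.choices[0].message.content)
```

As with the figure, the returned estimate would still need to be checked against published reference values before any clinical use.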
ChatGPT 3.5 can serve as a valuable resource by rapidly offering age-specific normal values for emergency radiologists and trainees. These resources can be conveniently accessed on the go, facilitating image interpretation and professional development.
Therefore, we asked ChatGPT 3.5 for the normal dimensions of the great vessels of the heart; see Figure 2. We noted that the values reported by Chun et al. [10] fit well within these ranges. Age- and weight-based normal values in emergency radiology are crucial, considering that not all of these values can be easily memorized. ChatGPT 3.5 provides a convenient and efficient means of accessing this information, offering a more streamlined alternative to internet searches. However, the reliability of its reference sources needs clarification, and source attribution is a feature unavailable in the current version of ChatGPT 3.5. In particular, we do not know whether ChatGPT 3.5 preferentially generates its texts from peer-reviewed sources.
Another valuable application of ChatGPT 3.5 is in the creation of pre-procedure checklists, aiming to prevent mishaps during emergent radiological procedures; see Figure 3. This tool proves particularly beneficial for emergency radiology trainees or fellows [7], contributing to enhanced procedural accuracy and safety.
It is evident that ChatGPT 3.5 offers a comprehensive step-by-step to-do list for each procedure, serving as a preventive measure against negligence. However, it is crucial that these checklists undergo verification for accuracy by a trained emergency radiologist before being implemented for clinical use [7].
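As a concrete illustration of how such a draft checklist could be requested, consider the hedged sketch below. The procedure named and the grouping headings are assumptions chosen for this example; any generated list would still require verification by a trained emergency radiologist, as noted above.

```python
from openai import OpenAI

client = OpenAI()

procedure = "emergency CT pulmonary angiography for suspected pulmonary embolism"
prompt = (
    f"Create a concise pre-procedure checklist for {procedure}. "
    "Group the items under patient preparation, contrast safety, equipment, and documentation."
)

draft_checklist = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# Draft only: must be reviewed by an emergency radiologist before clinical use.
print(draft_checklist)
```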
Currently, ChatGPT 3.5 serves the additional function of aiding in report generation. This involves streamlining the input process to a few "keywords of the diagnosis" and patient clinical information. Subsequently, ChatGPT 3.5 can compose an entire report, which the radiologist can then review for accuracy and finalize after necessary editing; see Figure 4. We observe that while the report may not be flawless, it serves well as a preliminary draft that can be refined by the radiologist, potentially reducing the time required to verbally dictate or manually type the entire report. This streamlined process is anticipated to reduce turnaround time and alleviate workload-related burnout, both of which are crucial aspects for emergency radiology [8, 11].
Despite the advantages, we acknowledge that there are debates about the usefulness of the generated reports, especially considering that current radiology dictation software already uses voice commands for rapid template-based reporting. Additionally, caution is advised, as ChatGPT 3.5 has been shown to invent findings not present in the user prompt, introducing a potential risk [11]. Another concern lies in the possible incompleteness of ChatGPT 3.5's report generation, as it may omit essential details. Based on the outcomes we observed, completeness and readability can be enhanced by employing a refined prompt with explicit instructions on preserving key information.
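A refined prompt of this kind might look like the sketch below, which instructs the model to use only the supplied findings and to keep all of them in the draft. The keywords, clinical details, and system wording are hypothetical examples, not the exact prompt behind Figure 4.

```python
from openai import OpenAI

client = OpenAI()

keywords = "Stanford type A aortic dissection; intimal flap from aortic root to arch; moderate pericardial effusion"
clinical_info = "65-year-old male, acute tearing chest pain, hypotension"

messages = [
    {
        "role": "system",
        "content": (
            "You draft preliminary radiology reports. Use ONLY the findings supplied by the user, "
            "keep every supplied finding in the report, and do not invent additional findings, "
            "measurements, or recommendations."
        ),
    },
    {
        "role": "user",
        "content": (
            f"Clinical information: {clinical_info}\n"
            f"Findings keywords: {keywords}\n"
            "Write a structured CT aorta report with Findings and Impression sections."
        ),
    },
]

draft_report = client.chat.completions.create(
    model="gpt-3.5-turbo", messages=messages, temperature=0
).choices[0].message.content

# Preliminary draft only; the radiologist reviews, edits, and finalizes it.
print(draft_report)
```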
Furthermore, one of the major limitations of ChatGPT 3.5 is that it is not able to read medical images. At the time of writing, its upgraded paid version, GPT-4, is acknowledged to possess the capability to interpret images and describe them in text. Looking ahead, the future holds the possibility of training GPT-4 on extensive datasets comprising radiological images and associated clinical data. We believe that this approach could further benefit from narrowing down the datasets needed by focusing on a subspecialty such as emergency radiology. This training could enhance the model's ability to assist emergency radiologists in formulating more precise and well-informed diagnoses. By integrating patient information and medical history, ChatGPT 3.5 contributes valuable insights, aiding in the formulation of comprehensive and precise differential diagnoses. As a second reader, ChatGPT 3.5 enhances quality assurance by detecting errors or oversights in radiology reports with real-time feedback during the imaging process. Its ability to cross-reference vast amounts of data helps identify inconsistencies, minimizing the risk of misdiagnosis and contributing to improved patient safety. What could be more helpful than this for emergency radiologists?
Nevertheless, to gain regulatory approval, demonstrating the safety and effectiveness of such algorithms will hinge on their intended use, considering associated risks and benefits. For example, there is potential for overreliance on this technology, which may diminish the interpretive abilities of experienced radiologists. To avoid this, it is imperative that the ChatGPT 3.5 generated results are made available only after a first level confirmation by the emergency radiologist [12, 13].
We see that in Figure 5, ChatGPT 3.5 was able to accurately mention all five imaging goals in aortic dissection, which include identification of the site of intimal tear, extent of dissection (for classification), cardiac involvement (pericardial, myocardial, and valvular), aortic rupture, and major branch vessel involvement [10]. The same is true for Figure 6, which in addition demonstrates its ability to rapidly provide the risks versus benefits of diagnostic procedures as well as what to expect afterward. This may indicate the potential for future residents and emergency radiology fellows to be educated using the tool.
Next, ChatGPT 3.5 has the capability to aid emergency radiologists and trainees in determining the optimal MRI techniques for each diagnosis, potentially minimizing unnecessary anesthesia time, reducing patient noncompliance associated with lengthy acquisitions, and ultimately enhancing the overall workflow. This assistance encompasses selecting the suitable pulse sequence, optimizing the field of view, and deciding on the necessity of contrast for obtaining essential information; see Figure 7.
We observe that ChatGPT 3.5 not only provides the necessary sequences for imaging of pericardial effusion but also emphasizes that protocols may differ based on individual hospital guidelines. Over time, each hospital has the potential to develop its own customized version of ChatGPT 3.5, tailored to provide information specific to its radiology department. A recent study supports this view by showing that GPT-4 was indeed able to offer decision support in selecting imaging examinations and generating radiology referrals in the emergency department, demonstrating robust alignment with the American College of Radiology (ACR) guidelines and 95% concordance with experts in imaging recommendations [4]. Additionally, ChatGPT 3.5 aids emergency radiologists and trainees by offering insights into the specific radiological features to look for in particular medical conditions. This dual functionality contributes significantly to both procedural accuracy and diagnostic proficiency.
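One lightweight way a department could approximate such customization today is to place its own protocol text in the system prompt so that answers are constrained to local guidelines. The sketch below assumes this approach; the protocol wording is a made-up example, not an endorsed imaging protocol.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical departmental protocol text; in practice this would come from local guidelines.
local_protocol = (
    "Cardiac MRI for suspected pericardial effusion at our institution: ECG-gated SSFP cine "
    "(4-chamber and short-axis), T1- and T2-weighted black-blood sequences, and late gadolinium "
    "enhancement only if pericardial or myocardial inflammation is suspected."
)

answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer using only the departmental protocol below.\n" + local_protocol},
        {"role": "user", "content": "Which MRI sequences should we acquire for suspected pericardial effusion?"},
    ],
).choices[0].message.content
print(answer)
```

A production version would more likely retrieve protocol text from a curated document store, but the principle of grounding answers in department-specific guidelines is the same.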
As shown in Figures 8 and 9, while the information provided by ChatGPT 3.5 may not be as precise as that found in established emergency radiology textbooks, there is potential for improvement in future versions by incorporating quoted references.
Beyond diagnostics, ChatGPT 3.5 can assist radiologists in making evidence-based treatment decisions by analyzing medical literature and clinical guidelines. Although limitations exist, including the generation of fake references/sources, future updates could potentially overcome these challenges [11, 12].
If periodically retrained on new data, ChatGPT 3.5 could adapt to emerging radiology research, keeping emergency radiologists informed about the latest advancements in the field.
Having demonstrated success in passing radiology board exams, ChatGPT 3.5 can also serve as an educational resource. It provides access to up-to-date information, case studies, and reference materials, fostering collaborative learning and discussions among radiologists. Thus, we decided to provide ChatGPT 3.5 with some emergency radiology multiple-choice questions (MCQs); see Figure 10.
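For transparency about how such an evaluation can be run at scale, the sketch below scores the model on text-only MCQs. The question, options, and answer key are placeholders rather than the items shown in Figure 10.

```python
from openai import OpenAI

client = OpenAI()

# Placeholder items; a real evaluation would load a vetted, board-style question bank.
mcqs = [
    {
        "question": "Which CT finding is most specific for acute aortic dissection?",
        "options": {"A": "Aortic wall calcification", "B": "Intimal flap", "C": "Pleural effusion", "D": "Cardiomegaly"},
        "answer": "B",
    },
]

correct = 0
for item in mcqs:
    options_text = "\n".join(f"{letter}. {text}" for letter, text in item["options"].items())
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"{item['question']}\n{options_text}\nAnswer with a single letter."}],
        temperature=0,
    ).choices[0].message.content.strip()
    correct += reply.upper().startswith(item["answer"])

print(f"Score: {correct}/{len(mcqs)}")
```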
In our case, a notable limitation was the absence of image-based questions, hindering real-world applicability and supporting the view that passing an exam without image analysis may not be sufficient for actual radiology practice. To address this, future investigations should employ a multimodal approach, combining image analysis with text interpretation for a more comprehensive evaluation, and the tool could be tested on subspecialty board exams such as the European Diploma in Emergency Radiology. New LLMs with this feature (image analysis) are already underway. For example, Google's Med-PaLM 2 is designed to enhance radiology image analysis capabilities by leveraging an LLM tuned for the medical domain. Unlike conventional AI systems that analyze images and provide conclusions without transparent reasoning, Med-PaLM 2 combines image analysis with question-answering capabilities. The key feature is that this model enables a two-way interaction between physicians and the AI. In a typical scenario, a doctor can provide an image to the LLM, and the model generates a detailed report. The unique aspect is that the physician can then engage in follow-up questions, effectively turning the AI system into an ongoing dialog rather than a black box. For example, a doctor can question the model about its initial assessment, seek clarifications, or ask for additional insights. This feature enhances the interpretability of the AI system, allowing healthcare professionals to better understand the reasoning behind the model's conclusions. The potential impact of this capability is significant in the field of radiology. Physicians can potentially use the LLM not just for its diagnostic capabilities but also as a collaborative tool in analyzing complex medical images. This real-time interaction could lead to more accurate diagnoses, better-informed treatment decisions, and overall improvements in the efficiency of healthcare workflows [14].
Another important concern is that ChatGPT 3.5 consistently uses confident language, even when providing incorrect answers, posing a potential risk, especially in health care. This was the case in 2 out of the 5 MCQs we provided (the correct answer for question 3 is C, while for question 4 it is B) [15, 16]. Adopting more nuanced language that reflects the degree of confidence would be safer. Additionally, ChatGPT 3.5's answers may change in response to follow-up questions such as "are you sure?", which undermines its initial confidence [17]. Examining ChatGPT 3.5's contextual understanding, especially in questions involving patient history and complex symptoms, is crucial for assessing its practical utility in real-world radiology practice. Addressing these concerns is essential for a better understanding of ChatGPT 3.5's capabilities and limitations in emergency radiology.
We believe that continuous learning from new data and feedback allows ChatGPT 3.5 to improve its performance and expand its knowledge base over time. Nevertheless, we acknowledge that, with improvements, ChatGPT 3.5 has the potential to transform radiology training in general, not only in emergency radiology. As more radiological cases are analyzed and incorporated into its training, its integration into teaching curricula and its assistance to residents hinge on adequate enhancement of the tool. This ongoing evolution would prompt emergency radiology training and fellowship programs to rethink their educational approaches for residents in the future.
Though we could not focus on all types of emergencies in this commentary, we believe that these examples give an overall idea of the usefulness of AI chatbots in emergency radiology. However, there are limitations to their adoption in real-world clinical practice that need careful consideration and must be addressed.
At the moment, few studies compare LLMs at the subspecialty level, in particular in emergency radiology. According to a recent study by Barash et al., OpenAI's ChatGPT 4 showed promise in the realm of imaging referrals by providing imaging recommendations and generating radiology referrals based on clinical notes from emergency department cases. The chatbot demonstrated a 95% accuracy rate in responding appropriately, evaluated against the ACR Appropriateness Criteria. Experts believe that, with proper training, large language models like ChatGPT could help address inappropriate imaging referrals, a common issue in emergency departments. The study highlighted the importance of high-quality referral notes in influencing the accuracy of interpretation, the clinical relevance of radiology reports, and radiologists' confidence. While ChatGPT 4 aligned well with the ACR Appropriateness Criteria in imaging recommendations and excelled in suggesting protocols, it excluded time frames from its recommendations in 35% of cases, despite having access to such information. The authors expressed optimism about the chatbot's potential to enhance clinical efficiency but underscored the need to be mindful of the challenges and risks associated with adopting such technology in radiology practice [18].
In another study, by Infante et al., the performance of different large language models in emergency radiology was compared. The study showed that ChatGPT consistently outperformed the others across subspecialties, while Bard exhibited lower performance except in negative predictive value; Perplexity fell in between. In the future, if LLMs prove to have real-world utility in the imaging department, more such comparative studies can be expected [23].
One primary concern is the potential for excessive dependence on technology, which revolves around the diminishing clinical expertise and interpretative skills of experienced radiologists. If AI becomes the sole decision-maker in diagnosis and treatment, there is a risk of jeopardizing essential human oversight in patient care, potentially leading to troublesome consequences; this is especially probable in emergency radiology units, where high-risk patients are very common. It is important to acknowledge that ChatGPT 3.5 is an AI tool and should not be considered a substitute for the expertise of a trained emergency radiologist. While ChatGPT 3.5 can provide valuable insights and assistance, the final interpretation and diagnosis should always be made by the emergency radiologist. The accuracy of ChatGPT 3.5 is heavily dependent on the quality and quantity of the data used to train the model, posing a risk of bias and inaccuracy in the generated responses. Ensuring that the training data are representative of the population being studied and consistently updating and refining the model are essential to enhance its accuracy. Another limitation of ChatGPT 3.5 is its potential for misinterpreting natural language queries. Referring emergency room physicians may not always use medical terminology accurately or provide all relevant information, leading to incorrect or incomplete responses from the tool. Additionally, concerns related to privacy and data security, as seen with the temporary ban imposed in Italy, underscore the need to establish robust measures for securely storing and protecting patient data when utilizing ChatGPT 3.5 or any other AI tool in radiology [19, 20].
In conclusion, the role of ChatGPT 3.5 in emergency radiology is an evolving area with significant potential to positively impact patient care and the radiology profession itself. However, careful attention to its limitations and to ethical considerations is essential for responsible and effective integration into emergency units.