Role of ChatGPT 3.5 in emergency radiology, with a focus on cardiothoracic emergencies: Proof with examples

iRadiology, 2(5): 510-521. Published: 2024-04-11. DOI: 10.1002/ird3.65
Arosh S. Perera Molligoda Arachchige
{"title":"Role of ChatGPT 3.5 in emergency radiology, with a focus on cardiothoracic emergencies: Proof with examples","authors":"Arosh S. Perera Molligoda Arachchige","doi":"10.1002/ird3.65","DOIUrl":null,"url":null,"abstract":"<p>As the authors of this commentary, we would like to clarify that the figures presented originated from ChatGPT 3.5. Unless specified otherwise, in all figures, questions were provided as input through its user interface and the responses generated have been illustrated in a distinct font. The human authors subsequently undertook the editing process where we edited the ChatGPT 3.5 generated responses for better clarity (in terms of text organization) [<span>1-3</span>].</p><p>ChatGPT 3.5, created by OpenAI in San Francisco, is an advanced artificial intelligence conversational tool. Operating as a large language model (LLM), it can engage in conversations across more than 90 languages. Developed through deep-learning techniques utilizing multilayered recurrent feedforward neural networks, the model has undergone training on an extensive dataset with over 175 billion parameters. This dataset comprises information from diverse internet sources, including websites, articles, fiction, and books, collected until September 2021. The architecture of ChatGPT 3.5 is based on transformers, allowing it to simultaneously process a vast amount of data. This design enables the model to grasp the context and relationships between words in input sequences, facilitating the generation of coherent and relevant responses. Notably, ChatGPT 3.5 can comprehend questions and furnish persuasive, grammatically correct answers. Moreover, it has the capability to generate code, stories, poetry, scientific abstracts, and various other types of content in different styles. It is crucial to emphasize that ChatGPT 3.5 does not merely replicate stored information. Instead, it generates the most probable next word based on probabilities acquired through reinforcement learning during its training process [<span>4-6</span>].</p><p>ChatGPT 3.5 has the potential to greatly assist radiologists in image analysis and interpretation, leveraging its deep-learning capabilities to scrutinize extensive imaging data. By presenting alternative perspectives and highlighting potential areas of concern, ChatGPT 3.5 can enhance diagnostic accuracy and efficiency [<span>4, 5</span>]. Furthermore, the tool can optimize workflow in radiology departments by automating repetitive tasks, such as report generation, leading to time savings for radiologists, being crucial in particular for emergency radiologists.</p><p>Indeed, recently a course called “The Radiological Society of North America (RSNA) Emergency Imaging AI Certificate” has been introduced by the RSNA, which signifies the importance of AI technologies including LLMs in emergency settings. Thus, we decided to explore the role that ChatGPT 3.5 can play in a specific setting of radiological emergencies, in particular in the setting of imaging of cardiothoracic emergencies [<span>7, 8</span>].</p><p>As shown in Figure 1, we first inquired ChatGPT 3.5 regarding the radiation dose in a diagnostic coronary angiogram providing also patient specific data including weight, age, and sex. To our knowledge, the typical effective dose value for a coronary angiogram is between 5 and 10 mSv [<span>9</span>]. This is one instance, which highlights that the results given by the model must be considered with caution. 
However, it acknowledges that the value may differ due to modern techniques available for reducing radiation exposure. Furthermore, ChatGPT 3.5 also informs the technician or radiologist about the requirement for additional data, such as number of projections, fluoroscopy time, and other technical parameters, to enhance the accuracy of the dose estimate. Future versions may incorporate weight-based data as well and up-to-date data potentially making it relatively accurate.</p><p>ChatGPT 3.5 can serve as a valuable resource by rapidly offering age-specific normal values for emergency radiologists and trainees. These resources can be conveniently accessed on the go, facilitating image interpretation and professional development.</p><p>Therefore, we inquired from ChatGPT 3.5 the normal dimensions of the great vessels of the heart; see Figure 2. We noted that the values reported by the Chun et al. [<span>10</span>] accurately fit into these ranges. Age and weight-based normal values in emergency radiology are crucial, considering that not all of these values can be easily memorized. ChatGPT 3.5 provides a convenient and efficient means of accessing this information, offering a more streamlined alternative to internet searches. However, it is important to note that the reliability of its reference sources needs clarification, a feature currently unavailable in the current version of ChatGPT 3.5. In particular, we do not know whether ChatGPT 3.5 preferentially generates its texts from peer-reviewed sources or not.</p><p>Another valuable application of ChatGPT 3.5 is in the creation of pre-procedure checklists, aiming to prevent mishaps during emergent radiological procedures; see Figure 3. This tool proves particularly beneficial for emergency radiology trainees or fellows [<span>7</span>], contributing to enhanced procedural accuracy and safety.</p><p>It is evident that ChatGPT 3.5 offers a comprehensive step-by-step to-do list for each procedure, serving as a preventive measure against negligence. However, it is crucial that these checklists undergo verification for accuracy by a trained emergency radiologist before being implemented for clinical use [<span>7</span>].</p><p>Currently, ChatGPT 3.5 serves the additional function of aiding in report generation. This involves streamlining the input process to a few \"keywords of the diagnosis\" and patient clinical information. Subsequently, ChatGPT 3.5 can compose an entire report, which the radiologist can then review for accuracy and finalize after necessary editing; see Figure 4. We observe that while the report may not be flawless, it serves well as a preliminary draft that can be refined by the radiologist, potentially reducing the time required to verbally dictate or manually type the entire report. This streamlined process is anticipated to reduce turnaround time and alleviate workload-related burnout, both of which are crucial aspects for emergency radiology [<span>8, 11</span>].</p><p>Despite the advantages, we acknowledge that there are debates about the usefulness of the generated reports, especially considering the existence of current radiology dictation software that already uses voice commands for rapid template-based reporting. Additionally, caution is advised as ChatGPT 3.5 has been shown to invent findings not present in the user prompt, introducing a potential risk [<span>11</span>]. Another concern lies in the possible incompleteness of ChatGPT 3.5's report generation, as it may omit essential details. 
Enhancing completeness and readability can be achieved by employing a refined prompt with explicit instructions on preserving key information, based on the observed outcomes.</p><p>Furthermore, one of the major limitations of ChatGPT 3.5 is that it is not able to read medical images. At the time of the current writing, its upgraded paid version GPT 4 is acknowledged to possess the capability to convert images to text. Looking ahead, the future holds the possibility of training GPT 4 on extensive datasets comprising radiological images and associated clinical data. We believe that this approach could further benefit from narrowing down of the datasets needed by focusing on a subspecialty like emergency radiology. This training could enhance ChatGPT 3.5's ability to assist emergency radiologists in formulating more precise and well-informed diagnoses. By integrating patient information and medical history, ChatGPT 3.5 contributes valuable insights, aiding in the formulation of comprehensive and precise differential diagnoses. As a second reader, ChatGPT 3.5 enhances quality assurance by detecting errors or oversights in radiology reports with real-time feedback during the imaging process. Its ability to cross-reference vast amounts of data helps identify inconsistencies, minimizing the risk of misdiagnosis and contributing to improved patient safety. What could be more helpful than this for emergency radiologists?</p><p>Nevertheless, to gain regulatory approval, demonstrating the safety and effectiveness of such algorithms will hinge on their intended use, considering associated risks and benefits. For example, there is potential for overreliance on this technology, which may diminish the interpretive abilities of experienced radiologists. To avoid this, it is imperative that the ChatGPT 3.5 generated results are made available only after a first level confirmation by the emergency radiologist [<span>12, 13</span>].</p><p>We see that in Figure 5, ChatGPT 3.5 was able to accurately mention all five imaging goals in aortic dissection, which include identification of the site of intimal tear, extent of dissection (for classification), cardiac involvement (pericardial, myocardial, and valvular), aortic rupture, and major branch vessel involvement [<span>10</span>]. The same is true for Figure 6, and in addition, it demonstrates its ability to rapidly provide the risks versus benefits of diagnostic procedures as well as what to expect after. This may indicate the potential for future residents and emergency radiology fellows to be educated using the tool.</p><p>Next, ChatGPT 3.5 has the capability to aid emergency radiologists and trainees in determining the optimal MRI techniques for each diagnosis, potentially minimizing unnecessary anesthesia time, reducing patient noncompliance associated to lengthy acquisitions, and ultimately enhancing the overall workflow. This assistance encompasses selecting the suitable pulse sequence, optimizing the field of view, and deciding on the necessity of contrast for obtaining essential information; see Figure 7.</p><p>We observe that ChatGPT 3.5 not only provides the necessary sequences for imaging of pericardial effusion but also emphasizes that protocols may differ based on individual hospital guidelines. Over time, each hospital has the potential to develop its own customized version of ChatGPT 3.5, tailored to provide information specific to its radiology department. 
A recent study confirmed our opinion by showing that GPT 4 was indeed able to offer decision support in selecting imaging examinations and generating radiology referrals in the emergency department and demonstrated robust alignment with the American College of Radiology (ACR) guidelines, with 95% concordance with experts in imaging recommendations [<span>4</span>]. Additionally, it aids emergency radiologists and trainees by offering insights into specific radiological features to observe in particular medical conditions. This dual functionality contributes significantly to both procedural accuracy and diagnostic proficiency.</p><p>As shown in Figures 8 and 9, while the information provided by ChatGPT 3.5 may not be as precise as that found in established emergency radiology textbooks, there is potential for improvement in future versions by incorporating quoted references.</p><p>Beyond diagnostics, ChatGPT 3.5 can assist radiologists in making evidence-based treatment decisions by analyzing medical literature and clinical guidelines. Although limitations exist, including the generation of fake references/sources, future updates could potentially overcome these challenges [<span>11, 12</span>].</p><p>ChatGPT 3.5's integration of machine-learning algorithms allows it to adapt to emerging radiology research, keeping emergency radiologists informed about the latest advancements in the field.</p><p>Having demonstrated success in passing radiology board exams, ChatGPT 3.5 can also serve as an educational resource. It provides access to up-to-date information, case studies, and reference materials, fostering collaborative learning and discussions among radiologists. Thus, we decided to provide ChatGPT 3.5 some emergency radiology multiple-choice questions (MCQs); see Figure 10.</p><p>In our case, a notable limitation was the absence of image-based questions, hindering the real-world applicability and supporting the view that passing an exam without image analysis may not be sufficient for actual radiology practice. To address this, future investigations should employ a multimodal approach, combining image analysis with text interpretation for a more comprehensive evaluation, and the tool could be tested on subspecialty board exams like the European Diploma in Emergency Radiology. Already, new LLMs are underway, which will have access to this feature (image analysis). For example, Google's Med-PaLM 2 is designed to enhance radiology image analysis capabilities by leveraging a Language Model for Medical Imaging (LLM). Unlike conventional AI systems that analyze images and provide conclusions without transparent reasoning, Med-PaLM 2 combines image analysis with question answering capabilities. The key feature is that this model enables a two-way interaction between physicians and the AI. In a typical scenario, a doctor can provide an image to the LLM, and the model generates a detailed report. The unique aspect is that the physician can then engage in follow-up questions, effectively turning the AI system into an ongoing dialog rather than a black box. For example, a doctor can question the model about its initial assessment, seek clarifications, or ask for additional insights. This feature enhances the interpretability of the AI system, allowing healthcare professionals to better understand the reasoning behind the model's conclusions. The potential impact of this capability is significant in the field of radiology. 
Physicians can potentially use the LLM not just for its diagnostic capabilities but also as a collaborative tool in analyzing complex medical images. This real-time interaction could lead to more accurate diagnoses, better-informed treatment decisions, and overall improvements in the efficiency of healthcare workflows [<span>14</span>].</p><p>Another important concern is that ChatGPT 3.5 consistently uses confident language, even when providing incorrect answers, posing a potential risk, especially in health care. This was the case in 2 out of 5 MCQs we provided (correct answer for question 3 is C while for 4 is B) [<span>15, 16</span>]. Adopting more nuanced language that reflects the degree of confidence would be safer. Additionally, considering that ChatGPT 3.5's confidence may change with follow-up questions like \"are you sure?\" undermines its initial confidence [<span>17</span>]. Examining ChatGPT 3.5's contextual understanding, especially in questions involving patient history and complex symptoms, is crucial for assessing its practical utility in real-world radiology practice. Addressing these concerns is essential for a better understanding of ChatGPT 3.5's capabilities and limitations in emergency radiology.</p><p>We believe that continuous learning from new data and feedback allows ChatGPT 3.5 to improve its performance and expand its knowledge base over time. Nevertheless, we acknowledge the potential for ChatGPT 3.5 to transform radiology training with improvements, in general, not being limited to emergency radiology. As more radiological cases are analyzed and incorporated into its training, its integration into teaching curriculums and assistance to residents hinges on adequate enhancement of the tool. This ongoing evolution would prompt emergency radiology training and fellowship programs to rethink their education approaches for residents in the future.</p><p>Though we could not focus on all types of emergencies in this commentary, we believe that these examples gave an overall idea of the usefulness of AI chatbots in emergency radiology. However, there are limitations to its adoption in real-world clinical practice that need careful consideration and need to be addressed.</p><p>At the moment, there exist few studies comparing LLMs at subspecialty level, in particular in emergency radiology. According to a recent study by Barash et al., OpenAI's ChatGPT 4 showcased promise in the realm of imaging referrals by providing imaging recommendations and generating radiology referrals based on clinical notes from emergency department cases. The chatbot demonstrated a 95% accuracy rate in responding appropriately, evaluated against the ACR Appropriateness Criteria. Experts believe that, with proper training, large language models like ChatGPT could help address inappropriate imaging referrals, a common issue in emergency departments. The study highlighted the importance of high-quality referral notes in influencing the accuracy of interpretation, clinical relevance of radiology reports, and radiologists' confidence. While ChatGPT 4 aligned well with ACR AC in imaging recommendations and excelled in suggesting protocols, it faced challenges in excluding time frames in its recommendations in 35% of cases, despite access to such information. 
The authors expressed optimism about the chatbot's potential to enhance clinical efficiency but underscored the need to be mindful of challenges and risks associated with adopting such technology in radiology practice [<span>18</span>].</p><p>In another study by Infante et al., the performances of different language models (LLMs) in emergency radiology was compared. The study showed that ChatGPT consistently outperforms others across subspecialties, while Bard exhibits lower performance except in negative predictive value. Perplexity falls in between. In future, if LLMs prove to have real-word utility in the imaging department, more of such comparative studies may be expected [23].</p><p>One primary concern is the potential excessive dependence on technology, revolving around the diminishing clinical expertise and interpretative skills of experienced radiologists. If AI becomes the sole decision-maker in diagnosis and treatment, there is a risk of jeopardizing essential human oversight in patient care, potentially leading to troublesome consequences, in particular more probable in emergency radiology units where high-risk patients are very common. It is important to acknowledge that ChatGPT 3.5 is an AI tool and should not be considered a substitute for the expertise of a trained emergency radiologist. While ChatGPT 3.5 can provide valuable insights and assistance, the final interpretation and diagnosis should always be made by the emergency radiologist. The accuracy of ChatGPT 3.5 is heavily dependent on the quality and quantity of the data used to train the model, posing a risk of bias and inaccuracy in the generated responses. Ensuring that the training data is representative of the population being studied and consistently updating and refining the model is essential to enhance its accuracy. Another limitation of ChatGPT 3.5 is its potential for misinterpreting natural language queries. Referring emergency room physicians may not always use medical terminology accurately or provide all relevant information, leading to incorrect or incomplete responses from the tool. Additionally, concerns related to privacy and data security, as seen with the temporary ban imposed in Italy, underscore the need to establish robust measures for securely storing and protecting patient data when utilizing ChatGPT 3.5 or any other AI tool in radiology [<span>19, 20</span>].</p><p>In conclusion, the role of ChatGPT 3.5 in emergency radiology is an evolving area with significant potential to positively impact patient care and the radiology profession itself. 
However, careful consideration of its limitations and ethical considerations is essential for responsible and effective integration into emergency units.</p><p>This is a single author manuscript.</p><p>The author declares no conflict of interest.</p><p>Not applicable.</p><p>Not applicable.</p>","PeriodicalId":73508,"journal":{"name":"iRadiology","volume":"2 5","pages":"510-521"},"PeriodicalIF":0.0000,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ird3.65","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"iRadiology","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ird3.65","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

As the authors of this commentary, we would like to clarify that the figures presented originated from ChatGPT 3.5. Unless specified otherwise, in all figures, the questions were provided as input through its user interface and the generated responses are shown in a distinct font. The human authors subsequently edited the ChatGPT 3.5-generated responses for better clarity (in terms of text organization) [1-3].

ChatGPT 3.5, created by OpenAI in San Francisco, is an advanced artificial intelligence conversational tool. Operating as a large language model (LLM), it can engage in conversations across more than 90 languages. Developed through deep-learning techniques using multilayered feedforward neural networks, the model, which comprises over 175 billion parameters, has been trained on an extensive dataset. This dataset comprises information from diverse internet sources, including websites, articles, fiction, and books, collected until September 2021. The architecture of ChatGPT 3.5 is based on transformers, allowing it to process vast amounts of data in parallel. This design enables the model to grasp the context and relationships between words in input sequences, facilitating the generation of coherent and relevant responses. Notably, ChatGPT 3.5 can comprehend questions and furnish persuasive, grammatically correct answers. Moreover, it can generate code, stories, poetry, scientific abstracts, and various other types of content in different styles. It is crucial to emphasize that ChatGPT 3.5 does not merely replicate stored information. Instead, it generates the most probable next word based on probabilities acquired through reinforcement learning during its training process [4-6].
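
To illustrate the next-word mechanism described above, the following is a minimal, purely illustrative Python sketch (not the actual ChatGPT 3.5 implementation): a language model assigns a score to every token in its vocabulary, converts the scores into probabilities with a softmax, and then samples the next token. The tiny vocabulary and the scores below are invented solely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy vocabulary; a real model works with tens of thousands of subword tokens.
vocab = ["aortic", "dissection", "effusion", "normal", "."]

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw scores (logits) into a probability distribution."""
    z = logits - logits.max()        # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def next_token(logits: np.ndarray, temperature: float = 0.7) -> str:
    """Sample one token from the (hypothetical) scores the model produced."""
    probs = softmax(logits / temperature)
    return str(rng.choice(vocab, p=probs))

# Hypothetical scores the model might assign after the prompt "CT shows pericardial ..."
logits = np.array([0.2, 0.5, 3.1, 0.1, -1.0])
print(next_token(logits))            # most often prints "effusion"
```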

ChatGPT 3.5 has the potential to greatly assist radiologists in image analysis and interpretation, leveraging its deep-learning capabilities to scrutinize extensive imaging data. By presenting alternative perspectives and highlighting potential areas of concern, ChatGPT 3.5 can enhance diagnostic accuracy and efficiency [4, 5]. Furthermore, the tool can optimize workflow in radiology departments by automating repetitive tasks, such as report generation, leading to time savings for radiologists, which is particularly crucial for emergency radiologists.

Indeed, the RSNA has recently introduced a course called “The Radiological Society of North America (RSNA) Emergency Imaging AI Certificate,” which signifies the importance of AI technologies, including LLMs, in emergency settings. Thus, we decided to explore the role that ChatGPT 3.5 can play in a specific setting of radiological emergencies, in particular the imaging of cardiothoracic emergencies [7, 8].

As shown in Figure 1, we first asked ChatGPT 3.5 about the radiation dose of a diagnostic coronary angiogram, also providing patient-specific data including weight, age, and sex. To our knowledge, the typical effective dose for a coronary angiogram is between 5 and 10 mSv [9]. This is one instance that highlights that the results given by the model must be considered with caution. However, it acknowledges that the value may differ owing to modern techniques available for reducing radiation exposure. Furthermore, ChatGPT 3.5 also informs the technician or radiologist that additional data, such as the number of projections, fluoroscopy time, and other technical parameters, are required to enhance the accuracy of the dose estimate. Future versions may also incorporate weight-based and more up-to-date data, potentially making the estimate relatively accurate.
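
For readers who prefer to query the model programmatically rather than through the web interface shown in Figure 1, the following is a minimal sketch assuming OpenAI's official Python client and the "gpt-3.5-turbo" chat model. The prompt wording and patient details are illustrative only, and any dose figure returned must still be checked against the literature by the radiologist.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

question = (
    "Estimate the typical effective radiation dose (in mSv) of a diagnostic "
    "coronary angiogram for a 70 kg, 58-year-old male patient, and state which "
    "additional technical parameters (number of projections, fluoroscopy time, "
    "etc.) would be needed for a more accurate, patient-specific estimate."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are assisting an emergency radiologist."},
        {"role": "user", "content": question},
    ],
    temperature=0.2,  # low temperature to keep the answer as reproducible as possible
)

print(response.choices[0].message.content)
```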

ChatGPT 3.5 can serve as a valuable resource for emergency radiologists and trainees by rapidly offering age-specific normal values. These resources can be conveniently accessed on the go, facilitating image interpretation and professional development.

Therefore, we asked ChatGPT 3.5 for the normal dimensions of the great vessels of the heart; see Figure 2. We noted that the values reported by Chun et al. [10] fit accurately into these ranges. Age- and weight-based normal values in emergency radiology are crucial, considering that not all of these values can be easily memorized. ChatGPT 3.5 provides a convenient and efficient means of accessing this information, offering a more streamlined alternative to internet searches. However, it is important to note that the reliability of its reference sources needs clarification, a feature unavailable in the current version of ChatGPT 3.5. In particular, we do not know whether ChatGPT 3.5 preferentially generates its text from peer-reviewed sources.

Another valuable application of ChatGPT 3.5 is in the creation of pre-procedure checklists, aiming to prevent mishaps during emergent radiological procedures; see Figure 3. This tool proves particularly beneficial for emergency radiology trainees or fellows [7], contributing to enhanced procedural accuracy and safety.

It is evident that ChatGPT 3.5 offers a comprehensive step-by-step to-do list for each procedure, serving as a preventive measure against negligence. However, it is crucial that these checklists undergo verification for accuracy by a trained emergency radiologist before being implemented for clinical use [7].
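
As one possible way to operationalize the verification requirement above, the sketch below (with hypothetical helper names, not taken from the original commentary) stores each model-generated checklist item together with a flag recording whether a trained emergency radiologist has approved it, so that only verified items are released for clinical use.

```python
from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    text: str                 # item text as generated by ChatGPT 3.5
    verified: bool = False    # set to True only after radiologist review

@dataclass
class PreProcedureChecklist:
    procedure: str
    items: list[ChecklistItem] = field(default_factory=list)

    def add_generated_item(self, text: str) -> None:
        """Add an unverified, model-generated item."""
        self.items.append(ChecklistItem(text=text))

    def approve(self, index: int) -> None:
        """Mark an item as verified by the emergency radiologist."""
        self.items[index].verified = True

    def released_for_clinical_use(self) -> list[str]:
        """Only radiologist-approved items are released."""
        return [item.text for item in self.items if item.verified]

# Illustrative use with two hypothetical, model-generated items
checklist = PreProcedureChecklist(procedure="Emergent CT pulmonary angiography")
checklist.add_generated_item("Confirm patient identity and obtain informed consent")
checklist.add_generated_item("Check renal function and contrast allergy history")
checklist.approve(0)
print(checklist.released_for_clinical_use())
```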

Currently, ChatGPT 3.5 serves the additional function of aiding in report generation. This involves streamlining the input process to a few "keywords of the diagnosis" and patient clinical information. Subsequently, ChatGPT 3.5 can compose an entire report, which the radiologist can then review for accuracy and finalize after necessary editing; see Figure 4. We observe that while the report may not be flawless, it serves well as a preliminary draft that can be refined by the radiologist, potentially reducing the time required to verbally dictate or manually type the entire report. This streamlined process is anticipated to reduce turnaround time and alleviate workload-related burnout, both of which are crucial aspects for emergency radiology [8, 11].

Despite the advantages, we acknowledge that there are debates about the usefulness of the generated reports, especially considering that current radiology dictation software already uses voice commands for rapid template-based reporting. Additionally, caution is advised, as ChatGPT 3.5 has been shown to invent findings not present in the user prompt, introducing a potential risk [11]. Another concern lies in the possible incompleteness of ChatGPT 3.5's report generation, as it may omit essential details. Based on the observed outcomes, completeness and readability can be enhanced by employing a refined prompt with explicit instructions to preserve key information.
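
The kind of "refined prompt" referred to above could, for example, look like the following sketch: the instructions state explicitly that the draft may contain only the supplied findings and must retain every keyword, which is one plausible way to limit invented findings and omissions. The keywords and clinical details are illustrative, and the resulting draft would still require review and sign-off by the radiologist.

```python
# Template for a report-drafting prompt with explicit preservation rules
REPORT_PROMPT = """You draft preliminary radiology reports for an emergency radiologist.
Rules:
- Use ONLY the findings listed below; do not add, infer, or embellish any finding.
- Mention every listed keyword at least once; if a keyword cannot be placed,
  list it under 'Not incorporated' rather than dropping it.
- Structure the draft as: Clinical information / Technique / Findings / Impression.

Clinical information: {clinical_info}
Diagnosis keywords: {keywords}
"""

def build_report_prompt(clinical_info: str, keywords: list[str]) -> str:
    """Fill in the template; the result would be sent as the user message of a chat request."""
    return REPORT_PROMPT.format(clinical_info=clinical_info,
                                keywords=", ".join(keywords))

print(build_report_prompt(
    clinical_info="Acute chest pain; CT angiography of the thoracic aorta",
    keywords=["Stanford type A dissection",
              "intimal tear at the sinotubular junction",
              "no pericardial effusion"],
))
```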

Furthermore, one of the major limitations of ChatGPT 3.5 is that it is not able to read medical images. At the time of writing, its upgraded paid version, GPT 4, is acknowledged to be capable of converting images to text. Looking ahead, the future holds the possibility of training GPT 4 on extensive datasets comprising radiological images and associated clinical data. We believe that this approach could further benefit from narrowing the required datasets by focusing on a subspecialty such as emergency radiology. This training could enhance the model's ability to assist emergency radiologists in formulating more precise and well-informed diagnoses. By integrating patient information and medical history, ChatGPT 3.5 contributes valuable insights, aiding in the formulation of comprehensive and precise differential diagnoses. As a second reader, ChatGPT 3.5 enhances quality assurance by detecting errors or oversights in radiology reports through real-time feedback during the imaging process. Its ability to cross-reference vast amounts of data helps identify inconsistencies, minimizing the risk of misdiagnosis and contributing to improved patient safety. What could be more helpful than this for emergency radiologists?

Nevertheless, to gain regulatory approval, demonstrating the safety and effectiveness of such algorithms will hinge on their intended use, considering the associated risks and benefits. For example, there is potential for overreliance on this technology, which may diminish the interpretive abilities of experienced radiologists. To avoid this, it is imperative that ChatGPT 3.5-generated results are made available only after first-level confirmation by the emergency radiologist [12, 13].

We see that in Figure 5, ChatGPT 3.5 was able to accurately mention all five imaging goals in aortic dissection, which include identification of the site of the intimal tear, the extent of dissection (for classification), cardiac involvement (pericardial, myocardial, and valvular), aortic rupture, and major branch vessel involvement [10]. The same is true for Figure 6, which in addition demonstrates its ability to rapidly provide the risks versus benefits of diagnostic procedures as well as what to expect afterward. This may indicate the potential for future residents and emergency radiology fellows to be educated using the tool.

Next, ChatGPT 3.5 has the capability to aid emergency radiologists and trainees in determining the optimal MRI techniques for each diagnosis, potentially minimizing unnecessary anesthesia time, reducing patient noncompliance associated with lengthy acquisitions, and ultimately enhancing the overall workflow. This assistance encompasses selecting the suitable pulse sequence, optimizing the field of view, and deciding on the necessity of contrast for obtaining essential information; see Figure 7.

We observe that ChatGPT 3.5 not only provides the necessary sequences for imaging of pericardial effusion but also emphasizes that protocols may differ based on individual hospital guidelines. Over time, each hospital has the potential to develop its own customized version of ChatGPT 3.5, tailored to provide information specific to its radiology department. A recent study confirmed our opinion by showing that GPT 4 was indeed able to offer decision support in selecting imaging examinations and generating radiology referrals in the emergency department and demonstrated robust alignment with the American College of Radiology (ACR) guidelines, with 95% concordance with experts in imaging recommendations [4]. Additionally, it aids emergency radiologists and trainees by offering insights into specific radiological features to observe in particular medical conditions. This dual functionality contributes significantly to both procedural accuracy and diagnostic proficiency.
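
Regarding the hospital-specific customization mentioned above, one simple way a department could begin such tailoring, short of retraining the model, is to prepend its local protocol text to every request as a system message, as in the hedged sketch below. The local protocol excerpt and the function name are invented for illustration; a production system would also require governance and data-protection review.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical excerpt of a hospital's own cardiac MRI protocol document.
LOCAL_PROTOCOL = (
    "Local cardiac MRI protocol: for suspected pericardial effusion use "
    "ECG-gated cine SSFP in short-axis and 4-chamber views; add T1- and "
    "T2-weighted black-blood sequences; give contrast only after checking eGFR."
)

def protocol_advice(clinical_question: str) -> str:
    """Answer protocol questions with the hospital's local guidance in context."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Follow the local protocol below when advising.\n" + LOCAL_PROTOCOL},
            {"role": "user", "content": clinical_question},
        ],
        temperature=0.2,
    )
    return response.choices[0].message.content

print(protocol_advice("Which MRI sequences should we run for suspected pericardial effusion?"))
```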

As shown in Figures 8 and 9, while the information provided by ChatGPT 3.5 may not be as precise as that found in established emergency radiology textbooks, there is potential for improvement in future versions by incorporating quoted references.

Beyond diagnostics, ChatGPT 3.5 can assist radiologists in making evidence-based treatment decisions by analyzing medical literature and clinical guidelines. Although limitations exist, including the generation of fake references/sources, future updates could potentially overcome these challenges [11, 12].

ChatGPT 3.5's integration of machine-learning algorithms allows it to adapt to emerging radiology research, keeping emergency radiologists informed about the latest advancements in the field.

Having demonstrated success in passing radiology board exams, ChatGPT 3.5 can also serve as an educational resource. It provides access to up-to-date information, case studies, and reference materials, fostering collaborative learning and discussions among radiologists. Thus, we decided to provide ChatGPT 3.5 some emergency radiology multiple-choice questions (MCQs); see Figure 10.

In our case, a notable limitation was the absence of image-based questions, hindering real-world applicability and supporting the view that passing an exam without image analysis may not be sufficient for actual radiology practice. To address this, future investigations should employ a multimodal approach, combining image analysis with text interpretation for a more comprehensive evaluation, and the tool could be tested on subspecialty board exams such as the European Diploma in Emergency Radiology. New LLMs with access to this feature (image analysis) are already underway. For example, Google's Med-PaLM 2 is designed to enhance radiology image analysis capabilities by leveraging a large language model (LLM) adapted to medical imaging. Unlike conventional AI systems that analyze images and provide conclusions without transparent reasoning, Med-PaLM 2 combines image analysis with question-answering capabilities. The key feature is that this model enables a two-way interaction between physicians and the AI. In a typical scenario, a doctor can provide an image to the LLM, and the model generates a detailed report. The unique aspect is that the physician can then ask follow-up questions, effectively turning the AI system into an ongoing dialog rather than a black box. For example, a doctor can question the model about its initial assessment, seek clarifications, or ask for additional insights. This feature enhances the interpretability of the AI system, allowing healthcare professionals to better understand the reasoning behind the model's conclusions. The potential impact of this capability is significant in the field of radiology. Physicians can potentially use the LLM not just for its diagnostic capabilities but also as a collaborative tool in analyzing complex medical images. This real-time interaction could lead to more accurate diagnoses, better-informed treatment decisions, and overall improvements in the efficiency of healthcare workflows [14].

Another important concern is that ChatGPT 3.5 consistently uses confident language, even when providing incorrect answers, posing a potential risk, especially in health care. This was the case in 2 out of the 5 MCQs we provided (the correct answer to question 3 is C and to question 4 is B) [15, 16]. Adopting more nuanced language that reflects the degree of confidence would be safer. Additionally, ChatGPT 3.5's confidence may change with follow-up questions such as "are you sure?", which undermines its initial confidence [17]. Examining ChatGPT 3.5's contextual understanding, especially in questions involving patient history and complex symptoms, is crucial for assessing its practical utility in real-world radiology practice. Addressing these concerns is essential for a better understanding of ChatGPT 3.5's capabilities and limitations in emergency radiology.
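
Quantifying this kind of MCQ performance needs nothing more than a grading script such as the sketch below. The model answers shown are illustrative placeholders: only questions 3 and 4 reflect the correct answers stated in the text (C and B); the remaining keys and responses are invented and are not a transcript of the actual session.

```python
def grade_mcqs(model_answers: dict[int, str], answer_key: dict[int, str]) -> float:
    """Return the fraction of questions the model answered correctly."""
    correct = sum(model_answers.get(q) == a for q, a in answer_key.items())
    return correct / len(answer_key)

# Placeholder data purely for illustration (see lead-in above).
answer_key    = {1: "A", 2: "D", 3: "C", 4: "B", 5: "E"}
model_answers = {1: "A", 2: "D", 3: "A", 4: "D", 5: "E"}

print(f"Score: {grade_mcqs(model_answers, answer_key):.0%}")  # prints "Score: 60%"
```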

We believe that continuous learning from new data and feedback would allow ChatGPT 3.5 to improve its performance and expand its knowledge base over time. Nevertheless, we acknowledge that, with improvements, ChatGPT 3.5 has the potential to transform radiology training in general, not limited to emergency radiology. As more radiological cases are analyzed and incorporated into its training, its integration into teaching curricula and its assistance to residents will hinge on adequate enhancement of the tool. This ongoing evolution would prompt emergency radiology training and fellowship programs to rethink their educational approaches for residents in the future.

Though we could not cover all types of emergencies in this commentary, we believe that these examples give an overall idea of the usefulness of AI chatbots in emergency radiology. However, there are limitations to their adoption in real-world clinical practice that require careful consideration and need to be addressed.

At the moment, few studies compare LLMs at the subspecialty level, in particular in emergency radiology. According to a recent study by Barash et al., OpenAI's ChatGPT 4 showed promise in the realm of imaging referrals by providing imaging recommendations and generating radiology referrals based on clinical notes from emergency department cases. The chatbot demonstrated a 95% accuracy rate in responding appropriately, evaluated against the ACR Appropriateness Criteria. Experts believe that, with proper training, large language models like ChatGPT could help address inappropriate imaging referrals, a common issue in emergency departments. The study highlighted the importance of high-quality referral notes in influencing the accuracy of interpretation, the clinical relevance of radiology reports, and radiologists' confidence. While ChatGPT 4 aligned well with the ACR AC in imaging recommendations and excelled in suggesting protocols, it omitted time frames from its recommendations in 35% of cases, despite having access to such information. The authors expressed optimism about the chatbot's potential to enhance clinical efficiency but underscored the need to be mindful of the challenges and risks associated with adopting such technology in radiology practice [18].

In another study, by Infante et al., the performance of different large language models (LLMs) in emergency radiology was compared. The study showed that ChatGPT consistently outperformed the others across subspecialties, while Bard exhibited lower performance except in negative predictive value; Perplexity fell in between. In the future, if LLMs prove to have real-world utility in the imaging department, more such comparative studies may be expected [23].

One primary concern is potential excessive dependence on the technology, revolving around diminishing clinical expertise and interpretative skills of experienced radiologists. If AI becomes the sole decision-maker in diagnosis and treatment, there is a risk of jeopardizing essential human oversight in patient care, potentially leading to troublesome consequences; this is all the more probable in emergency radiology units, where high-risk patients are very common. It is important to acknowledge that ChatGPT 3.5 is an AI tool and should not be considered a substitute for the expertise of a trained emergency radiologist. While ChatGPT 3.5 can provide valuable insights and assistance, the final interpretation and diagnosis should always be made by the emergency radiologist. The accuracy of ChatGPT 3.5 is heavily dependent on the quality and quantity of the data used to train the model, posing a risk of bias and inaccuracy in the generated responses. Ensuring that the training data are representative of the population being studied and consistently updating and refining the model are essential to enhance its accuracy. Another limitation of ChatGPT 3.5 is its potential for misinterpreting natural-language queries. Referring emergency room physicians may not always use medical terminology accurately or provide all relevant information, leading to incorrect or incomplete responses from the tool. Additionally, concerns related to privacy and data security, as seen with the temporary ban imposed in Italy, underscore the need to establish robust measures for securely storing and protecting patient data when utilizing ChatGPT 3.5 or any other AI tool in radiology [19, 20].

In conclusion, the role of ChatGPT 3.5 in emergency radiology is an evolving area with significant potential to positively impact patient care and the radiology profession itself. However, careful consideration of its limitations and ethical implications is essential for responsible and effective integration into emergency units.

This is a single author manuscript.

The author declares no conflict of interest.

