
Healthcare Informatics Research: Latest Articles

Utility of Multimodal Large Language Models in Analyzing Chest X-Rays with Incomplete Contextual Information.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date: 2025-10-01 Epub Date: 2025-10-31 DOI: 10.4258/hir.2025.31.4.416
Choonghan Kim, Seonhee Cho, Joo Heung Yoon

Objectives: Large language models (LLMs) are increasingly used in clinical practice, but their performance can deteriorate when radiology reports are incomplete. We evaluated whether multimodal LLMs (integrating text and images) could enhance accuracy and interpretability in chest radiography reports, thereby improving their utility for clinical decision support. Specifically, we aimed to assess the robustness of LLMs in generating accurate impressions from chest radiography reports when provided with incomplete data, and whether multimodal input could mitigate performance loss.

Methods: We analyzed 300 radiology image-report pairs from the MIMIC-CXR database. Three LLMs (OpenFlamingo, MedFlamingo, and IDEFICS) were tested in text-only and multimodal formats. Chest X-ray impressions were generated from complete text reports and then regenerated after systematically removing 20%, 50%, and 80% of the text. The effect of adding the chest X-ray images was then evaluated, and model performance was compared using three statistical methods. Hallucination rates were quantified.
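
The text-removal step can be sketched as follows. This is only an illustration: the abstract does not say whether text was removed by word, sentence, or section, or whether removal was random, so word-level random removal with a fixed seed is an assumption, and `drop_words` is a hypothetical helper name.

```python
import random


def drop_words(report: str, fraction: float, seed: int = 0) -> str:
    """Return the report with about `fraction` of its words removed at random.

    Assumption: removal is word-level and random; the study's actual
    procedure is not described in the abstract.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    words = report.split()
    n_remove = round(len(words) * fraction)
    rng = random.Random(seed)  # fixed seed keeps the ablation reproducible
    removed = set(rng.sample(range(len(words)), n_remove))
    # Keep the surviving words in their original order.
    return " ".join(w for i, w in enumerate(words) if i not in removed)


report = "Heart size is normal. No focal consolidation, pleural effusion, or pneumothorax is seen."
for fraction in (0.2, 0.5, 0.8):
    print(fraction, "->", drop_words(report, fraction))
```

Each degraded report would then be passed to the model in place of the complete text, with or without the paired image.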

Results: In the text-only setting, OpenFlamingo, MedFlamingo, and IDEFICS demonstrated comparable performance (ROUGE-L: 0.23 vs. 0.21 vs. 0.21; F1RadGraph: 0.20 vs. 0.16 vs. 0.16; F1CheXbert: 0.49 vs. 0.41 vs. 0.41), with OpenFlamingo performing best on complete text (p < 0.001). All models exhibited performance decline with incomplete data. However, multimodal input significantly improved the performance of MedFlamingo and IDEFICS (p < 0.001), equaling or surpassing OpenFlamingo even under incomplete text conditions. Regarding hallucination, MedFlamingo showed a lower false-negative rate in multimodal compared with unimodal use, while false-positive rates were similar.
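
ROUGE-L, the first of the three reported metrics, scores a generated impression by its longest common subsequence (LCS) with the reference impression. A minimal sketch, assuming whitespace tokenization, lowercasing, and the balanced F1 form; the study's exact ROUGE settings are not given in the abstract, and `lcs_length` and `rouge_l_f1` are illustrative names.

```python
def lcs_length(a, b) -> int:
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the match diagonally, or carry the best value so far.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a generated and a reference impression."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("no pneumothorax", "no acute pneumothorax"))
```

Because ROUGE-L only measures word overlap, clinically oriented scores such as F1RadGraph and F1CheXbert are reported alongside it.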

Conclusions: LLMs may produce suboptimal outputs when radiology data are incomplete, but multimodal LLMs enhance reliability and may strengthen clinical decision-making support.

Healthcare Informatics Research 2025;31(4):416-425. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12640722/pdf/
Citations: 0
Development of a Mobile Phone Application for Monitoring Cardiovascular Health.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date: 2025-07-01 Epub Date: 2025-07-31 DOI: 10.4258/hir.2025.31.3.310
Gilberto Andrade Tavares, Matheus Henrique Costa Xavier, Iara Victoria Dos Santos Moura, Virna Anfrizio Souza, Wictor Hugo de Souza Silva, Renato Brito Dos Santos Júnior, Iris Tarciana de Freitas Cunha, Ellen Natielly Fonseca de Jesus, Adler Teixeira Machado Nissink Costa, José Augusto Soares Barreto-Filho

Objectives: Cardiovascular diseases are the leading cause of death worldwide. The American Heart Association defined eight metrics for cardiovascular health to reduce mortality. Mobile health tools can support shared clinical decision-making, provide tele-monitoring feedback, and improve patient adherence to medication regimens. This work aims to develop and implement the Cardiovascular Health application for mobile phones according to the parameters defined by the American Heart Association.

Methods: A user-centered design approach was employed using the Dart programming language, the Flutter framework, and a Firebase database.

Results: Each ideal parameter is evaluated as "good" when it meets the requirements, earning the patient one mark. Participants' cardiovascular health is subsequently classified as "good," "can be improved," or "needs to be improved," and PDF reports are generated.
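
The scoring rule described above can be sketched in a few lines. The metric names follow the American Heart Association's Life's Essential 8. The cut-offs `good_min` and `improve_min` are hypothetical, since the abstract does not state which totals map to each class. The application itself is written in Dart, so this Python version is purely illustrative.

```python
# The eight AHA cardiovascular health metrics (Life's Essential 8).
METRICS = (
    "diet",
    "physical_activity",
    "nicotine_exposure",
    "sleep",
    "body_mass_index",
    "blood_lipids",
    "blood_glucose",
    "blood_pressure",
)


def classify(ideal: dict, good_min: int = 7, improve_min: int = 4) -> tuple:
    """Count one mark per metric rated "good" and map the total to a class.

    `good_min` and `improve_min` are hypothetical thresholds chosen for
    illustration; they are not taken from the paper.
    """
    marks = sum(1 for metric in METRICS if ideal.get(metric, False))
    if marks >= good_min:
        label = "good"
    elif marks >= improve_min:
        label = "can be improved"
    else:
        label = "needs to be improved"
    return marks, label


patient = {metric: True for metric in METRICS}
patient["sleep"] = False
patient["diet"] = False
patient["blood_pressure"] = False
print(classify(patient))
```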

Conclusions: The Cardiovascular Health application is built on a strong scientific foundation, given the high prevalence of individuals at risk for cardiovascular disease. It includes all components necessary to assess cardiovascular health and will enable physicians and other healthcare professionals to make more informed decisions regarding patient care.

Healthcare Informatics Research 2025;31(3):310-315. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370441/pdf/
Citations: 0
Korea's Bio Big Data Project: Importance and Challenges of Governance and Data Utilization.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date: 2025-07-01 Epub Date: 2025-07-31 DOI: 10.4258/hir.2025.31.3.226
Jae Sun Kim, Dae Un Hong

Objectives: The Korean government has been developing the National Integrated Biological Data Construction Project (NIBDCP) for over a decade, aiming to establish a comprehensive framework for the collection, production, provision, and utilization of biological data. This study examines the project's structure, features, and governance framework to identify key recommendations for successful implementation.

Methods: A systematic analysis of the NIBDCP was conducted, focusing on governance structures, data management protocols, and operational systems. The evaluation emphasized institutional roles, consent requirements, sustainable data production, and researcher accessibility, identifying areas for improvement.

Results: The analysis identified four critical areas requiring enhancement. First, the governance framework should empower the Secretariat to clearly define institutional responsibilities and facilitate inter-agency collaboration. Second, data collection protocols must address broad consent requirements, including provision of adequate information, explicit consent for secondary use, itemized withdrawal options, protection of minors' rights, and improved participant convenience. Third, establishing a systemic and sustainable data production framework is essential, with an emphasis on data quality, standardization, and scalability. Finally, the system for data provision and utilization should enhance researcher accessibility by ensuring data openness, maintaining a unified Institutional Review Board system, and streamlining application and usage processes.

Conclusions: Strengthening governance, upholding ethical standards in data collection, ensuring sustainable data production, and optimizing researcher accessibility are essential for the success of the NIBDCP. These measures will help achieve the project's goals and establish a robust model for biological data governance and utilization in Korea.

Healthcare Informatics Research 2025;31(3):226-234. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370415/pdf/
Citations: 0
Review of the 2025 Spring Conference of the Korean Society of Medical Informatics: AI and Human Collaboration in the Age of Generative AI.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date: 2025-07-01 Epub Date: 2025-07-31 DOI: 10.4258/hir.2025.31.3.215
Jooyun Lee, Younghee Lee, Seo Yeon Baik, Jisan Lee, Seung-Bo Lee, Jungchan Park
Healthcare Informatics Research 2025;31(3):215-217. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370422/pdf/
Citations: 0
Public Perceptions and Barriers to Tuberculosis Treatment in Korea: A Large Language Model-Based Analysis of Naver Knowledge-iN Data from 2002 to 2024.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date: 2025-07-01 Epub Date: 2025-07-31 DOI: 10.4258/hir.2025.31.3.263
Hyewon Park, Siho Kim, Gaeun Kim, Seunghyeok Chang, Jae-Gook Shin, Sangzin Ahn

Objectives: This study was conducted to investigate public perceptions and concerns surrounding tuberculosis (TB) treatment in Korea through an analysis of online queries about antitubercular medications. Additionally, it evaluated the effectiveness of large language models (LLMs) as analytical tools for processing unstructured healthcare data.

Methods: Using LLMs, this study analyzed 44,174 questions that mentioned TB from Naver Knowledge-iN (2002-2024). Questions referencing antitubercular medications were extracted and thematically categorized. Side effects were analyzed through parallel approaches examining general and medication-specific effects. Questions about infectivity and social implications were further analyzed using text embedding, dimensionality reduction, and clustering. The performance of LLMs was evaluated against human researchers and traditional methods.
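
The final stage of the embedding, dimensionality reduction, and clustering pipeline can be illustrated with a small k-means routine. The abstract does not name the algorithms used, so k-means is an assumption. The 2-D points below are made-up stand-ins for question embeddings that have already been reduced, and seeding the centroids with the first `k` points is a simplification that keeps the example deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [list(p) for p in points[:k]]  # deterministic seeding (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Toy 2-D "embeddings": two visibly separate groups of questions.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

In a study like this one, each resulting cluster would then be inspected and labeled by theme, for example employment-related concerns.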

Results: Among questions mentioning specific medications (n = 919), rifampin (31.8%) and isoniazid (31.6%) were most frequently referenced. Of the 10,044 questions regarding antitubercular medication, management challenges represented the largest category (44.8%). Analysis of infectivity and social implications (n = 583) revealed previously unidentified concerns about blood donation and immigration eligibility. Employment-related concerns constituted the largest distinct subgroup (20.6%). Hepatotoxicity, dermatosis, and vomiting were the most frequently reported side effects. LLMs outperformed keyword matching in data processing and offered cost advantages over human analysis, with fine-tuning further reducing processing costs.

Conclusions: This study produced novel insights into public concerns regarding TB treatment and demonstrated the effectiveness of combining social media platform data with LLM-based analysis, providing a systematic framework for future healthcare research using unstructured public data and LLMs.

Healthcare Informatics Research 2025;31(3):263-273. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370417/pdf/
Citations: 0
Machine Learning-Based Age Prediction with Feature Subset Selection from Magnetic Resonance Angiography Data.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date: 2025-07-01 Epub Date: 2025-07-31 DOI: 10.4258/hir.2025.31.3.284
Hoon-Seok Yoon, Yoon-Chul Kim

Objectives: The objective of this study was to evaluate the effectiveness of machine learning (ML) models using selected subsets of features to predict age based on intracranial arterial segments' tortuosity and diameter characteristics derived from magnetic resonance angiography (MRA) data. Additionally, this study aimed to identify key vascular features important for predicting vascular age.

Methods: Three-dimensional time-of-flight MRA image data from 171 subjects were analyzed. After annotating the endpoints for each arterial segment, 169 features, comprising tortuosity metrics and arterial segment diameter statistics, were extracted. Five ML models (random forest, linear regression, AdaBoost, XGBoost, and LightGBM) were trained and validated. Two feature selection methods, correlation-based feature selection (CFS) and Relief-F, were applied to identify optimal feature subsets.
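
As an illustration of what a tortuosity feature measures, the sketch below computes the widely used distance metric: centerline path length divided by the straight-line distance between the segment endpoints. Whether this particular metric is among the study's 169 features is an assumption; the abstract names only the triangular index. The coordinates are invented.

```python
import math


def distance_metric(centerline):
    """Tortuosity as path length over endpoint-to-endpoint (chord) length.

    `centerline` is an ordered list of 3-D points along one arterial segment.
    A perfectly straight segment scores 1.0; more winding segments score higher.
    """
    if len(centerline) < 2:
        raise ValueError("need at least two centerline points")
    path = sum(math.dist(p, q) for p, q in zip(centerline, centerline[1:]))
    chord = math.dist(centerline[0], centerline[-1])
    if chord == 0:
        raise ValueError("segment endpoints coincide")
    return path / chord


straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
bent = [(0, 0, 0), (3, 0, 0), (3, 4, 0)]
print(distance_metric(straight), distance_metric(bent))
```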

Results: The random forest model utilizing the CFS-based 50% feature subset achieved the best performance, with a root mean square error of 14.0 years, a coefficient of determination (R²) of 0.275, and a Pearson correlation coefficient of 0.560. Tortuosity metrics (e.g., triangular index of the left posterior cerebral artery P1 segment) appeared more frequently than diameter statistics among the top five most important features.
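
The three reported evaluation measures can be computed as in the sketch below, which uses standard textbook definitions and invented ages rather than study data. The function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target (years)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


ages = [30, 50, 70]        # made-up chronological ages
predicted = [40, 50, 60]   # made-up model outputs, shrunk toward the mean
print(rmse(ages, predicted), r_squared(ages, predicted), pearson(ages, predicted))
```

In this toy case the predictions are perfectly correlated with the true ages yet R² is only 0.75, because R² also penalizes errors of scale. The same effect is consistent with the reported R² of 0.275 being lower than the squared Pearson coefficient (0.560² ≈ 0.314).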

Conclusions: CFS-based feature selection enhanced the performance of ML-based age prediction compared with using the complete feature set. Linear regression consistently demonstrated the poorest performance across all evaluation metrics. ML-based age prediction using segmental tortuosity metrics and diameter statistics is feasible, potentially revealing significant features related to vascular aging.

Healthcare Informatics Research 2025;31(3):284-294. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370420/pdf/
Citations: 0
Development and Evaluation of a Retrieval-Augmented Generation-Based Electronic Medical Record Chatbot System.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date: 2025-07-01 Epub Date: 2025-07-31 DOI: 10.4258/hir.2025.31.3.218
Namrye Son, Inchul Kang, Inhu Kim, Keehyuck Lee, Sejin Nam, Donghyoung Lee

Objectives: This study aimed to develop and evaluate a retrieval-augmented generation (RAG)-based chatbot system designed to optimize hospital operations. By leveraging electronic medical record (EMR) manuals, the system seeks to streamline administrative workflows and enhance healthcare delivery.

Methods: The system integrated fine-tuned multilingual embedding models (Multilingual-E5-Large and BGE-M3) for indexing and retrieving information from EMR manuals. A dataset comprising 5,931 question-document pairs was constructed through query augmentation and validated by domain experts. Fine-tuning was performed using contrastive learning to enhance semantic understanding, with performance assessed using top-k accuracy metrics. The Solar Mini Chat API was adopted for text generation, prioritizing Korean-language responses and cost efficiency.
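
Top-k accuracy, the retrieval measure mentioned above, asks whether the correct manual passage appears among the k documents ranked highest for a question. A minimal sketch, assuming cosine similarity as the ranking score; the 2-D vectors are toy stand-ins for the embedding models' outputs, and all names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_accuracy(query_vecs, doc_vecs, gold_doc_ids, k):
    """Fraction of queries whose gold document is among the k most similar."""
    hits = 0
    for query, gold in zip(query_vecs, gold_doc_ids):
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda d: cosine(query, doc_vecs[d]),
            reverse=True,
        )
        hits += gold in ranked[:k]
    return hits / len(gold_doc_ids)


docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # toy document embeddings
queries = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]  # toy question embeddings
gold = [0, 1, 0]                                # index of the correct document
print(top_k_accuracy(queries, docs, gold, k=1), top_k_accuracy(queries, docs, gold, k=2))
```

Fine-tuning with contrastive learning aims to move each question's embedding closer to its paired document, which raises this score.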

Results: The fine-tuned models demonstrated marked improvements in retrieval accuracy, with BGE-M3 achieving 97.6% and Multilingual-E5-Large reaching 89.7%. The chatbot achieved high performance, with query latency under 10 ms and robust retrieval precision, effectively addressing operational EMR queries. Key applications included administrative task support and billing process optimization, highlighting its potential to reduce staff workload and enhance healthcare service delivery.

Conclusions: The RAG-based chatbot system successfully addressed critical challenges in healthcare administration, improving EMR usability and operational efficiency. Future research should focus on real-world deployment and longitudinal studies to further evaluate its impact on administrative burden reduction and workflow improvement.

目的:本研究旨在开发和评估基于检索增强生成(RAG)的聊天机器人系统,旨在优化医院运营。通过利用电子病历(EMR)手册,该系统旨在简化管理工作流程并增强医疗保健服务。方法:系统集成了微调多语言嵌入模型(multilingual - e5 - large和BGE-M3),用于EMR手册信息的索引和检索。通过查询增强构建了包含5931对问题-文档的数据集,并由领域专家进行了验证。使用对比学习进行微调以增强语义理解,使用top-k精度指标评估性能。在文本生成方面,采用了“Solar Mini Chat API”,优先考虑了韩国语的响应,并提高了成本效率。结果:调整后的模型在检索准确率上有明显提高,其中BGE-M3达到97.6%,Multilingual-E5-Large达到89.7%。该聊天机器人实现了高性能,查询延迟低于10 ms,检索精度高,有效地解决了操作性EMR查询。关键应用包括管理任务支持和计费流程优化,突出了其减少工作人员工作量和增强医疗保健服务交付的潜力。结论:基于rag的聊天机器人系统成功解决了医疗管理中的关键挑战,提高了EMR的可用性和操作效率。未来的研究应侧重于实际部署和纵向研究,以进一步评估其对减轻行政负担和改进工作流程的影响。
Healthcare Informatics Research, vol. 31, no. 3, pp. 218-225. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370418/pdf/
Citations: 0
How New Chatbots Can Support Personalized Medicine.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date : 2025-07-01 Epub Date: 2025-07-31 DOI: 10.4258/hir.2025.31.3.245
Leonardo J Ramírez López, Ana María Campos Mora

Objectives: This study proposes the integration of chatbots into personalized medicine by demonstrating how these tools can support the personalized medicine model. Chatbots can deliver tailored health recommendations, facilitate patient-doctor communication, and provide decision support in clinical settings. The goal is to establish a reference framework aligned with national and international standards for personalized healthcare solutions.

Methods: The chatbot model was developed by reviewing 30 scientific and academic articles focused on artificial intelligence and natural language processing in healthcare. The study analyzed the capabilities of existing healthcare chatbots, particularly their capacity to support personalized medicine through accurate data collection and processing of individual health information.

Results: Key parameters identified for effective chatbot deployment in personalized medicine include user engagement, data accuracy, adaptability, and regulatory compliance. The study established a compliance benchmark of 25% based on current industry standards and application performance. The results indicate that the proposed chatbot model significantly increased the precision and efficacy of personalized medical recommendations, surpassing baseline requirements set by standardization organizations.

Conclusions: This model provides healthcare professionals and patients with a robust framework for utilizing chatbots in personalized medicine, focusing on improved patient outcomes and engagement. The research identifies a gap in the application of artificial intelligence-driven tools in personalized healthcare and suggests strategic directions for future innovations. Implementing this model aims to bridge this gap, offering a standardized approach to developing chatbots that support personalized medicine.

Healthcare Informatics Research, vol. 31, no. 3, pp. 245-252. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370424/pdf/
Citations: 0
Advancements in Parkinson's Disease Prediction Using Machine Learning: A Neurological Perspective.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date : 2025-07-01 Epub Date: 2025-07-31 DOI: 10.4258/hir.2025.31.3.274
Aravalli Sainath Chaithanya, Nadipudi Kiran Kumar, Gugulothu Venkatesh Prasad, Bejawada Keerthana

Objectives: This study aims to predict the severity of Parkinson's disease (PD) by leveraging a comprehensive dataset integrating cerebrospinal fluid protein and peptide data sourced from UniProt, normalized protein expression metrics, clinical assessments, and gait data. The dataset comprised 248 PD patients monitored longitudinally, with periodic evaluations including 227 proteins, 971 peptides, gait parameters, and Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) scores at baseline (month 0) and at 6, 12, and 24 months.

Methods: A multifaceted machine learning framework was employed, consisting of random forest, TensorFlow decision forests, and a custom-developed phase-shift ensembling model. Additionally, regression techniques such as linear regression, random forest regressor, decision tree regressor, and K-nearest neighbors were utilized to support the predictions. These models aimed to forecast PD severity as reflected by UPDRS scores.

Results: The custom phase-shift ensembling model demonstrated superior predictive performance, achieving an average symmetric mean absolute percentage error (sMAPE) of 55 across all UPDRS sections. Notably, the random forest regressor excelled in predicting motor function severity (UPDRS-III), attaining an sMAPE of 77.32, indicating its ability to model complex disease progression dynamics effectively.
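
For readers unfamiliar with the metric, the sketch below shows one common way to compute sMAPE. The abstract does not state which sMAPE variant the authors used; this version assumes the 0-200 percentage form, in which lower values indicate better predictions. The function name `smape` and the handling of pairs where both values are zero are assumptions made for illustration.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error on a 0-200 scale (lower is better)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value pair is required")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        # Assumed convention: a pair where both values are 0 contributes no error.
        if denom != 0:
            total += abs(a - p) / denom
    return 100.0 * total / len(actual)
```

Under this definition a perfect prediction scores 0 and the worst possible score is 200, so values such as 55 and 77.32 can only be compared once the variant in use is known.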

Conclusions: Integrating biological markers, clinical scores, and gait dynamics facilitates accurate modeling of PD progression. The ensemble-based approach, particularly phase-shift ensembling, improves prediction robustness and interpretability, offering a powerful strategy for the early prediction of PD severity. This study highlights the value of multi-source data fusion and advanced machine learning techniques in supporting early diagnosis and informed treatment planning for neurodegenerative diseases.

Healthcare Informatics Research, vol. 31, no. 3, pp. 274-283. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370421/pdf/
Citations: 0
In-Context Learning with Large Language Models: A Simple and Effective Approach to Improve Radiology Report Labeling.
IF 2.1 Q3 MEDICAL INFORMATICS Pub Date : 2025-07-01 Epub Date: 2025-07-31 DOI: 10.4258/hir.2025.31.3.295
Songsoo Kim, Donghyun Kim, Jaewoong Kim, Jalim Koo, Jinsik Yoon, Dukyong Yoon

Objectives: This study assessed the effectiveness of in-context learning using Generative Pre-trained Transformer-4 (GPT-4) for labeling radiology reports.

Methods: In this retrospective study, radiology reports were obtained from the Medical Information Mart for Intensive Care III database. Two structured prompts, the "basic prompt" and the "in-context prompt," were compared. An optimization experiment was conducted to assess consistency and the occurrence of output format errors. The primary labeling experiments were performed on 200 unseen head computed tomography (CT) reports for multi-label classification of predefined labels (Experiment 1) and on 400 unseen abdominal CT reports for multi-label classification of actionable findings (Experiment 2).
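
The difference between the two prompt types can be illustrated with a minimal sketch. The abstract does not reproduce the study's prompts, so the wording, the function names `build_prompt` and `parse_labels`, and the example report texts below are all hypothetical. The assumption is that an in-context prompt is a basic prompt extended with a few labeled example reports, and that an answer containing anything outside the allowed label set counts as an output format error.

```python
LABELS = ["foreign body", "mass"]  # two of the labels named in the abstract


def build_prompt(report, labels, examples=()):
    """Build a labeling prompt; passing examples turns it into an in-context prompt."""
    lines = [
        "Label the radiology report with every label that applies.",
        "Allowed labels: " + ", ".join(labels),
        "Answer with a comma-separated list of labels, or 'none'.",
        "",
    ]
    # Each example is a (report text, list of labels) pair shown before the target.
    for example_report, example_labels in examples:
        lines.append("Report: " + example_report)
        lines.append("Labels: " + (", ".join(example_labels) or "none"))
        lines.append("")
    lines.append("Report: " + report)
    lines.append("Labels:")
    return "\n".join(lines)


def parse_labels(answer, labels):
    """Return the set of labels in the answer, or None on an output format error."""
    text = answer.strip().lower()
    if text == "none":
        return set()
    found = {part.strip() for part in text.split(",")}
    allowed = {label.lower() for label in labels}
    return found if found <= allowed else None


# Invented placeholder texts, not taken from any dataset.
examples = [("Example report text A.", ["mass"]), ("Example report text B.", [])]
basic_prompt = build_prompt("Target report text.", LABELS)
in_context_prompt = build_prompt("Target report text.", LABELS, examples)
```

Keeping the instruction block identical in both prompts means any change in labeling quality can be attributed to the added examples rather than to different wording.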

Results: The inter-reader accuracies in Experiments 1 and 2 were 0.93 and 0.84, respectively. For multi-label classification of head CT reports (Experiment 1), the in-context prompt led to notable increases in F1-scores for the "foreign body" and "mass" labels (gains of 0.66 and 0.22, respectively). However, improvements for other labels were modest. In multi-label classification of abdominal CT reports (Experiment 2), in-context prompts produced substantial improvements in F1-scores across all labels compared to basic prompts. Providing context equipped the model with domain-specific knowledge and helped align its existing knowledge, thereby improving performance.
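
Per-label F1-scores of the kind reported above can be computed as in the following sketch. It assumes each report's labels are stored as a set and treats a label with no true positives, false positives, or false negatives as scoring 0.0; the function name `per_label_f1` and the toy data are illustrative and are not the study's data.

```python
def per_label_f1(gold, predicted, labels):
    """Compute an F1-score for each label across paired gold and predicted label sets."""
    scores = {}
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data: three reports, each with a set of gold labels and a set of predicted labels.
gold = [{"mass"}, {"mass", "foreign body"}, set()]
predicted = [{"mass"}, {"foreign body"}, {"mass"}]
scores = per_label_f1(gold, predicted, ["foreign body", "mass"])
```

Running the same function on the outputs of the basic and in-context prompts and subtracting the two scores for each label gives the kind of gain quoted in the results.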

Conclusions: In-context learning with GPT-4 consistently improved performance in labeling radiology reports. This approach is particularly effective for subjective labeling tasks and allows the model to align its criteria with those of human annotators for objective labeling. This practical strategy offers a simple, adaptable, and researcher-oriented method that can be applied to diverse labeling tasks.

Healthcare Informatics Research, vol. 31, no. 3, pp. 295-309. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12370419/pdf/
Citations: 0