Improving Radiology Report Conciseness and Structure via Local Large Language Models
Pub Date: 2026-02-01 | Epub Date: 2025-04-21 | DOI: 10.1007/s10278-025-01510-w | Pages: 1005-1016
Iryna Hartsock, Cyrillo Araujo, Les Folio, Ghulam Rasool
Radiology reports are often lengthy and unstructured, posing challenges for referring physicians who must quickly identify critical imaging findings and increasing the risk of missed information. This retrospective study aimed to enhance radiology reports by making them concise and well-structured, with findings organized by relevant organs. To achieve this, we utilized private large language models (LLMs) deployed locally within our institution's firewall, ensuring data security and minimizing computational costs. Using a dataset of 814 radiology reports from seven board-certified body radiologists at [-blinded for review-], we tested five prompting strategies within the LangChain framework. After evaluating several models, the Mixtral LLM demonstrated superior adherence to formatting requirements compared to alternatives such as Llama. The optimal strategy involved condensing reports first and then applying structured formatting based on specific instructions, reducing verbosity while improving clarity. Across all radiologists and reports, the Mixtral LLM reduced redundant word counts by more than 53%. These findings highlight the potential of locally deployed, open-source LLMs to streamline radiology reporting. By generating concise, well-structured reports, these models enhance information retrieval and better meet the needs of referring physicians, ultimately improving clinical workflows.
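A minimal sketch of the winning strategy (condense first, then structure), assuming a Mixtral model served locally through Ollama and composed with LangChain's expression language. The prompt wording and the example report below are illustrative; the study's actual prompts are not reproduced in the abstract.

```python
# Two-step chain: condense the report, then reformat by organ.
# Assumes an Ollama-served Mixtral behind the institutional firewall;
# all prompt text is a placeholder, not the study's wording.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="mixtral", temperature=0)  # local deployment, no data leaves the network

condense = ChatPromptTemplate.from_template(
    "Condense this radiology report. Remove redundant wording but keep "
    "every finding:\n\n{report}"
)
structure = ChatPromptTemplate.from_template(
    "Reformat the report below so findings are grouped under organ "
    "headings (e.g., LIVER, KIDNEYS, BONES):\n\n{condensed}"
)

condense_chain = condense | llm | StrOutputParser()
# The output of the condensing step feeds the structuring prompt.
chain = {"condensed": condense_chain} | structure | llm | StrOutputParser()

raw_report_text = (
    "CT abdomen and pelvis with contrast. FINDINGS: The liver is normal in "
    "size and attenuation. No focal hepatic lesion is identified. ..."
)
print(chain.invoke({"report": raw_report_text}))
```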
{"title":"Improving Radiology Report Conciseness and Structure via Local Large Language Models.","authors":"Iryna Hartsock, Cyrillo Araujo, Les Folio, Ghulam Rasool","doi":"10.1007/s10278-025-01510-w","DOIUrl":"10.1007/s10278-025-01510-w","url":null,"abstract":"<p><p>Radiology reports are often lengthy and unstructured, posing challenges for referring physicians to quickly identify critical imaging findings while increasing risk of missed information. This retrospective study aimed to enhance radiology reports by making them concise and well-structured, with findings organized by relevant organs. To achieve this, we utilized private large language models (LLMs) deployed locally within our institution's firewall, ensuring data security and minimizing computational costs. Using a dataset of 814 radiology reports from seven board-certified body radiologists at [-blinded for review-], we tested five prompting strategies within the LangChain framework. After evaluating several models, the Mixtral LLM demonstrated superior adherence to formatting requirements compared to alternatives like Llama. The optimal strategy involved condensing reports first and then applying structured formatting based on specific instructions, reducing verbosity while improving clarity. Across all radiologists and reports, the Mixtral LLM reduced redundant word counts by more than 53%. These findings highlight the potential of locally deployed, open-source LLMs to streamline radiology reporting. By generating concise, well-structured reports, these models enhance information retrieval and better meet the needs of referring physicians, ultimately improving clinical workflows.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"1005-1016"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12921097/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144059558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning for Classification of Solid Renal Parenchymal Tumors Using Contrast-Enhanced Ultrasound
Pub Date: 2026-02-01 | Epub Date: 2025-05-06 | DOI: 10.1007/s10278-025-01525-3 | Pages: 277-285
Yun Bai, Zi-Chen An, Lian-Fang Du, Fan Li, Ying-Yu Cai
The purpose of this study is to assess the ability of deep learning models to classify different subtypes of solid renal parenchymal tumors using contrast-enhanced ultrasound (CEUS) images and to compare their classification performance. A retrospective study was conducted using CEUS images of 237 kidney tumors, including 46 angiomyolipomas (AML), 118 clear cell renal cell carcinomas (ccRCC), 48 papillary RCCs (pRCC), and 25 chromophobe RCCs (chRCC), collected from January 2017 to December 2019. Two deep learning models, based on the ResNet-18 and RepVGG architectures, were trained and validated to distinguish between these subtypes. The models' performance was assessed using sensitivity, specificity, positive predictive value, negative predictive value, F1 score, Matthews correlation coefficient, accuracy, area under the receiver operating characteristic curve (AUC), and confusion matrix analysis. Class activation mapping (CAM) was applied to visualize the specific regions that contributed to the models' predictions. The ResNet-18 and RepVGG-A0 models achieved overall accuracies of 76.7% and 84.5%, respectively, across all four subtypes. The AUCs for AML, ccRCC, pRCC, and chRCC were 0.832, 0.829, 0.806, and 0.795 for the ResNet-18 model, compared to 0.906, 0.911, 0.840, and 0.827 for the RepVGG-A0 model, respectively. The deep learning models could reliably differentiate between various histological subtypes of renal tumors using CEUS images in an objective and non-invasive manner.
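For readers who want to reproduce the general setup, here is a minimal sketch of the ResNet-18 arm, adapting a torchvision backbone to the four CEUS tumor classes. Augmentation, splits, and hyperparameters are not given in the abstract, so everything beyond the head replacement is an assumption.

```python
# ResNet-18 adapted to 4-class CEUS tumor classification (AML, ccRCC, pRCC, chRCC).
# Hyperparameters below are illustrative, not the study's settings.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # swap ImageNet head for 4 classes

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch of CEUS frames shaped (N, 3, 224, 224)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```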
{"title":"Deep Learning for Classification of Solid Renal Parenchymal Tumors Using Contrast-Enhanced Ultrasound.","authors":"Yun Bai, Zi-Chen An, Lian-Fang Du, Fan Li, Ying-Yu Cai","doi":"10.1007/s10278-025-01525-3","DOIUrl":"10.1007/s10278-025-01525-3","url":null,"abstract":"<p><p>The purpose of this study is to assess the ability of deep learning models to classify different subtypes of solid renal parenchymal tumors using contrast-enhanced ultrasound (CEUS) images and to compare their classification performance. A retrospective study was conducted using CEUS images of 237 kidney tumors, including 46 angiomyolipomas (AML), 118 clear cell renal cell carcinomas (ccRCC), 48 papillary RCCs (pRCC), and 25 chromophobe RCCs (chRCC), collected from January 2017 to December 2019. Two deep learning models, based on the ResNet-18 and RepVGG architectures, were trained and validated to distinguish between these subtypes. The models' performance was assessed using sensitivity, specificity, positive predictive value, negative predictive value, F1 score, Matthews correlation coefficient, accuracy, area under the receiver operating characteristic curve (AUC), and confusion matrix analysis. Class activation mapping (CAM) was applied to visualize the specific regions that contributed to the models' predictions. The ResNet-18 and RepVGG-A0 models achieved an overall accuracy of 76.7% and 84.5% across all four subtypes. The AUCs for AML, ccRCC, pRCC, and chRCC were 0.832, 0.829, 0.806, and 0.795 for the ResNet-18 model, compared to 0.906, 0.911, 0.840, and 0.827 for the RepVGG-A0 model, respectively. The deep learning models could reliably differentiate between various histological subtypes of renal tumors using CEUS images in an objective and non-invasive manner.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"277-285"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920943/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144060016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Role of Model Size and Prompting Strategies in Extracting Labels from Free-Text Radiology Reports with Open-Source Large Language Models
Pub Date: 2026-02-01 | Epub Date: 2025-05-05 | DOI: 10.1007/s10278-025-01505-7 | Pages: 995-1004
Bardia Khosravi, Theo Dapamede, Frank Li, Zvipo Chisango, Anirudh Bikmal, Sara Garg, Babajide Owosela, Amirali Khosravi, Mohammadreza Chavoshi, Hari M Trivedi, Cody C Wyles, Saptarshi Purkayastha, Bradley J Erickson, Judy W Gichoya
Extracting accurate labels from radiology reports is essential for training medical image analysis models. Large language models (LLMs) show promise for automating this process. The purpose of this study is to evaluate how model size and prompting strategies affect label extraction accuracy and downstream performance in open-source LLMs. Three open-source LLMs (Llama-3, Phi-3 mini, and Zephyr-beta) were used to extract labels from 227,827 MIMIC-CXR radiology reports. Performance was evaluated against human annotations on 2000 MIMIC-CXR reports, and through training image classifiers for pneumothorax and rib fracture detection tested on the CANDID-PTX dataset (n = 19,237). LLM-based labeling outperformed the CheXpert labeler, with the best LLM achieving 95% sensitivity for fracture detection versus CheXpert's 51%. Larger models showed better sensitivity, while chain-of-thought prompting had variable effects. Image classifiers showed resilience to labeling noise when tested externally. The choice of test set labeling schema significantly affected reported performance: a classifier trained on Llama-3 with chain-of-thought labels achieved AUCs of 0.96 and 0.84 for pneumothorax and fracture detection, respectively, when evaluated against human annotations, compared to 0.91 and 0.73 when evaluated on CheXpert labels. Open-source LLMs effectively extract labels from radiology reports at scale. While larger pre-trained models generally perform better, the choice of model size and prompting strategy should be task-specific. Careful consideration of evaluation methods is critical for interpreting classifier performance.
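The extraction step can be pictured as below, assuming Llama-3-8B-Instruct run through Hugging Face transformers; the prompt and JSON schema are illustrative, and the study's chain-of-thought variants are not shown.

```python
# Sketch: ask an open-source LLM for structured labels and parse the JSON reply.
# The model ID is a real (gated) Hugging Face checkpoint; the prompt is a placeholder.
import json
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

PROMPT = (
    "Read the chest radiograph report and answer with JSON only, e.g. "
    '{{"pneumothorax": true, "rib_fracture": false}}.\n\nReport:\n{report}'
)

def extract_labels(report: str) -> dict:
    out = generator(
        PROMPT.format(report=report),
        max_new_tokens=64,
        return_full_text=False,  # drop the echoed prompt
    )[0]["generated_text"]
    # Keep only the JSON object in the completion.
    return json.loads(out[out.find("{"): out.rfind("}") + 1])
```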
{"title":"Role of Model Size and Prompting Strategies in Extracting Labels from Free-Text Radiology Reports with Open-Source Large Language Models.","authors":"Bardia Khosravi, Theo Dapamede, Frank Li, Zvipo Chisango, Anirudh Bikmal, Sara Garg, Babajide Owosela, Amirali Khosravi, Mohammadreza Chavoshi, Hari M Trivedi, Cody C Wyles, Saptarshi Purkayastha, Bradley J Erickson, Judy W Gichoya","doi":"10.1007/s10278-025-01505-7","DOIUrl":"10.1007/s10278-025-01505-7","url":null,"abstract":"<p><p>Extracting accurate labels from radiology reports is essential for training medical image analysis models. Large language models (LLMs) show promise for automating this process. The purpose of this study is to evaluate how model size and prompting strategies affect label extraction accuracy and downstream performance in open-source LLMs. Three open-source LLMs (Llama-3, Phi-3 mini, and Zephyr-beta) were used to extract labels from 227,827 MIMIC-CXR radiology reports. Performance was evaluated against human annotations on 2000 MIMIC-CXR reports, and through training image classifiers for pneumothorax and rib fracture detection tested on the CANDID-PTX dataset (n = 19,237). LLM-based labeling outperformed the CheXpert labeler, with the best LLM achieving 95% sensitivity for fracture detection versus CheXpert's 51%. Larger models showed better sensitivity, while chain-of-thought prompting had variable effects. Image classifiers showed resilience to labeling noise when tested externally. The choice of test set labeling schema significantly affected reported performance-a classifier trained on Llama-3 with chain-of-thought labels achieved AUCs of 0.96 and 0.84 for pneumothorax and fracture detection respectively when evaluated against human annotations, compared to 0.91 and 0.73 when evaluated on CheXpert labels. Open-source LLMs effectively extract labels from radiology reports at scale. While larger pre-trained models generally perform better, the choice of model size and prompting strategy should be task specific. Careful consideration of evaluation methods is critical for interpreting classifier performance.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"995-1004"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920854/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144061006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advancing Pulmonary Embolism Detection with Integrated Deep Learning Architectures
Pub Date: 2026-02-01 | Epub Date: 2025-04-25 | DOI: 10.1007/s10278-025-01506-6 | Pages: 186-201
Can Berk Biret, Sukru Gurbuz, Erhan Akbal, Mehmet Baygin, Evren Ekingen, Serdar Derya, I Okan Yıldırım, Ilknur Sercek, Sengul Dogan, Turker Tuncer
The main aim of this study is to introduce a new hybrid deep learning model for biomedical image classification. We propose a novel convolutional neural network (CNN), named HybridNeXt, for detecting pulmonary embolism (PE) from computed tomography (CT) images. To evaluate the HybridNeXt model, we created a new dataset consisting of two classes: (1) PE and (2) control. The HybridNeXt architecture combines different advanced network blocks, including MobileNet, ResNet, ConvNeXt, and Swin Transformer blocks. We specifically designed this model to combine the strengths of these well-known architectures. The architecture also includes stem, downsampling, and output stages. By adjusting the parameters, we developed a lightweight version of HybridNeXt, suitable for clinical use. To further improve the classification performance and demonstrate transfer learning capability, we proposed a deep feature engineering (DFE) method using a multilevel discrete wavelet transform (MDWT). This DFE model has three main phases: (i) feature extraction from raw images and wavelet bands, (ii) feature selection using iterative neighborhood component analysis (INCA), and (iii) classification using a k-nearest neighbors (kNN) classifier. We first trained HybridNeXt on the training images, creating a pretrained HybridNeXt model. Then, using this pretrained model, we extracted features and applied the proposed DFE method for classification. The HybridNeXt model achieved a test accuracy of 90.14%, while our DFE model improved accuracy to 96.35%. Overall, the results confirm that our HybridNeXt architecture is highly accurate and effective for biomedical image classification. The presented HybridNeXt and HybridNeXt-based DFE methods can potentially be applied to other image classification tasks.
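The DFE stage can be sketched as follows, with two loudly labeled assumptions: features are taken from the raw image and its pywt wavelet bands, and INCA is approximated by an ANOVA-ranked greedy search over feature counts, since the paper's exact INCA routine is not detailed in the abstract.

```python
# DFE sketch: multilevel DWT bands -> ranked feature selection -> 1-NN classifier.
# f_classif ranking stands in for the NCA-based weights of true INCA.
import numpy as np
import pywt
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def dwt_bands(img: np.ndarray, levels: int = 3):
    """Yield the approximation and detail bands of a multilevel 2-D DWT."""
    coeffs = pywt.wavedec2(img, "db4", level=levels)
    yield coeffs[0]                      # approximation band
    for (cH, cV, cD) in coeffs[1:]:      # horizontal, vertical, diagonal details
        yield from (cH, cV, cD)

def select_and_classify(X: np.ndarray, y: np.ndarray, k_grid=range(50, 501, 50)):
    """INCA-like loop: rank features, keep the count with the best 10-fold CV accuracy.

    X holds deep features extracted (e.g., by the pretrained HybridNeXt) from
    each raw image and each of its DWT bands, concatenated per sample.
    """
    order = np.argsort(f_classif(X, y)[0])[::-1]  # descending F-score rank
    best_acc, best_k = -1.0, 0
    for k in k_grid:
        acc = cross_val_score(KNeighborsClassifier(1), X[:, order[:k]], y, cv=10).mean()
        if acc > best_acc:
            best_acc, best_k = acc, k
    return best_acc, best_k
```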
{"title":"Advancing Pulmonary Embolism Detection with Integrated Deep Learning Architectures.","authors":"Can Berk Biret, Sukru Gurbuz, Erhan Akbal, Mehmet Baygin, Evren Ekingen, Serdar Derya, I Okan Yıldırım, Ilknur Sercek, Sengul Dogan, Turker Tuncer","doi":"10.1007/s10278-025-01506-6","DOIUrl":"10.1007/s10278-025-01506-6","url":null,"abstract":"<p><p>The main aim of this study is to introduce a new hybrid deep learning model for biomedical image classification. We propose a novel convolutional neural network (CNN), named HybridNeXt, for detecting pulmonary embolism (PE) from computed tomography (CT) images. To evaluate the HybridNeXt model, we created a new dataset consisting of two classes: (1) PE and (2) control. The HybridNeXt architecture combines different advanced CNN blocks, including MobileNet, ResNet, ConvNeXt, and Swin Transformer. We specifically designed this model to combine the strengths of these well-known CNNs. The architecture also includes stem, downsampling, and output stages. By adjusting the parameters, we developed a lightweight version of HybridNeXt, suitable for clinical use. To further improve the classification performance and demonstrate transfer learning capability, we proposed a deep feature engineering (DFE) method using a multilevel discrete wavelet transform (MDWT). This DFE model has three main phases: (i) feature extraction from raw images and wavelet bands, (ii) feature selection using iterative neighborhood component analysis (INCA), and (iii) classification using a k-nearest neighbors (kNN) classifier. We first trained HybridNeXt on the training images, creating a pretrained HybridNeXt model. Then, using this pretrained model, we extracted features and applied the proposed DFE method for classification. The HybridNeXt model achieved a test accuracy of 90.14%, while our DFE model improved accuracy to 96.35%. Overall, the results confirm that our HybridNeXt architecture is highly accurate and effective for biomedical image classification. The presented HybridNeXt and HybridNeXt-based DFE methods can potentially be applied to other image classification tasks.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"186-201"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12921004/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144065465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Domain Shift Analysis in Chest Radiographs Classification in a Veterans Healthcare Administration Population
Pub Date: 2026-02-01 | Epub Date: 2025-04-11 | DOI: 10.1007/s10278-025-01494-7 | Pages: 484-499
Mayanka Chandrashekar, Ian Goethert, Md Inzamam Ul Haque, Benjamin McMahon, Sayera Dhaubhadel, Kathryn Knight, Joseph Erdos, Donna Reagan, Caroline Taylor, Peter Kuzmak, John Michael Gaziano, Eileen McAllister, Lauren Costa, Yuk-Lam Ho, Kelly Cho, Suzanne Tamang, Samah Fodeh-Jarad, Olga S Ovchinnikova, Amy C Justice, Jacob Hinkle, Ioana Danciu
This study aims to assess the impact of domain shift on chest X-ray classification accuracy and to analyze the influence of ground truth label quality and demographic factors such as age group, sex, and study year. We used a DenseNet121 model pre-trained on the MIMIC-CXR dataset for deep learning-based multi-label classification, with ground truth labels extracted from radiology reports using the CheXpert and CheXbert labelers. We compared the performance of the 14 chest X-ray labels on the MIMIC-CXR and Veterans Healthcare Administration chest X-ray (VA-CXR) datasets. The validation of ground truth and the assessment of multi-label classification performance across various NLP extraction tools revealed that the VA-CXR dataset exhibited lower disagreement rates than the MIMIC-CXR dataset. Additionally, there were notable differences in AUC scores between models utilizing CheXpert and CheXbert labels. When evaluating multi-label classification performance across different datasets, minimal domain shift was observed in the unseen VA dataset, except for the label "Enlarged Cardiomediastinum." The subgroup with the most significant variations in multi-label classification performance was study year. These findings underscore the importance of considering domain shift in chest X-ray classification tasks, paying particular attention to the temporality of the exam. Our study reveals the significant impact of domain shift and demographic factors on chest X-ray classification, emphasizing the need for improved transfer learning and robust model development. Addressing these challenges is crucial for advancing medical imaging research and improving patient care.
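The classification setup maps onto a standard multi-label configuration; here is a sketch under stated assumptions (uncertain-label handling and training specifics are not in the abstract).

```python
# DenseNet121 with a 14-way head and per-label sigmoid losses, matching the
# 14 chest X-ray labels; input sizing and label encoding are assumptions.
import torch
import torch.nn as nn
from torchvision import models

model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, 14)  # 14 CXR findings

# Multi-label: an independent sigmoid per finding, not a softmax over classes.
criterion = nn.BCEWithLogitsLoss()

logits = model(torch.randn(2, 3, 224, 224))                   # toy batch
loss = criterion(logits, torch.randint(0, 2, (2, 14)).float())
```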
{"title":"Domain Shift Analysis in Chest Radiographs Classification in a Veterans Healthcare Administration Population.","authors":"Mayanka Chandrashekar, Ian Goethert, Md Inzamam Ul Haque, Benjamin McMahon, Sayera Dhaubhadel, Kathryn Knight, Joseph Erdos, Donna Reagan, Caroline Taylor, Peter Kuzmak, John Michael Gaziano, Eileen McAllister, Lauren Costa, Yuk-Lam Ho, Kelly Cho, Suzanne Tamang, Samah Fodeh-Jarad, Olga S Ovchinnikova, Amy C Justice, Jacob Hinkle, Ioana Danciu","doi":"10.1007/s10278-025-01494-7","DOIUrl":"10.1007/s10278-025-01494-7","url":null,"abstract":"<p><p>This study aims to assess the impact of domain shift on chest X-ray classification accuracy and to analyze the influence of ground truth label quality and demographic factors such as age group, sex, and study year. We used a DenseNet121 model pre-trained MIMIC-CXR dataset for deep learning-based multi-label classification using ground truth labels from radiology reports extracted using the CheXpert and CheXbert Labeler. We compared the performance of the 14 chest X-ray labels on the MIMIC-CXR and Veterans Healthcare Administration chest X-ray dataset (VA-CXR). The validation of ground truth and the assessment of multi-label classification performance across various NLP extraction tools revealed that the VA-CXR dataset exhibited lower disagreement rates than the MIMIC-CXR datasets. Additionally, there were notable differences in AUC scores between models utilizing CheXpert and CheXbert. When evaluating multi-label classification performance across different datasets, minimal domain shift was observed in the unseen VA dataset, except for the label \"Enlarged Cardiomediastinum.\" The subgroup with the most significant variations in multi-label classification performance was study year. These findings underscore the importance of considering domain shift in chest X-ray classification tasks, paying particular attention to the temporality of the exam. Our study reveals the significant impact of domain shift and demographic factors on chest X-ray classification, emphasizing the need for improved transfer learning and robust model development. Addressing these challenges is crucial for advancing medical imaging research and improving patient care.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"484-499"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12921069/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144016371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge, Utilisation, and Challenges of Medical Doctors Using Picture Archiving and Communication Systems at a Tertiary Academic Hospital in the Eastern Cape, South Africa
Pub Date: 2026-02-01 | Epub Date: 2025-05-16 | DOI: 10.1007/s10278-025-01526-2 | Pages: 34-45
Mpatisi Lobi, Anne Faith Namugenyi, Oladele Vincent Adeniyi
Despite the investments in the picture archiving and communication system (PACS) in South African health facilities, it is unclear whether clinicians in rural tertiary hospitals are using this tool maximally. This study determines the level of knowledge, utilisation, and challenges associated with the use of PACS among doctors in a rural tertiary hospital in the Eastern Cape Province. In this cross-sectional descriptive study, a total of 66 medical doctors drawn from different departments completed a structured questionnaire at the Nelson Mandela Academic Hospital, Mthatha. Relevant items on knowledge and use of PACS, including challenges experienced, were obtained. The mean age of the respondents was 36 years (standard deviation 9.58). The majority of the doctors (n = 42; 63.7%) demonstrated moderate to good knowledge of PACS. Similarly, a substantial majority (n = 55; 83.3%) had used PACS for years, for both images and reports (49.2%). The highest proportion of the respondents had at least 1 year of PACS experience (63.5%). Although there was no association between sociodemographic characteristics and level of knowledge, the duration of use (p = 0.025) and frequency of use (p = 0.025) were significantly associated with moderate to good knowledge of PACS. Internet connectivity and mobile PACS were the major challenges identified. The study found moderate to good knowledge of PACS among the final sample of 66 clinicians. A substantial majority of the clinicians had used PACS for years; however, there is considerable room for strengthening and expanding the use of PACS in the study setting.
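The reported associations (p = 0.025) are the kind produced by a chi-square test of independence on a contingency table; a sketch with placeholder counts, not the study's data:

```python
# Chi-square test of association between frequency of PACS use and knowledge
# level. The counts are hypothetical; only the method is illustrated.
from scipy.stats import chi2_contingency

table = [[10, 14],   # infrequent users: poor vs. moderate-to-good knowledge
         [ 4, 38]]   # frequent users:   poor vs. moderate-to-good knowledge
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```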
{"title":"Knowledge, Utilisation, and Challenges of Medical Doctors Using Picture Archiving and Communication Systems at a Tertiary Academic Hospital in the Eastern Cape, South Africa.","authors":"Mpatisi Lobi, Anne Faith Namugenyi, Oladele Vincent Adeniyi","doi":"10.1007/s10278-025-01526-2","DOIUrl":"10.1007/s10278-025-01526-2","url":null,"abstract":"<p><p>Despite the investments in the picture archiving and communication system (PACS) in the South African health facilities, it is unclear whether clinicians in the rural tertiary hospitals are using this tool maximally. This study determines the level of knowledge, utilisation, and challenges associated with the use of PACS among doctors in a rural tertiary hospital in the Eastern Cape Province. In this cross-sectional descriptive study, a total of 66 medical doctors drawn from different departments completed a structured questionnaire at the Nelson Mandela Academic Hospital, Mthatha. Relevant items on knowledge and use of PACS, including challenges experienced, were obtained. The mean age of the respondents was 36 (± standard deviation 9.58) years. The majority of the doctors (n = 42; 63.7%) demonstrated moderate to good knowledge of PACS. Similarly, a substantial majority (n = 55; 83.3%) have used PACS for years, for both images and reports (49.2%). The highest proportion of the respondents had at least 1 year of PACS experience (63.5%). Though there was no association between the sociodemographics and level of knowledge, the duration of use (p = 0.025) and frequency of use (p = 0.025) were significantly associated with moderate to good knowledge of PACS. Internet connectivity and mobile PACS were the major challenges identified. The study found moderate to good knowledge of PACS among the final sample of 66 clinicians. A substantial majority of the clinicians had used PACS for years; however, there is considerable room for strengthening and expanding the use of PACS in the study setting.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"34-45"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920820/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144087438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial Intelligence and Data Science Methods for Automatic Detection of White Blood Cells in Images
Pub Date: 2026-02-01 | Epub Date: 2025-05-16 | DOI: 10.1007/s10278-025-01538-y | Pages: 583-603
Yawo M Kobara, Ikpe Justice Akpan, Alima Damipe Nam, Firas H AlMukthar, Mbuotidem Peter
Data science (DS) methods and artificial intelligence (AI) are critical in today's healthcare services operations. This study evaluates the effectiveness of AI and DS in biomedical diagnostics, including the automatic detection and counting of white blood cells (WBCs) and their types, which provide valuable information for diagnosing and treating blood diseases such as leukemia. Automating these tasks with AI and DS saves time and avoids or minimizes errors compared to manual processes, which can be complex and error-prone. The study utilizes bibliographic data from Scopus to evaluate research applying AI algorithms and DS methods to mapping and classifying WBC images for the treatment of blood diseases such as leukemia, using a literature survey and science mapping methodology. The results show the potency of different DS methods and AI algorithms, such as machine learning, deep learning, and classification algorithms, in enabling the automatic detection of WBCs in images. AI and DS algorithms offer critical benefits in effectively and efficiently analyzing microscopic images of blood cells. The automatic identification, localization, and classification of WBCs speed up the patient diagnosis process, allowing hematologists to focus on interpreting results. Automatic processes identify specific abnormalities and patterns, enhancing accuracy and enabling timely diagnoses. Future work will examine the application of generative AI in blood cell diagnostics.
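Science mapping of this kind typically starts from keyword co-occurrence counts over a bibliographic export; a minimal sketch, assuming a Scopus CSV export whose "Author Keywords" column separates terms with semicolons (the study's actual toolchain is not specified in the abstract):

```python
# Count keyword co-occurrences across records of a Scopus CSV export.
# File name and column layout are assumptions about a standard export.
import csv
from collections import Counter
from itertools import combinations

pairs = Counter()
with open("scopus.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        kws = sorted({k.strip().lower()
                      for k in (row.get("Author Keywords") or "").split(";")
                      if k.strip()})
        pairs.update(combinations(kws, 2))  # every keyword pair in this record

print(pairs.most_common(10))  # strongest links, e.g. ("deep learning", "leukemia")
```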
{"title":"Artificial Intelligence and Data Science Methods for Automatic Detection of White Blood Cells in Images.","authors":"Yawo M Kobara, Ikpe Justice Akpan, Alima Damipe Nam, Firas H AlMukthar, Mbuotidem Peter","doi":"10.1007/s10278-025-01538-y","DOIUrl":"10.1007/s10278-025-01538-y","url":null,"abstract":"<p><p>Data scieQuerynce (DS) methods and artificial intelligence (AI) are critical in today's healthcare services operations. This study focuses on evaluating the effectiveness of AI and DS in biomedical diagnostics, including automatic detection and counting of white blood cells (WBCs) and types, which provide valuable information for diagnosing and treating blood diseases such as leukemia. Automating these tasks using AI and DS saves time and avoids or minimizes errors compared to manual processes, which can be complex and error prone. The study utilizes bibliographic data from SCOPUS to evaluate research on applying AI algorithms and DS methods for mapping and classifying WBC images for treatment of blood diseases, such as leukemia using literature survey and science mapping methodology. The results show the potency of different DS methods and AI algorithms, such as machine learning, deep learning, and classification algorithms that enable the automatic detection of WBC images. AI and DS algorithms offer critical benefits in effectively and efficiently analyzing microscopic images of blood cells. The automatic identification, localization, and classification of WBCs speed up the patient diagnosis process, allowing hematologists to focus on interpreting results. Automatic processes identify specific abnormalities and patterns, enhancing accuracy and timely diagnoses. Future work will examine the application of generative AI in blood cells diagnostics.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"583-603"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920980/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144087434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Histopathology-Based Prostate Cancer Classification Using ResNet: A Comprehensive Deep Learning Analysis
Pub Date: 2026-02-01 | Epub Date: 2025-05-20 | DOI: 10.1007/s10278-025-01543-1 | Pages: 604-619
Declan Ikechukwu Emegano, Mubarak Taiwo Mustapha, Dilber Uzun Ozsahin, Ilker Ozsahin, Berna Uzun
Prostate cancer is the most prevalent solid tumor in males and one of the most common causes of male mortality, a major global public health issue accounting for up to 7.3% of all male cancer diagnoses worldwide. To optimize patient outcomes and ensure therapeutic success, an accurate diagnosis must be made promptly. To achieve this, we used ResNet50, a convolutional neural network (CNN) architecture, to analyze prostate histological images and classify prostate cancer. ResNet50 was chosen for its efficiency in medical image classification and used to classify the histological images as benign or malignant. In this study, a total of 1276 prostate biopsy images were used with the ResNet50 model. We employed evaluation metrics such as accuracy, precision, recall, and F1 score. The ResNet50 model performed excellently, with an overall accuracy of 0.98; for the benign class, precision, recall, and F1 score were 1.00, 0.98, and 0.97, while for the malignant class they were 0.99, 0.98, and 0.97, respectively. The model also recorded a 95% confidence interval (CI) for accuracy of (0.91, 1.00) and a performance gain of 4.26% compared to MobileNet and CNN-RNN. The results of our model were also compared with state-of-the-art (SOTA) deep learning models to ensure robustness. This study demonstrates the potential of the ResNet50 model for the classification of prostate cancer, and clinical integration of these results will aid decision-makers in enhancing patient outcomes.
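The reported metrics correspond to a standard per-class evaluation; here is a sketch, with the caveat that the abstract does not say how its 95% CI was computed, so the bootstrap below is an assumption and the labels are toy data:

```python
# Per-class precision/recall/F1 plus a bootstrap 95% CI for accuracy.
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, seed=0):
    """Percentile bootstrap over resampled test indices."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    accs = [accuracy_score(y_true[idx], y_pred[idx])
            for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.percentile(accs, [2.5, 97.5])

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])   # 0 = benign, 1 = malignant (toy)
y_pred = np.array([0, 0, 1, 1, 1, 0, 1, 1])
print(classification_report(y_true, y_pred, target_names=["benign", "malignant"]))
print("95% CI for accuracy:", bootstrap_accuracy_ci(y_true, y_pred))
```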
{"title":"Histopathology-Based Prostate Cancer Classification Using ResNet: A Comprehensive Deep Learning Analysis.","authors":"Declan Ikechukwu Emegano, Mubarak Taiwo Mustapha, Dilber Uzun Ozsahin, Ilker Ozsahin, Berna Uzun","doi":"10.1007/s10278-025-01543-1","DOIUrl":"10.1007/s10278-025-01543-1","url":null,"abstract":"<p><p>Prostate cancer is the most prevalent solid tumor in males and one of the most common causes of male mortality. It is the most common type of cancer in men, a major global public health issue, and accounts for up to 7.3% of all male cancer diagnoses worldwide. To optimize patient outcomes and ensure therapeutic success, an accurate diagnosis must be made promptly. To achieve this, we focused on using ResNet50, a convolutional neural network (CNN) architecture, to analyze prostate histological images to classify prostate cancer. ResNet50, due to its efficiency in medical image classification, was used to classify the histological images as benign or malignant. In this study, a total of 1276 prostate biopsy images were used on the ResNet50 model. We employed evaluation metrics such as accuracy, precision, recall, and F1 score. The results showed that the ResNet50 model performed excellently with an overall accuracy of 0.98, 1.00 as precision, 0.98 as recall, and 0.97 as F1 score for benign. The malignant histological image has 0.99, 0.98, and 0.97 as precision, recall, and F1 scores. It also recorded a 95% confidence interval (CI) for accuracy as (0.91, 1.00) and a performance gain of 4.26% compared to MobileNet and CNN-RNN. The result of our model was also compared with the state-of-the-art (SOTA) DL models to ensure robustness. This study has demonstrated the potential of the ResNet50 model in the classification of prostate cancer. Again, the clinical integration of the results of this study will aid decision-makers in enhancing patient outcomes.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"604-619"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12921011/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Standardizing Heterogeneous MRI Series Description Metadata Using Large Language Models
Pub Date: 2026-02-01 | Epub Date: 2025-05-29 | DOI: 10.1007/s10278-025-01541-3 | Pages: 962-972
Peter I Kamel, Florence X Doo, Dharmam Savani, Adway Kanhere, Paul H Yi, Vishwa S Parekh
MRI metadata, particularly free-text series descriptions (SDs) used to identify sequences, are highly heterogeneous due to variable inputs by manufacturers and technologists. This variability poses challenges in correctly identifying series for hanging protocols and dataset curation. The purpose of this study was to evaluate the ability of large language models (LLMs) to automatically classify MRI SDs. We analyzed non-contrast brain MRIs performed between 2016 and 2022 at our institution, identifying all unique SDs in the metadata. A practicing neuroradiologist manually classified the SD text into: "T1," "T2," "T2/FLAIR," "SWI," "DWI," "ADC," or "Other." Then, various LLMs, including GPT-3.5 Turbo, GPT-4, GPT-4o, Llama 3 8B, and Llama 3 70B, were asked to classify each SD into one of the sequence categories. Model performances were compared to ground truth classification using area under the curve (AUC) as the primary metric. Additionally, GPT-4o was tasked with generating regular expression templates to match each category. In 2510 MRI brain examinations, there were 1395 unique SDs, with 727/1395 (52.1%) appearing only once, indicating high variability. GPT-4o demonstrated the highest performance, achieving an average AUC of 0.983 ± 0.020 for all series with detailed prompting. GPT models significantly outperformed Llama models, with smaller differences within the GPT family. Regular expression generation was inconsistent, demonstrating an average AUC of 0.774 ± 0.161 for all sequences. Our findings suggest that LLMs are effective for interpreting and standardizing heterogeneous MRI SDs.
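The regex baseline the study asked GPT-4o to generate would look something like the following; these particular patterns are illustrative guesses, not the model's actual output:

```python
# Rule-based SD classification with one pattern per sequence category.
# Dict order matters: FLAIR is tested before bare T2, and ADC before DWI.
import re

SEQUENCE_PATTERNS = {
    "T2/FLAIR": re.compile(r"flair", re.I),
    "SWI":      re.compile(r"\bswi\b|susceptibility", re.I),
    "ADC":      re.compile(r"\badc\b|apparent diffusion", re.I),
    "DWI":      re.compile(r"\bdwi\b|diffusion", re.I),
    "T1":       re.compile(r"\bt1\b|mprage", re.I),
    "T2":       re.compile(r"\bt2\b", re.I),
}

def classify_sd(series_description: str) -> str:
    for label, pattern in SEQUENCE_PATTERNS.items():
        if pattern.search(series_description):
            return label
    return "Other"

print(classify_sd("AX T2 FLAIR FS"))  # -> "T2/FLAIR"
```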
{"title":"Standardizing Heterogeneous MRI Series Description Metadata Using Large Language Models.","authors":"Peter I Kamel, Florence X Doo, Dharmam Savani, Adway Kanhere, Paul H Yi, Vishwa S Parekh","doi":"10.1007/s10278-025-01541-3","DOIUrl":"10.1007/s10278-025-01541-3","url":null,"abstract":"<p><p>MRI metadata, particularly free-text series descriptions (SDs) used to identify sequences, are highly heterogeneous due to variable inputs by manufacturers and technologists. This variability poses challenges in correctly identifying series for hanging protocols and dataset curation. The purpose of this study was to evaluate the ability of large language models (LLMs) to automatically classify MRI SDs. We analyzed non-contrast brain MRIs performed between 2016 and 2022 at our institution, identifying all unique SDs in the metadata. A practicing neuroradiologist manually classified the SD text into: \"T1,\" \"T2,\" \"T2/FLAIR,\" \"SWI,\" \"DWI,\" ADC,\" or \"Other.\" Then, various LLMs, including GPT 3.5 Turbo, GPT-4, GPT-4o, Llama 3 8b, and Llama 3 70b, were asked to classify each SD into one of the sequence categories. Model performances were compared to ground truth classification using area under the curve (AUC) as the primary metric. Additionally, GPT-4o was tasked with generating regular expression templates to match each category. In 2510 MRI brain examinations, there were 1395 unique SDs, with 727/1395 (52.1%) appearing only once, indicating high variability. GPT-4o demonstrated the highest performance, achieving an average AUC of 0.983 ± 0.020 for all series with detailed prompting. GPT models significantly outperformed Llama models, with smaller differences within the GPT family. Regular expression generation was inconsistent, demonstrating an average AUC of 0.774 ± 0.161 for all sequences. Our findings suggest that LLMs are effective for interpreting and standardizing heterogeneous MRI SDs.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"962-972"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12921075/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144183157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Personalized Breast Cancer Prognosis Using a Model Based on MRI and Clinicopathological Variables
Pub Date: 2026-02-01 | Epub Date: 2025-04-15 | DOI: 10.1007/s10278-025-01500-y | Pages: 500-517
Alisa Mohebbi, Saeed Mohammadzadeh, Afshin Mohammadi, Seyed Mohammad Tavangar
This study aimed to develop and internally validate a prognostic prediction model based on MRI, pathological, and clinical findings to predict breast cancer recurrence and death. A retrospective prediction model was developed using data from 922 breast cancer patients recruited at Duke University Hospital from January 2000 to March 2014. Cox and binary logistic regressions were implemented for the hazard score and for 2-, 3-, 5-, and 8-year survivals and recurrences. After assessing the collinearity of predictors, both univariable and multivariable analyses were performed. Qualitative and quantitative MRI variables were selected based on clinical expert opinion and literature review. Bootstrap and leave-one-out methods were used for internal validation. Calibration, shrinkage, time-dependent receiver operating characteristic (ROC) curve, and decision-curve analyses were also performed. Finally, a user-friendly calculator was built. Of the included participants, 62 (6.72%) died, with a mean follow-up of 8.89 years (CI = 8.74 to 9.04), while 90 (9.76%) experienced recurrence, with a mean follow-up of 8.20 years (CI = 7.92 to 8.48). The Akaike information criterion (AIC) values of the survival and recurrence models were 752.9 and 1020.7, indicating a good balance between model complexity and fit. Validation-adjusted area under the curve (AUC) values for 8-, 5-, 3-, and 2-year survival were 0.740 (CI = 0.711 to 0.768), 0.741 (CI = 0.712 to 0.770), 0.788 (CI = 0.761 to 0.816), and 0.783 (CI = 0.755 to 0.809), while those for 8-, 5-, and 3-year recurrence were 0.678 (CI = 0.647 to 0.708), 0.696 (CI = 0.664 to 0.727), and 0.769 (CI = 0.740 to 0.798), respectively. Good calibration and shrinkage parameters were achieved. The internal validation and decision curve analyses highlighted the usefulness of the model across all probability levels. The combined MRI-pathological-clinical model has excellent performance in predicting overall survival and recurrence of breast cancer and may have a role to play in daily personalized breast cancer therapy.
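The Cox component maps naturally onto the lifelines package; a minimal sketch, with placeholder column names rather than the study's actual MRI, pathological, and clinical predictors:

```python
# Fit a Cox proportional hazards model and report hazard ratios.
# Data and predictors are toy stand-ins for the study's variables.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "years_followup": [8.1, 3.4, 9.0, 2.2, 7.5, 5.9],
    "died":           [0,   1,   0,   1,   0,   1],
    "tumor_size_mm":  [14,  32,  9,   41,  18,  27],   # hypothetical predictor
    "age":            [52,  61,  47,  69,  55,  63],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="years_followup", event_col="died")
cph.print_summary()  # hazard ratios, 95% CIs, concordance index
```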
{"title":"Personalized Breast Cancer Prognosis Using a Model Based on MRI and Clinicopathological Variables.","authors":"Alisa Mohebbi, Saeed Mohammadzadeh, Afshin Mohammadi, Seyed Mohammad Tavangar","doi":"10.1007/s10278-025-01500-y","DOIUrl":"10.1007/s10278-025-01500-y","url":null,"abstract":"<p><p>This study aimed to develop and internally validate a prognostic prediction model based on MRI, pathological, and clinical findings to predict breast cancer recurrence and death. A retrospective study prediction model was developed using data from 922 breast cancer patients recruited in Duke University Hospital from January 2000 to March 2014. Cox and binary logistic regressions were implemented for hazard score and 2-, 3-, 5-, and 8-year survivals and recurrences. After assessing the collinearity of predictors, both univariable and multivariable analyses were performed. Qualitative and quantitative MRI variables were selected based on clinical expert opinion and literature review. Bootstrap and leave-one-out methods were used for internal validation. Calibration, shrinkage, time-dependent receiver operating characteristic (ROC) curve, and decision-curve analyses were also performed. Finally, a user-friendly calculator was built. Of included participants, 62 (6.72%) died with a mean patient-year follow-up of 8.89 years (CI = 8.74 to 9.04), while 90 (9.76%) experienced recurrence with mean patient-year follow-up of 8.20 years (CI = 7.92 to 8.48). The Akaike information criterion (AIC) value of survival and recurrence models were 752.9 and 1020.7, indicating a good balance between model complexity and fit. Validation model adjusted area under curve (AUC) in 8-, 5-, 3-, and 2-year survivals were 0.740 (CI = 0.711 to 0.768), 0.741 (CI = 0.712 to 0.770), 0.788 (CI = 0.761 to 0.816), and 0.783 (CI = 0.755 to 0.809), while in 8-, 5-, and 3-year recurrences were 0.678 (CI = 0.647 to 0.708), 0.696 (CI = 0.664 to 0.727), and 0.769 (CI = 0.740 to 0.798), respectively. Good calibration and shrinkage parameters were achieved. The internal validation and decision curve analyses highlighted the usefulness of the model across all probability levels. The combined MRI-pathological-clinical model has excellent performance in predicting overall survival and recurrence of breast cancer and may have a role to play in daily personalized breast cancer therapy.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"500-517"},"PeriodicalIF":0.0,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920835/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144003568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}