Visual Computing for Industry Biomedicine and Art: Latest Articles

Visual explainable artificial intelligence for graph-based visual question answering and scene graph curation.

IF 3.2, Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Visual Computing for Industry Biomedicine and Art

Pub Date : 2025-04-07 DOI: 10.1186/s42492-025-00185-y

Sebastian Künzel, Tanja Munz-Körner, Pascal Tilli, Noel Schäfer, Sandeep Vidyapu, Ngoc Thang Vu, Daniel Weiskopf

This study presents a novel visualization approach to explainable artificial intelligence for graph-based visual question answering (VQA) systems. The method focuses on identifying false answer predictions by the model and lets users directly correct mistakes in the input space, thus facilitating dataset curation. The decision-making process of the model is shown by highlighting certain internal states of a graph neural network (GNN). The proposed system is built on top of a GraphVQA framework that implements various GNN-based models for VQA trained on the GQA dataset. The authors evaluated their tool through the demonstration of identified use cases, quantitative measures, and a user study conducted with experts from the machine learning, visualization, and natural language processing domains. Their findings highlight the value of the implemented features in helping users identify incorrect predictions and the underlying issues. Additionally, the approach is easily extendable to similar models for graph-based question answering.

Citations: 0

Bootstrapping BI-RADS classification using large language models and transformers in breast magnetic resonance imaging reports.

Pub Date : 2025-04-03 DOI: 10.1186/s42492-025-00189-8

Yuxin Liu, Xiang Zhang, Weiwei Cao, Wenju Cui, Tao Tan, Yuqin Peng, Jiayi Huang, Zhen Lei, Jun Shen, Jian Zheng

Breast cancer is one of the most common malignancies among women globally. Magnetic resonance imaging (MRI), as the final non-invasive diagnostic tool before biopsy, provides detailed free-text reports that support clinical decision-making. Therefore, the effective utilization of the information in MRI reports to make reliable decisions is crucial for patient care. This study proposes a novel method for BI-RADS classification using breast MRI reports. Large language models are employed to transform free-text reports into structured reports. Specifically, missing category information (MCI) that is absent in the free-text reports is supplemented by assigning default values to the missing categories in the structured reports. To ensure data privacy, a locally deployed Qwen-Chat model is employed. Furthermore, to enhance the domain-specific adaptability, a knowledge-driven prompt is designed. The Qwen-7B-Chat model is fine-tuned specifically for structuring breast MRI reports. To prevent information loss and enable comprehensive learning of all report details, a fusion strategy is introduced, combining free-text and structured reports to train the classification model. Experimental results show that the proposed BI-RADS classification method outperforms existing report classification methods across multiple evaluation metrics. Furthermore, an external test set from a different hospital is used to validate the robustness of the proposed approach. The proposed structured method surpasses GPT-4o in terms of performance. Ablation experiments confirm that the knowledge-driven prompt, MCI, and the fusion strategy are crucial to the model's performance.
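The missing category information (MCI) step and the fusion of free-text and structured reports described above can be sketched roughly as follows. This is only an illustration: the category names, the default values, the `[SEP]` separator, and the function names are all hypothetical, since the abstract does not give the actual report schema or fusion format.

```python
# Hypothetical categories and defaults; the abstract does not list the real schema.
DEFAULTS = {
    "mass": "none reported",
    "enhancement": "none reported",
    "lymph_nodes": "unremarkable",
}


def fill_missing_categories(structured, defaults=DEFAULTS):
    """Return a copy of the structured report in which every category that is
    missing or empty is set to its default value (the MCI idea)."""
    filled = dict(defaults)
    for key, value in structured.items():
        if value not in (None, ""):
            filled[key] = value
    return filled


def fuse(free_text, structured):
    """Concatenate the free-text report with a serialized structured report.
    The '[SEP]' separator is an assumption made for this sketch."""
    fields = "; ".join(f"{k}: {v}" for k, v in sorted(structured.items()))
    return f"{free_text} [SEP] {fields}"


report = fill_missing_categories({"mass": "irregular, 12 mm", "enhancement": ""})
fused = fuse("Irregular mass in left breast.", report)
```

In a pipeline of this shape, the fused string would be the input to the downstream classifier, so that neither the original wording nor the structured fields are lost.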

Citations: 0

Nucleus pulposus clamping procedures based on optimized material point method for surgical simulation systems.

Pub Date : 2025-04-01 DOI: 10.1186/s42492-025-00188-9

Jianlong Ni, Jingrong Li, Zhiyuan Xie, Qinghui Wang, Chunhai Li, Haoyu Wu, Yang Zhang

Clamping and removal of the nucleus pulposus (NP) are critical operations during transforaminal endoscopic lumbar discectomy. To meet the challenge of simulating the NP in real time for better training outcomes, an improved material point method is proposed to represent the physical properties of the NP and compute its deformation in real time. Corresponding volume rendering of the NP and its hosting bones is also presented. The virtual operation procedures are then implemented in a training prototype and tested through simulation experiments and subjective evaluation. The results demonstrate the feasibility of the approach.

Citations: 0

PCRFed: personalized federated learning with contrastive representation for non-independently and identically distributed medical image segmentation.

Pub Date : 2025-03-28 DOI: 10.1186/s42492-025-00191-0

Shengyuan Liu, Ruofan Zhang, Mengjie Fang, Hailin Li, Tianwang Xun, Zipei Wang, Wenting Shang, Jie Tian, Di Dong

Federated learning (FL) has shown great potential in addressing data privacy issues in medical image analysis. However, varying data distributions across different sites can create challenges in aggregating client models and achieving good global model performance. In this study, we propose a novel personalized contrastive representation FL framework, named PCRFed, which leverages contrastive representation learning to address the non-independent and identically distributed (non-IID) challenge and dynamically adjusts the distance between local clients and the global model to improve each client's performance without incurring additional communication costs. The proposed weighted model-contrastive loss provides additional regularization for local models, optimizing their respective distributions while effectively utilizing information from all clients to mitigate performance challenges caused by insufficient local data. The PCRFed approach was evaluated on two non-IID medical image segmentation datasets, and the results show that it outperforms several state-of-the-art FL frameworks, achieving higher single-client performance while ensuring privacy preservation and minimal communication costs. Our PCRFed framework can be adapted to various encoder-decoder segmentation network architectures and holds significant potential for advancing the use of FL in real-world medical applications. Based on a multi-center dataset, our framework demonstrates superior overall performance and higher single-client performance, achieving a 2.63% increase in the average Dice score for prostate segmentation.
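To make the idea of a weighted model-contrastive loss more concrete, the sketch below shows a generic model-contrastive term of the kind used in model-contrastive federated learning: the local representation is pulled toward the global model's representation and pushed away from the previous local model's representation. It is not PCRFed's actual formulation. The abstract does not specify how the weight is computed or adjusted, so `weight`, `temperature`, and all function names here are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def model_contrastive_loss(z_local, z_global, z_prev, temperature=0.5):
    """Generic model-contrastive term: the global representation is the
    positive pair, the previous local representation is the negative pair."""
    pos = math.exp(cosine(z_local, z_global) / temperature)
    neg = math.exp(cosine(z_local, z_prev) / temperature)
    return -math.log(pos / (pos + neg))


def local_objective(seg_loss, z_local, z_global, z_prev, weight):
    """Segmentation loss plus the weighted contrastive regularizer.
    How PCRFed chooses 'weight' is not stated in the abstract."""
    return seg_loss + weight * model_contrastive_loss(z_local, z_global, z_prev)


# A local representation aligned with the global one gives a small loss;
# one aligned with the previous local model gives a large loss.
loss_near_global = model_contrastive_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
loss_near_prev = model_contrastive_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Because the term only needs representations computed on the client, a regularizer of this kind adds no extra communication beyond the usual exchange of model weights, which is consistent with the claim in the abstract.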

Citations: 0

Principal component analysis and fine-tuned vision transformation integrating model explainability for breast cancer prediction.

Pub Date : 2025-03-10 DOI: 10.1186/s42492-025-00186-x

Huong Hoang Luong, Phuc Phan Hong, Dat Vo Minh, Thinh Nguyen Le Quang, Anh Dinh The, Nguyen Thai-Nghe, Hai Thanh Nguyen

Breast cancer, the most commonly diagnosed cancer among women, is a notable health issue globally. It results from abnormal cells in the breast tissue growing out of control. Histopathology, the study of diseased tissue, has emerged as a key tool in breast cancer care because it plays a vital role in diagnosis and classification. Consequently, considerable research on histopathology has been conducted in medicine and computer science to develop effective methods for breast cancer treatment. In this study, a vision Transformer (ViT) was employed to classify tumors in the Breast Cancer Histopathological Database (BreakHis) into two classes, benign and malignant. To enhance the model performance, we introduced a novel multi-head locality large kernel self-attention during fine-tuning, achieving an accuracy of 95.94% at 100× magnification, an improvement of 3.34% over a standard ViT (which uses multi-head self-attention). In addition, the application of principal component analysis for dimensionality reduction led to an accuracy improvement of 3.34%, highlighting its role in mitigating overfitting and reducing the computational complexity. In the final phase, SHapley Additive exPlanations, Local Interpretable Model-agnostic Explanations, and Gradient-weighted Class Activation Mapping were used for the interpretability and explainability of machine-learning models, aiding in understanding the feature importance and local explanations and in visualizing the model attention. In another experiment, ensemble learning with VGGIN further boosted the performance to 97.13% accuracy. Our approach exhibited a 0.98% to 17.13% improvement in accuracy compared with state-of-the-art methods, establishing a new benchmark for breast cancer histopathological image classification.
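As a rough illustration of principal component analysis for dimensionality reduction, the sketch below finds the first principal component of a tiny feature matrix by power iteration and projects the data onto it. It is a toy, standard-library-only example, not the authors' pipeline: the function names and the data are made up, and a real workflow would normally rely on a numerical library and keep several components.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Dominant eigenvector of the covariance matrix by power iteration.
    Assumes the start vector is not orthogonal to that eigenvector and
    that the covariance matrix is not all zeros."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, component):
    """Reduce each centered row to a single score along the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in rows]


data = center([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # made-up 2-D features
pc1 = first_component(covariance(data))
scores = project(data, pc1)  # one number per sample instead of two
```

Keeping only the leading components discards low-variance directions, which is the mechanism behind the reduced overfitting and computational cost mentioned in the abstract.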

Citations: 0

Global residual stress field inference method for die-forging structural parts based on fusion of monitoring data and distribution prior.

Pub Date : 2025-03-06 DOI: 10.1186/s42492-025-00187-w

Shuyuan Chen, Yingguang Li, Changqing Liu, Zhiwei Zhao, Zhibin Chen, Xiao Liu

Die-forging structural parts are widely used in the main load-bearing components of aircraft because of their excellent mechanical properties and fatigue resistance. However, the forming and heat treatment processes of die-forging structural parts are complex, leading to high levels of internal stress and a complex distribution of residual stress fields (RSFs), which affect the deformation, fatigue life, and failure of structural parts throughout their lifecycles. Hence, the global RSF can provide the basis for process control. The existing RSF inference method based on deformation force data can utilize monitoring data to infer the global RSF of a regular part. However, owing to the irregular geometry of die-forging structural parts and the complexity of the RSF, it is challenging to solve ill-conditioned problems during the inference process, which makes it difficult to obtain the RSF accurately. This paper presents a global RSF inference method for die-forging structural parts based on the fusion of monitoring data and a distribution prior. Prior knowledge was derived from the RSF distribution trends obtained through finite element analysis. This enables the low-dimensional characterization of the RSF, reducing the number of parameters required to solve the equations. The effectiveness of this method was validated in both simulation and actual environments.

Citations: 0

Explainable machine learning framework for cataracts recognition using visual features.

Pub Date : 2025-01-17 DOI: 10.1186/s42492-024-00183-6

Xiao Wu, Lingxi Hu, Zunjie Xiao, Xiaoqing Zhang, Risa Higashita, Jiang Liu

Cataract is the leading ocular disease causing blindness and visual impairment globally. Deep neural networks (DNNs) have achieved promising cataract recognition performance based on anterior segment optical coherence tomography (AS-OCT) images; however, they have poor explainability, limiting their clinical applications. In contrast, visual features extracted from original AS-OCT images and their transformed forms (e.g., AS-OCT-based histograms) are easy to explain but have not been fully exploited. Motivated by these observations, an explainable machine learning framework to recognize cataract severity levels automatically using AS-OCT images was proposed, consisting of three stages: visual feature extraction, feature importance explanation and selection, and recognition. First, the intensity histogram and intensity-based statistical methods are applied to extract visual features from original AS-OCT images and AS-OCT-based histograms. Subsequently, the SHapley Additive exPlanations and Pearson correlation coefficient methods are applied to analyze the feature importance and select significant visual features. Finally, an ensemble multi-class ridge regression method is applied to recognize the cataract severity levels based on the selected visual features. Experiments on a clinical AS-OCT-NC dataset demonstrate that the proposed framework not only achieves competitive performance in comparison with DNNs, but also has good explanation ability, meeting the requirements of clinical diagnostic practice.
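The first two stages of the framework (intensity-based feature extraction and correlation-based feature selection) can be sketched as below. This is a simplified illustration under stated assumptions: the bin count, the 0.5 threshold, the feature names, and the toy values are invented, and the SHapley Additive exPlanations analysis and the ensemble ridge regression stage are left out.

```python
import math


def histogram(pixels, bins=4, max_value=256):
    """Normalized intensity histogram of a flat list of pixel values."""
    counts = [0] * bins
    width = max_value / bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def intensity_features(pixels):
    """Two simple intensity statistics; real feature sets are larger."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return {"mean": mean, "std": math.sqrt(var)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences
    (both must have non-zero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_features(table, labels, threshold=0.5):
    """Keep features whose absolute correlation with the severity label
    reaches the (hypothetical) threshold. 'table' maps name -> values."""
    return sorted(name for name, values in table.items()
                  if abs(pearson(values, labels)) >= threshold)


hist = histogram([0, 10, 100, 255])
stats = intensity_features([0, 10, 100, 255])
kept = select_features({"mean": [10, 20, 30], "noise": [5, 1, 5]}, [1, 2, 3])
```

Features chosen this way stay human-readable (for example, mean intensity), which is what lets a clinician inspect why a severity level was predicted.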

Citations: 0

Harmonized technical standard test methods for quality evaluation of medical fluorescence endoscopic imaging systems.

Pub Date : 2025-01-10 DOI: 10.1186/s42492-024-00184-5

Bodong Liu, Zhaojun Guo, Pengfei Yang, Jian'an Ye, Kunshan He, Shen Gao, Chongwei Chi, Yu An, Jie Tian

Fluorescence endoscopy technology utilizes a light source of a specific wavelength to excite the fluorescence signals of biological tissues. This capability is extremely valuable for the early detection and precise diagnosis of pathological changes. Identifying a suitable experimental approach and metric for objectively and quantitatively assessing the imaging quality of fluorescence endoscopy is imperative to enhance the image evaluation criteria of fluorescence imaging technology. In this study, we propose a new set of standards for fluorescence endoscopy technology to evaluate the optical performance and image quality of fluorescence imaging objectively and quantitatively. This comprehensive set of standards encompasses fluorescence test models and imaging quality assessment protocols to ensure that the performance of fluorescence endoscopy systems meets the required standards. In addition, it aims to enhance the accuracy and uniformity of the results by standardizing testing procedures. The formulation of pivotal metrics and testing methodologies is anticipated to facilitate direct quantitative comparisons of the performance of fluorescence endoscopy devices. This advancement is expected to foster the harmonization of clinical and preclinical evaluations using fluorescence endoscopy imaging systems, thereby improving diagnostic precision and efficiency.

Citations: 0

Advancing breast cancer diagnosis: token vision transformers for faster and accurate classification of histopathology images.

IF 3.2; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Visual Computing for Industry Biomedicine and Art

Pub Date : 2025-01-08 DOI: 10.1186/s42492-024-00181-8

Mouhamed Laid Abimouloud, Khaled Bensid, Mohamed Elleuch, Mohamed Ben Ammar, Monji Kherallah

The vision transformer (ViT) architecture, with its attention mechanism based on multi-head attention layers, has been widely adopted in various computer-aided diagnosis tasks due to its effectiveness in processing medical image information. ViTs are known for their complex architecture, which requires high-performance GPUs or CPUs for efficient model training and deployment in real-world medical diagnostic devices and renders them more intricate than convolutional neural networks (CNNs). This complexity is especially challenging in histopathology image analysis, where the images are both limited and complex. In response to these challenges, this study proposes TokenMixer, a hybrid architecture that combines the strengths of CNNs and ViTs. The architecture aims to enhance feature extraction and classification accuracy with shorter training time and fewer parameters by minimizing the number of input patches employed during training. It tokenizes input patches using convolutional layers and processes them with transformer encoder layers across all network layers for fast and accurate breast cancer tumor subtype classification. The TokenMixer mechanism is inspired by the ConvMixer and TokenLearner models. First, the ConvMixer model dynamically generates spatial attention maps using convolutional layers, enabling the extraction of patches from input images to minimize the number of input patches used in training. Second, the TokenLearner model extracts relevant regions from the selected input patches, tokenizes them to improve feature extraction, and trains all tokenized patches in a transformer encoder network. We evaluated the TokenMixer model on the BreakHis public dataset, comparing it with ViT-based and other state-of-the-art methods. Our approach achieved strong results for both binary and multi-class classification of breast cancer subtypes across various magnification levels (40×, 100×, 200×, 400×). 
The model demonstrated accuracies of 97.02% for binary classification and 93.29% for multi-class classification, with decision times of 391.71 and 1173.56 s, respectively. These results highlight the potential of our hybrid deep ViT-CNN architecture for advancing tumor classification in histopathological images. The source code is available at https://github.com/abimouloud/TokenMixer.
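
The token-reduction idea mentioned in this abstract can be illustrated with a small sketch. This is not the authors' TokenMixer code (see their repository for that); it is a simplified, pure-Python version of attention-weighted token pooling in the spirit of TokenLearner, assuming each patch is already a feature vector. The names `pool_tokens`, `patches`, and `filters` are hypothetical, and the fixed `filters` stand in for weights that would normally be learned.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_tokens(patches, filters):
    """Reduce N patch vectors to len(filters) tokens.

    patches: list of N feature vectors, each of length C.
    filters: list of S weight vectors of length C; hypothetical stand-ins
             for learned attention filters (e.g. 1x1 convolutions).
    """
    channels = len(patches[0])
    tokens = []
    for w in filters:
        # One attention weight per patch: how relevant is it to this token?
        attn = [sigmoid(sum(wi * xi for wi, xi in zip(w, p))) for p in patches]
        total = sum(attn)
        # Token = attention-weighted average of all patch vectors.
        token = [
            sum(a * p[c] for a, p in zip(attn, patches)) / total
            for c in range(channels)
        ]
        tokens.append(token)
    return tokens


patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # N = 4, C = 2
filters = [[2.0, -2.0], [-2.0, 2.0]]                         # S = 2
tokens = pool_tokens(patches, filters)
print(len(patches), "patches ->", len(tokens), "tokens")
```

Because the number of tokens depends only on the number of filters, a downstream transformer encoder would process S tokens instead of N patches, which is the general reason such schemes can shorten training time.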

Visual Computing for Industry Biomedicine and Art 8(1):1. Published 2025-01-08. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11711433/pdf/

Citations: 0

Semi-supervised contour-driven broad learning system for autonomous segmentation of concealed prohibited baggage items.

IF 3.2; Zone 4, Computer Science; Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-12-24 DOI: 10.1186/s42492-024-00182-7

Divya Velayudhan, Abdelfatah Ahmed, Taimur Hassan, Muhammad Owais, Neha Gour, Mohammed Bennamoun, Ernesto Damiani, Naoufel Werghi

With the exponential rise in global air traffic, ensuring swift passenger processing while countering potential security threats has become a paramount concern for aviation security. Although X-ray baggage monitoring is now standard, manual screening has several limitations, including the propensity for errors, and raises concerns about passenger privacy. To address these drawbacks, researchers have leveraged recent advances in deep learning to design threat-segmentation frameworks. However, these models require extensive training data and labour-intensive dense pixel-wise annotations and are finetuned separately for each dataset to account for inter-dataset discrepancies. Hence, this study proposes a semi-supervised contour-driven broad learning system (BLS) for X-ray baggage security threat instance segmentation referred to as C-BLX. The research methodology involved enhancing representation learning and achieving faster training capability to tackle severe occlusion and class imbalance using a single training routine with limited baggage scans. The proposed framework was trained with minimal supervision using resource-efficient image-level labels to localize illegal items in multi-vendor baggage scans. More specifically, the framework generated candidate region segments from the input X-ray scans based on local intensity transition cues, effectively identifying concealed prohibited items without entire baggage scans. The multi-convolutional BLS exploits the rich complementary features extracted from these region segments to predict object categories, including threat and benign classes. The contours corresponding to the region segments predicted as threats were then utilized to yield the segmentation results. 
The proposed C-BLX system was thoroughly evaluated on three highly imbalanced public datasets and surpassed other competitive approaches in baggage-threat segmentation, yielding 90.04%, 78.92%, and 59.44% in terms of mIoU on GDXray, SIXray, and Compass-XP, respectively. Furthermore, the limitations of the proposed system in extracting precise region segments in intricate noisy settings and potential strategies for overcoming them through post-processing techniques were explored (source code will be available at https://github.com/Divs1159/CNN_BLS).
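
For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how mean intersection-over-union is commonly computed from a predicted and a ground-truth label mask. It is an illustration only: the abstract does not state the exact averaging protocol, so skipping classes absent from both masks is an assumption here, and the function name `mean_iou` is hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two 2-D label masks.

    pred, target: equally sized lists of rows of integer class labels.
    Classes absent from both masks are skipped (an assumed convention).
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_pred = p == c
                in_target = t == c
                inter += in_pred and in_target  # pixel is class c in both masks
                union += in_pred or in_target   # pixel is class c in either mask
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [[0, 1],
        [1, 1]]
target = [[0, 1],
          [0, 1]]
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.4f}")
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is about 0.5833. Averaging per class rather than per pixel keeps rare classes from being swamped by frequent ones, which matters on imbalanced datasets.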

Visual Computing for Industry Biomedicine and Art 7(1):30. Published 2024-12-24. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11666859/pdf/

Citations: 0