Pub Date: 2025-11-19 | DOI: 10.3390/jimaging11110417
Georgios S Ioannidis, Katerina Nikiforaki, Aikaterini Dovrou, Vassilis Kilintzis, Grigorios Kalliatakis, Oliver Diaz, Karim Lekadir, Kostas Marias
This study aims to develop an explainable radiomics-based model for the automatic assessment of image quality in breast cancer Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) data. A cohort of 280 images obtained from a public database was annotated by two clinical experts, yielding 110 high-quality and 110 low-quality images. The proposed methodology involved the extraction of 819 radiomic features and 2 No-Reference image quality metrics per patient, using both the whole image and the background as regions of interest. Feature extraction was performed under two scenarios: (i) from a sample of 12 slices per patient, and (ii) from the middle slice of each patient. A range of machine learning classifiers was then trained, with explainability assessed through SHapley Additive Explanations (SHAP). The best performance was achieved in the second scenario, where combining features from the whole image and the background with a support vector machine classifier yielded sensitivity, specificity, accuracy, and AUC values of 85.51%, 80.01%, 82.76%, and 89.37%, respectively. The proposed model demonstrates potential for integration into clinical practice and may also serve as a valuable resource for large-scale repositories and subgroup analyses aimed at ensuring fairness and explainability.
{"title":"Explainable Radiomics-Based Model for Automatic Image Quality Assessment in Breast Cancer DCE MRI Data.","authors":"Georgios S Ioannidis, Katerina Nikiforaki, Aikaterini Dovrou, Vassilis Kilintzis, Grigorios Kalliatakis, Oliver Diaz, Karim Lekadir, Kostas Marias","doi":"10.3390/jimaging11110417","DOIUrl":"10.3390/jimaging11110417","url":null,"abstract":"<p><p>This study aims to develop an explainable radiomics-based model for the automatic assessment of image quality in breast cancer Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) data. A cohort of 280 images obtained from a public database was annotated by two clinical experts, resulting in 110 high-quality and 110 low-quality images. The proposed methodology involved the extraction of 819 radiomic features and 2 No-Reference image quality metrics per patient, using both the whole image and the background as regions of interest. Feature extraction was performed under two scenarios: (i) from a sample of 12 slices per patient, and (ii) from the middle slice of each patient. Following model training, a range of machine learning classifiers were applied with explainability assessed through SHapley Additive Explanations (SHAP). The best performance was achieved in the second scenario, where combining features from the whole image and background with a support vector machine classifier yielded sensitivity, specificity, accuracy, and AUC values of 85.51%, 80.01%, 82.76%, and 89.37%, respectively. This proposed model demonstrates potential for integration into clinical practice and may also serve as a valuable resource for large-scale repositories and subgroup analyses aimed at ensuring fairness and explainability.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653830/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-18 | DOI: 10.3390/jimaging11110416
Miguel José das Neves, Felipe Rodrigues Perche Mahlow, Renato Dias de Souza, Paulo Roberto G Hernandes, José Remo Ferreira Brega, Kelton Augusto Pontara da Costa
This paper addresses the critical challenge of detecting content-aware image manipulations, specifically focusing on seam carving forgery. While deep learning models, particularly Convolutional Neural Networks (CNNs), have shown promise in this area, their black-box nature limits their trustworthiness in high-stakes domains like digital forensics. To address this gap, we propose and validate a framework for interpretable forgery detection, termed E-XAI (Ensemble Explainable AI). Conceptually inspired by Ensemble Learning, our framework's novelty lies not in combining predictive models, but in integrating a multi-perspective ensemble of explainability techniques. Specifically, we combine SHAP for fine-grained, pixel-level feature attribution with Grad-CAM for region-level localization to create a more robust and holistic interpretation of a single, custom-trained CNN's decisions. Our approach is validated on a purpose-built, balanced, binary-class dataset of 10,300 images. The results demonstrate high classification performance on an unseen test set, with a 95% accuracy and a 99% precision for the forged class. Furthermore, we analyze the model's robustness against JPEG compression, a common real-world perturbation. More importantly, the application of the E-XAI framework reveals how the model identifies subtle forgery artifacts, providing transparent, visual evidence for its decisions. This work contributes a robust end-to-end pipeline for interpretable image forgery detection, enhancing the trust and reliability of AI systems in information security.
{"title":"Seam Carving Forgery Detection Through Multi-Perspective Explainable AI.","authors":"Miguel José das Neves, Felipe Rodrigues Perche Mahlow, Renato Dias de Souza, Paulo Roberto G Hernandes, José Remo Ferreira Brega, Kelton Augusto Pontara da Costa","doi":"10.3390/jimaging11110416","DOIUrl":"10.3390/jimaging11110416","url":null,"abstract":"<p><p>This paper addresses the critical challenge of detecting content-aware image manipulations, specifically focusing on seam carving forgery. While deep learning models, particularly Convolutional Neural Networks (CNNs), have shown promise in this area, their black-box nature limits their trustworthiness in high-stakes domains like digital forensics. To address this gap, we propose and validate a framework for interpretable forgery detection, termed E-XAI (Ensemble Explainable AI). Conceptually inspired by Ensemble Learning, our framework's novelty lies not in combining predictive models, but in integrating a multi-perspective ensemble of explainability techniques. Specifically, we combine SHAP for fine-grained, pixel-level feature attribution with Grad-CAM for region-level localization to create a more robust and holistic interpretation of a single, custom-trained CNN's decisions. Our approach is validated on a purpose-built, balanced, binary-class dataset of 10,300 images. The results demonstrate high classification performance on an unseen test set, with a 95% accuracy and a 99% precision for the forged class. Furthermore, we analyze the model's robustness against JPEG compression, a common real-world perturbation. More importantly, the application of the E-XAI framework reveals how the model identifies subtle forgery artifacts, providing transparent, visual evidence for its decisions. This work contributes a robust end-to-end pipeline for interpretable image forgery detection, enhancing the trust and reliability of AI systems in information security.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653248/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-17 | DOI: 10.3390/jimaging11110415
Sang-Jeong Lee
Automated Optical Inspection (AOI) of Printed Circuit Boards (PCBs) suffers from scarce labeled data and frequent domain shifts caused by variations in camera optics, illumination, and product design. These limitations hinder the development of accurate and reliable deep-learning models in manufacturing settings. To address this challenge, this study systematically benchmarks three Parameter-Efficient Fine-Tuning (PEFT) strategies, namely Linear Probe, Low-Rank Adaptation (LoRA), and Visual Prompt Tuning (VPT), applied to two representative foundation vision models: the Contrastive Language-Image Pretraining Vision Transformer (CLIP-ViT-B/16) and the Self-Distillation with No Labels Vision Transformer (DINOv2-S/14). The models are evaluated on six-class PCB defect classification tasks under few-shot (k = 5, 10, 20) and full-data regimes, analyzing both performance and reliability. Experiments show that VPT achieves 0.99 ± 0.01 accuracy and 0.998 ± 0.001 macro-Area Under the Precision-Recall Curve (macro-AUPRC), reducing classification error by approximately 65% compared with the Linear Probe and LoRA while tuning fewer than 1.5% of backbone parameters. Reliability, assessed by the stability of precision-recall behavior across different decision thresholds, improved as the number of labeled samples increased. Furthermore, class-wise and few-shot analyses revealed that VPT adapts more effectively to rare defect types such as Spur and Spurious Copper while maintaining near-ceiling performance on simpler categories (Short, Pinhole). These findings collectively demonstrate that prompt-based adaptation offers a quantitatively favorable trade-off between accuracy, efficiency, and reliability. Practically, this positions VPT as a scalable strategy for factory-level AOI, enabling the rapid deployment of robust defect inspection models even when labeled data is scarce.
{"title":"Few-Shot Adaptation of Foundation Vision Models for PCB Defect Inspection.","authors":"Sang-Jeong Lee","doi":"10.3390/jimaging11110415","DOIUrl":"10.3390/jimaging11110415","url":null,"abstract":"<p><p>Automated Optical Inspection (AOI) of Printed Circuit Boards (PCBs) suffers from scarce labeled data and frequent domain shifts caused by variations in camera optics, illumination, and product design. These limitations hinder the development of accurate and reliable deep-learning models in manufacturing settings. To address this challenge, this study systematically benchmarks three Parameter-Efficient Fine-Tuning (PEFT) strategies-Linear Probe, Low-Rank Adaptation (LoRA), and Visual Prompt Tuning (VPT)-applied to two representative foundation vision models: the Contrastive Language-Image Pretraining Vision Transformer (CLIP-ViT-B/16) and the Self-Distillation with No Labels Vision Transformer (DINOv2-S/14). The models are evaluated on six-class PCB defect classification tasks under few-shot (k = 5, 10, 20) and full-data regimes, analyzing both performance and reliability. Experiments show that VPT achieves 0.99 ± 0.01 accuracy and 0.998 ± 0.001 macro-Area Under the Precision-Recall Curve (macro-AUPRC), reducing classification error by approximately 65% compared with Linear and LoRA while tuning fewer than 1.5% of backbone parameters. Reliability, assessed by the stability of precision-recall behavior across different decision thresholds, improved as the number of labeled samples increased. Furthermore, class-wise and few-shot analyses revealed that VPT adapts more effectively to rare defect types such as Spur and Spurious Copper while maintaining near-ceiling performance on simpler categories (Short, Pinhole). These findings collectively demonstrate that prompt-based adaptation offers a quantitatively favorable trade-off between accuracy, efficiency, and reliability. Practically, this positions VPT as a scalable strategy for factory-level AOI, enabling the rapid deployment of robust defect inspection models even when labeled data is scarce.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653441/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-16 | DOI: 10.3390/jimaging11110414
Lidia Yolanda Ramírez-Rios, Jesús Everardo Olguín-Tiznado, Edgar Rene Ramos-Acosta, Everardo Inzunza-Gonzalez, Julio César Cano-Gutiérrez, Enrique Efrén García-Guerrero, Claudia Camargo-Wilson
The anatomical structure of the foot can be assessed by examining the plantar footprint for orthopedic intervention. In fact, there is a relationship between specific foot types and multiple musculoskeletal disorders, which are among the main ailments affecting the lower extremities, and accurate classification of the foot is therefore essential for early diagnosis. This work aims to develop a method for accurately classifying the plantar footprint and hindfoot, specifically concerning the sagittal plane. A custom image dataset was created, comprising 603 RGB plantar images that were modified and augmented. Six state-of-the-art models were trained and evaluated: swin_tiny_patch4_window7_224, convnextv2_tiny, deit3_base_patch16_224, xception41, inception-v4, and efficientnet_b0. Among them, the swin_tiny_patch4_window7_224 model achieved 98.013% accuracy, demonstrating its potential as a reliable and low-cost tool for clinical screening and diagnosis of foot-related conditions.
{"title":"Toward Smarter Orthopedic Care: Classifying Plantar Footprints from RGB Images Using Vision Transformers and CNNs.","authors":"Lidia Yolanda Ramírez-Rios, Jesús Everardo Olguín-Tiznado, Edgar Rene Ramos-Acosta, Everardo Inzunza-Gonzalez, Julio César Cano-Gutiérrez, Enrique Efrén García-Guerrero, Claudia Camargo-Wilson","doi":"10.3390/jimaging11110414","DOIUrl":"10.3390/jimaging11110414","url":null,"abstract":"<p><p>The anatomical structure of the foot can be assessed by examining the plantar footprint for orthopedic intervention. In fact, there is a relationship between a specific type of foot and multiple musculoskeletal disorders, which are among the main ailments affecting the lower extremities, where its accurate classification is essential for early diagnosis. This work aims to develop a method for accurately classifying the plantar footprint and hindfoot, specifically concerning the sagittal plane. A custom image dataset was created, comprising 603 RGB plantar images that were modified and augmented. Six state-of-the-art models have been trained and evaluated: swin_tiny_patch4_window7_224, convnextv2_tiny, deit3_base_patch16_224, xception41, inception-v4, and efficientnet_b0. Among them, the swin_tiny_patch4_window7_224 model achieved 98.013% accuracy, demonstrating its potential as a reliable and low-cost tool for clinical screening and diagnosis of foot-related conditions.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653146/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-14 | DOI: 10.3390/jimaging11110413
Tamás Molnár, Bence Bolla, Orsolya Szabó, András Koltay
Forest damage has been increasingly recorded over the past decade in both Europe and Hungary, primarily due to prolonged droughts, causing a decline in forest health. Within the ICP Forests framework, forest damage has been monitored for decades; however, this ground-based assessment is labour-intensive and time-consuming. Satellite-based remote sensing offers a rapid and efficient method for assessing large-scale damage events, complementing the ground-based ICP Forests datasets. This study utilised cloud computing and Sentinel-2 satellite imagery to monitor forest health and detect anomalies. Standardised NDVI (Z NDVI) maps were produced for the period from 2017 to 2023 to identify disturbances in the forest. The research focused on seven active ICP Forests Level II and 78 Level I plots in Hungary. Z NDVI values were divided into five categories based on damage severity, and there was agreement between Level II field data and satellite imagery. In 2017, severe damage was caused by late frost and wind; however, the forest recovered by 2018. Another decline was observed in 2021 due to wind and in 2022 due to drought. Data from the ICP Forests Level I plots, which represent forest condition in Hungary, indicated that 80% of the monitored stands were damaged, with 30% suffering moderate damage and 15% experiencing severe damage. Z NDVI classifications aligned with the field data, showing widespread forest damage across the country.
{"title":"Sentinel-2-Based Forest Health Survey of ICP Forests Level I and II Plots in Hungary.","authors":"Tamás Molnár, Bence Bolla, Orsolya Szabó, András Koltay","doi":"10.3390/jimaging11110413","DOIUrl":"10.3390/jimaging11110413","url":null,"abstract":"<p><p>Forest damage has been increasingly recorded over the past decade in both Europe and Hungary, primarily due to prolonged droughts, causing a decline in forest health. In the framework of ICP Forests, the forest damage has been monitored for decades; however, it is labour-intensive and time-consuming. Satellite-based remote sensing offers a rapid and efficient method for assessing large-scale damage events, combining the ground-based ICP Forests datasets. This study utilised cloud computing and Sentinel-2 satellite imagery to monitor forest health and detect anomalies. Standardised NDVI (Z NDVI) maps were produced for the period from 2017 to 2023 to identify disturbances in the forest. The research focused on seven active ICP Forests Level II and 78 Level I plots in Hungary. Z NDVI values were divided into five categories based on damage severity, and there was agreement between Level II field data and satellite imagery. In 2017, severe damage was caused by late frost and wind; however, the forest recovered by 2018. Another decline was observed in 2021 due to wind and in 2022 due to drought. Data from the ICP Forests Level I plots, which represent forest condition in Hungary, indicated that 80% of the monitored stands were damaged, with 30% suffering moderate damage and 15% experiencing severe damage. Z NDVI classifications aligned with the field data, showing widespread forest damage across the country.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653305/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-14 | DOI: 10.3390/jimaging11110412
Hongliang Zhang, Bolin Xu, Sanxin Jiang
Camouflaged Object Detection (COD) is a challenging computer vision task aimed at accurately identifying and segmenting objects seamlessly blended into their backgrounds. This task has broad applications across medical image segmentation, defect detection, agricultural image detection, security monitoring, and scientific research. Traditional COD methods often struggle with precise segmentation due to the high similarity between camouflaged objects and their surroundings. In this study, we introduce a Boundary-Guided Differential Attention Network (BDA-Net) to address these challenges. BDA-Net first extracts boundary features by fusing multi-scale image features and applying channel attention. Subsequently, it employs a differential attention mechanism, guided by these boundary features, to highlight camouflaged objects and suppress background information. The weighted features are then progressively fused to generate accurate camouflaged object masks. Experimental results on the COD10K, NC4K, and CAMO datasets demonstrate that BDA-Net outperforms most state-of-the-art COD methods, achieving higher accuracy. Here we show that our approach improves detection accuracy by up to 3.6% on key metrics, offering a robust solution for precise camouflaged object segmentation.
{"title":"Boundary-Guided Differential Attention: Enhancing Camouflaged Object Detection Accuracy.","authors":"Hongliang Zhang, Bolin Xu, Sanxin Jiang","doi":"10.3390/jimaging11110412","DOIUrl":"10.3390/jimaging11110412","url":null,"abstract":"<p><p>Camouflaged Object Detection (COD) is a challenging computer vision task aimed at accurately identifying and segmenting objects seamlessly blended into their backgrounds. This task has broad applications across medical image segmentation, defect detection, agricultural image detection, security monitoring, and scientific research. Traditional COD methods often struggle with precise segmentation due to the high similarity between camouflaged objects and their surroundings. In this study, we introduce a Boundary-Guided Differential Attention Network (BDA-Net) to address these challenges. BDA-Net first extracts boundary features by fusing multi-scale image features and applying channel attention. Subsequently, it employs a differential attention mechanism, guided by these boundary features, to highlight camouflaged objects and suppress background information. The weighted features are then progressively fused to generate accurate camouflage object masks. Experimental results on the COD10K, NC4K, and CAMO datasets demonstrate that BDA-Net outperforms most state-of-the-art COD methods, achieving higher accuracy. Here we show that our approach improves detection accuracy by up to 3.6% on key metrics, offering a robust solution for precise camouflaged object segmentation.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653314/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-13 | DOI: 10.3390/jimaging11110410
Long Li, Tinglei Jia, Huaizhi Yue, Huize Cheng, Yongfeng Bu, Zhaoyang Zhang
Long-tailed image classification remains challenging for vision-language models. Head classes dominate training while tail classes are underrepresented and noisy, and short prompts with weak text supervision further amplify head bias. This paper presents TASA, an end-to-end framework that stabilizes textual supervision and enhances cross-modal fusion. A Semantic Distribution Modulation (SDM) module constructs class-specific text prototypes by cosine-weighted fusion of multiple LLM-generated descriptions with a canonical template, providing stable and diverse semantic anchors without training text parameters. A Dual-Space Cross-Modal Fusion (DCF) module incorporates selective-scan state-space blocks into both image and text branches, enabling bidirectional conditioning and efficient feature fusion through a lightweight multilayer perceptron. Together with a margin-aware alignment loss, TASA aligns images with class prototypes for classification without requiring paired image-text data or per-class prompt tuning. Experiments on CIFAR-10/100-LT, ImageNet-LT, and Places-LT demonstrate consistent improvements across many-, medium-, and few-shot groups. Ablation studies confirm that DCF yields the largest single-module gain, while SDM and DCF combined provide the most robust and balanced performance. These results highlight the effectiveness of integrating text-driven prototypes with state-space fusion for long-tailed classification.
{"title":"TASA: Text-Anchored State-Space Alignment for Long-Tailed Image Classification.","authors":"Long Li, Tinglei Jia, Huaizhi Yue, Huize Cheng, Yongfeng Bu, Zhaoyang Zhang","doi":"10.3390/jimaging11110410","DOIUrl":"10.3390/jimaging11110410","url":null,"abstract":"<p><p>Long-tailed image classification remains challenging for vision-language models. Head classes dominate training while tail classes are underrepresented and noisy, and short prompts with weak text supervision further amplify head bias. This paper presents TASA, an end-to-end framework that stabilizes textual supervision and enhances cross-modal fusion. A Semantic Distribution Modulation (SDM) module constructs class-specific text prototypes by cosine-weighted fusion of multiple LLM-generated descriptions with a canonical template, providing stable and diverse semantic anchors without training text parameters. Dual-Space Cross-Modal Fusion (DCF) module incorporates selective-scan state-space blocks into both image and text branches, enabling bidirectional conditioning and efficient feature fusion through a lightweight multilayer perceptron. Together with a margin-aware alignment loss, TASA aligns images with class prototypes for classification without requiring paired image-text data or per-class prompt tuning. Experiments on CIFAR-10/100-LT, ImageNet-LT, and Places-LT demonstrate consistent improvements across many-, medium-, and few-shot groups. Ablation studies confirm that DCF yields the largest single-module gain, while SDM and DCF combined provide the most robust and balanced performance. These results highlight the effectiveness of integrating text-driven prototypes with state-space fusion for long-tailed classification.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653332/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-13 | DOI: 10.3390/jimaging11110411
Wanshu Li, Yuanhui Hu
In immersive digital devices, high environmental complexity can lead to rendering delays and loss of interactive detail, resulting in a fragmented experience. This paper proposes a lightweight NeRF (Neural Radiance Fields) modeling and multimodal perception fusion method. First, a sparse hash encoding is constructed based on Instant-NGP (Instant Neural Graphics Primitives) to accelerate scene radiance field generation. Second, parameter distillation and channel pruning are used to reduce the model's size and computational overhead. Next, multimodal data from a depth camera and an IMU (Inertial Measurement Unit) are fused, and Kalman filtering is used to improve pose tracking accuracy. Finally, the optimized NeRF model is integrated into the Unity engine, utilizing custom shaders and asynchronous rendering to achieve low-latency viewpoint responsiveness. Experiments show that the file size of this method in high-complexity scenes is only 79.5 MB ± 5.3 MB and the first loading time is only 2.9 s ± 0.4 s, effectively reducing rendering latency. At 1.5 m/s, the SSIM is 0.951 ± 0.016 and the GME is 7.68 ± 0.15, and the method stably restores texture details and edge sharpness under dynamic viewing angles. In scenarios that support 3-5 people interacting simultaneously, the average interaction response delay is only 16.3 ms and the average jitter error is controlled at 0.12°, significantly improving spatial interaction performance. In conclusion, this study provides effective technical solutions for high-quality immersive interaction in complex public scenarios. Future work will explore the framework's adaptability in larger-scale dynamic environments and further optimize the network synchronization mechanism for multi-user concurrency.
{"title":"Neural Radiance Fields: Driven Exploration of Visual Communication and Spatial Interaction Design for Immersive Digital Installations.","authors":"Wanshu Li, Yuanhui Hu","doi":"10.3390/jimaging11110411","DOIUrl":"10.3390/jimaging11110411","url":null,"abstract":"<p><p>In immersive digital devices, high environmental complexity can lead to rendering delays and loss of interactive details, resulting in a fragmented experience. This paper proposes a lightweight NeRF (Neural Radiance Fields) modeling and multimodal perception fusion method. First, a sparse hash code is constructed based on Instant-NGP (Instant Neural Graphics Primitives) to accelerate scene radiance field generation. Second, parameter distillation and channel pruning are used to reduce the model's size and reduce computational overheads. Next, multimodal data from a depth camera and an IMU (Inertial Measurement Unit) is fused, and Kalman filtering is used to improve pose tracking accuracy. Finally, the optimized NeRF model is integrated into the Unity engine, utilizing custom shaders and asynchronous rendering to achieve low-latency viewpoint responsiveness. Experiments show that the file size of this method in high-complexity scenes is only 79.5 MB ± 5.3 MB, and the first loading time is only 2.9 s ± 0.4 s, effectively reducing rendering latency. The SSIM is 0.951 ± 0.016 at 1.5 m/s, and the GME is 7.68 ± 0.15 at 1.5 m/s. It can stably restore texture details and edge sharpness under dynamic viewing angles. In scenarios that support 3-5 people interacting simultaneously, the average interaction response delay is only 16.3 ms, and the average jitter error is controlled at 0.12°, significantly improving spatial interaction performance. In conclusion, this study provides effective technical solutions for high-quality immersive interaction in complex public scenarios. Future work will explore the framework's adaptability in larger-scale dynamic environments and further optimize the network synchronization mechanism for multi-user concurrency.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653945/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-12 | DOI: 10.3390/jimaging11110407
Michel Beyer, Julian Grossi, Alexandru Burde, Sead Abazi, Lukas Seifert, Joachim Polligkeit, Neha Umakant Chodankar, Florian M Thieringer
The accurate reconstruction of craniofacial defects requires the precise segmentation and mirroring of healthy anatomy. Conventional workflows rely on manual interaction, making them time-consuming and subject to operator variability. This study developed and validated a fully automated digital pipeline that integrates deep learning-based segmentation with algorithmic mirroring for craniofacial reconstruction. A total of 388 cranial CT scans were used to train a three-dimensional nnU-Net model for skull and mandible segmentation. A Principal Component Analysis-Iterative Closest Point (PCA-ICP) algorithm was then applied to compute the sagittal symmetry plane and perform mirroring. Automated results were compared with expert-generated segmentations and manually defined symmetry planes using Dice Similarity Coefficient (DSC), Mean Surface Distance (MSD), Hausdorff Distance (HD), and angular deviation. The nnU-Net achieved high segmentation accuracy for both the mandible (mean DSC 0.956) and the skull (mean DSC 0.965). Mirroring results showed minimal angular deviation from expert reference planes (mandible: 1.32° ± 0.71° in defect cases, 1.58° ± 1.12° in intact cases; skull: 1.75° ± 0.84° in defect cases, 1.15° ± 0.81° in intact cases). The presence of defects did not significantly affect accuracy. This automated workflow demonstrated robust performance and clinical applicability, offering standardized, reproducible, and time-efficient planning for craniofacial reconstruction.
{"title":"Fully Automated AI-Based Digital Workflow for Mirroring of Healthy and Defective Craniofacial Models.","authors":"Michel Beyer, Julian Grossi, Alexandru Burde, Sead Abazi, Lukas Seifert, Joachim Polligkeit, Neha Umakant Chodankar, Florian M Thieringer","doi":"10.3390/jimaging11110407","DOIUrl":"10.3390/jimaging11110407","url":null,"abstract":"<p><p>The accurate reconstruction of craniofacial defects requires the precise segmentation and mirroring of healthy anatomy. Conventional workflows rely on manual interaction, making them time-consuming and subject to operator variability. This study developed and validated a fully automated digital pipeline that integrates deep learning-based segmentation with algorithmic mirroring for craniofacial reconstruction. A total of 388 cranial CT scans were used to train a three-dimensional nnU-Net model for skull and mandible segmentation. A Principal Component Analysis-Iterative Closest Point (PCA-ICP) algorithm was then applied to compute the sagittal symmetry plane and perform mirroring. Automated results were compared with expert-generated segmentations and manually defined symmetry planes using Dice Similarity Coefficient (DSC), Mean Surface Distance (MSD), Hausdorff Distance (HD), and angular deviation. The nnU-Net achieved high segmentation accuracy for both the mandible (mean DSC 0.956) and the skull (mean DSC 0.965). Mirroring results showed minimal angular deviation from expert reference planes (mandible: 1.32° ± 0.71° in defect cases, 1.58° ± 1.12° in intact cases; skull: 1.75° ± 0.84° in defect cases, 1.15° ± 0.81° in intact cases). The presence of defects did not significantly affect accuracy. This automated workflow demonstrated robust performance and clinical applicability, offering standardized, reproducible, and time-efficient planning for craniofacial reconstruction.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653981/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-12 | DOI: 10.3390/jimaging11110406
Esin Rothenfluh, Georg F Erbach, Léna G Dietrich, Laura De Pellegrin, Daniela A Frauchiger, Rainer J Egli
This exploratory study investigates the feasibility and diagnostic value of high-resolution peripheral quantitative computed tomography (HR-pQCT) in detecting structural and microarchitectural changes in lunate avascular necrosis (AVN), or Kienböck's disease. Five adult patients with unilateral AVN underwent either MRI or CT, alongside HR-pQCT of both wrists. Imaging features such as subchondral remodeling, joint space narrowing, and bone fragmentation were assessed across modalities. HR-pQCT detected at least one additional pathological feature not seen on MRI or CT in four of five patients and revealed early subchondral changes in two contralateral asymptomatic wrists. Quantitative measurements of bone volume fraction (BV/TV) further indicated altered trabecular structure correlating with disease stage. These findings suggest that HR-pQCT may offer enhanced sensitivity for early-stage AVN and better delineation of disease extent, which is critical for informed surgical planning. While limited by small sample size, this study provides preliminary evidence supporting HR-pQCT as a complementary imaging tool in the assessment of lunate AVN, with potential to improve early detection, staging accuracy, and individualized treatment strategies.
{"title":"High-Resolution Peripheral Quantitative Computed Tomography (HR-pQCT) for Assessment of Avascular Necrosis of the Lunate.","authors":"Esin Rothenfluh, Georg F Erbach, Léna G Dietrich, Laura De Pellegrin, Daniela A Frauchiger, Rainer J Egli","doi":"10.3390/jimaging11110406","DOIUrl":"10.3390/jimaging11110406","url":null,"abstract":"<p><p>This exploratory study investigates the feasibility and diagnostic value of high-resolution peripheral quantitative computed tomography (HR-pQCT) in detecting structural and microarchitectural changes in lunate avascular necrosis (AVN), or Kienböck's disease. Five adult patients with unilateral AVN underwent either MRI or CT, alongside HR-pQCT of both wrists. Imaging features such as subchondral remodeling, joint space narrowing, and bone fragmentation were assessed across modalities. HR-pQCT detected at least one additional pathological feature not seen on MRI or CT in four of five patients and revealed early subchondral changes in two contralateral asymptomatic wrists. Quantitative measurements of bone volume fraction (BV/TV) further indicated altered trabecular structure correlating with disease stage. These findings suggest that HR-pQCT may offer enhanced sensitivity for early-stage AVN and better delineation of disease extent, which is critical for informed surgical planning. While limited by small sample size, this study provides preliminary evidence supporting HR-pQCT as a complementary imaging tool in the assessment of lunate AVN, with potential to improve early detection, staging accuracy, and individualized treatment strategies.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 11","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12653468/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145606633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}