Camera-in-the-Loop Realization of Direct Search with Random Trajectory Method for Binary-Phase Computer-Generated Hologram Optimization.
Pub Date: 2025-12-05 | DOI: 10.3390/jimaging11120434
Evgenii Yu Zlokazov, Rostislav S Starikov, Pavel A Cheremkhin, Timur Z Minikhanov
High-speed realization of computer-generated holograms (CGHs) is a crucial problem in modern 3D visualization and optical image processing system development. Binary CGHs can be realized using high-resolution, high-speed spatial light modulators, such as ferroelectric liquid-crystal-on-silicon devices or digital micro-mirror devices, which provide high throughput in optoelectronic systems. However, the quality of holographic images restored by binary CGHs often suffers from distortions, background noise, and speckle noise caused by the limitations and imperfections of optical system components. The present manuscript introduces a method that optimizes CGH models directly in the optical system, in a camera-in-the-loop configuration, using an efficient direct search with random trajectory algorithm. The method was experimentally verified. The results demonstrate a significant enhancement in the quality of holographic images optically restored by binary-phase CGH models optimized with this method compared to purely digitally generated models.
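The abstract does not include the authors' implementation; as a rough illustration of the direct search with random trajectory idea, the Python sketch below flips one binary-phase pixel at a time in a shuffled visiting order and keeps only cost-improving flips. The `simulated_capture_cost` stand-in (a plain Fourier reconstruction) is an assumption for self-containment; in the camera-in-the-loop configuration the cost would instead be measured from the camera image.

```python
import numpy as np

def simulated_capture_cost(mask, target):
    """Numerical stand-in for the optical feedback path: Fourier
    reconstruction of a binary-phase CGH. In the camera-in-the-loop setup
    this would display the mask on the SLM and grade the grabbed image."""
    field = np.exp(1j * np.pi * mask)                    # 0 / pi phase levels
    recon = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    recon /= recon.max() + 1e-12
    return float(np.mean((recon - target) ** 2))         # error vs. target image

def direct_search_random_trajectory(mask, target, n_epochs=2, seed=0):
    """Visit every pixel in a random order, flip its binary phase, and keep
    the flip only if the measured cost improves."""
    rng = np.random.default_rng(seed)
    mask = mask.copy()
    best = simulated_capture_cost(mask, target)
    order = np.arange(mask.size)
    for _ in range(n_epochs):
        rng.shuffle(order)                               # the random trajectory
        for k in order:
            i, j = np.unravel_index(k, mask.shape)
            mask[i, j] ^= 1                              # flip 0 <-> 1
            cost = simulated_capture_cost(mask, target)
            if cost < best:
                best = cost                              # accept improving flip
            else:
                mask[i, j] ^= 1                          # revert otherwise
    return mask, best

# e.g. mask = rng.integers(0, 2, (64, 64)); target = normalized desired image
```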
{"title":"Camera-in-the-Loop Realization of Direct Search with Random Trajectory Method for Binary-Phase Computer-Generated Hologram Optimization.","authors":"Evgenii Yu Zlokazov, Rostislav S Starikov, Pavel A Cheremkhin, Timur Z Minikhanov","doi":"10.3390/jimaging11120434","DOIUrl":"10.3390/jimaging11120434","url":null,"abstract":"<p><p>High-speed realization of computer-generated holograms (CGHs) is a crucial problem in the field of modern 3D visualization and optical image processing system development. Binary CGHs can be realized using high-resolution, high-speed spatial light modulators such as ferroelectric liquid crystals on silicon devices or digital micro-mirror devices providing the high throughput of optoelectronic systems. However, the quality of holographic images restored by binary CGHs often suffers from distortions, background noise, and speckle noise caused by the limitations and imperfections of optical system components. The present manuscript introduces a method based on the optimization of CGH models directly in the optical system with a camera-in-the-loop configuration using effective direct search with a random trajectory algorithm. The method was experimentally verified. The results demonstrate a significant enhancement in the quality of the holographic images optically restored by binary-phase CGH models optimized through this method compared to purely digitally generated models.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733431/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DiagNeXt: A Two-Stage Attention-Guided ConvNeXt Framework for Kidney Pathology Segmentation and Classification.
Pub Date: 2025-12-04 | DOI: 10.3390/jimaging11120433
Hilal Tekin, Şafak Kılıç, Yahya Doğan
Accurate segmentation and classification of kidney pathologies from medical images remain a major challenge in computer-aided diagnosis due to complex morphological variations, small lesion sizes, and severe class imbalance. This study introduces DiagNeXt, a novel two-stage deep learning framework designed to overcome these challenges through an integrated use of attention-enhanced ConvNeXt architectures for both segmentation and classification. In the first stage, DiagNeXt-Seg employs a U-Net-based design incorporating Enhanced Convolutional Blocks (ECBs) with spatial attention gates and Atrous Spatial Pyramid Pooling (ASPP) to achieve precise multi-class kidney segmentation. In the second stage, DiagNeXt-Cls utilizes the segmented regions of interest (ROIs) for pathology classification through a hierarchical multi-resolution strategy enhanced by Context-Aware Feature Fusion (CAFF) and Evidential Deep Learning (EDL) for uncertainty estimation. The main contributions of this work include: (1) enhanced ConvNeXt blocks with large-kernel depthwise convolutions optimized for 3D medical imaging, (2) a boundary-aware compound loss combining Dice, cross-entropy, focal, and distance transform terms to improve segmentation precision, (3) attention-guided skip connections preserving fine-grained spatial details, (4) hierarchical multi-scale feature modeling for robust pathology recognition, and (5) a confidence-modulated classification approach integrating segmentation quality metrics for reliable decision-making. Extensive experiments on a large kidney CT dataset comprising 3847 patients demonstrate that DiagNeXt achieves 98.9% classification accuracy, outperforming state-of-the-art approaches by 6.8%. The framework attains near-perfect AUC scores across all pathology classes (Normal: 1.000, Tumor: 1.000, Cyst: 0.999, Stone: 0.994) while offering clinically interpretable uncertainty maps and attention visualizations. The superior diagnostic accuracy, computational efficiency (6.2× faster inference), and interpretability of DiagNeXt make it a strong candidate for real-world integration into clinical kidney disease diagnosis and treatment planning systems.
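The abstract names a boundary-aware compound loss of Dice, cross-entropy, focal, and distance transform terms without giving its formula. The PyTorch sketch below is one hedged reading for the binary case; the weights `w`, focal exponent `gamma`, and the convention that `dist_map` is a precomputed distance transform of the ground-truth boundary are all assumptions, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def compound_loss(logits, target, dist_map, w=(1.0, 1.0, 1.0, 1.0),
                  gamma=2.0, eps=1e-6):
    """Dice + cross-entropy + focal + distance-transform-weighted boundary
    term for binary segmentation (target and dist_map are float tensors)."""
    p = torch.sigmoid(logits)
    # Dice term: overlap-based, insensitive to class imbalance
    inter = (p * target).sum()
    dice = 1 - (2 * inter + eps) / (p.sum() + target.sum() + eps)
    # cross-entropy term (binary form)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    # focal term: down-weight easy, well-classified pixels
    pt = p * target + (1 - p) * (1 - target)
    focal = ((1 - pt) ** gamma * F.binary_cross_entropy_with_logits(
        logits, target, reduction="none")).mean()
    # boundary term: errors far from the true boundary cost more
    boundary = (dist_map * (p - target).abs()).mean()
    return w[0] * dice + w[1] * bce + w[2] * focal + w[3] * boundary
```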
{"title":"DiagNeXt: A Two-Stage Attention-Guided ConvNeXt Framework for Kidney Pathology Segmentation and Classification.","authors":"Hilal Tekin, Şafak Kılıç, Yahya Doğan","doi":"10.3390/jimaging11120433","DOIUrl":"10.3390/jimaging11120433","url":null,"abstract":"<p><p>Accurate segmentation and classification of kidney pathologies from medical images remain a major challenge in computer-aided diagnosis due to complex morphological variations, small lesion sizes, and severe class imbalance. This study introduces DiagNeXt, a novel two-stage deep learning framework designed to overcome these challenges through an integrated use of attention-enhanced ConvNeXt architectures for both segmentation and classification. In the first stage, DiagNeXt-Seg employs a U-Net-based design incorporating Enhanced Convolutional Blocks (ECBs) with spatial attention gates and Atrous Spatial Pyramid Pooling (ASPP) to achieve precise multi-class kidney segmentation. In the second stage, DiagNeXt-Cls utilizes the segmented regions of interest (ROIs) for pathology classification through a hierarchical multi-resolution strategy enhanced by Context-Aware Feature Fusion (CAFF) and Evidential Deep Learning (EDL) for uncertainty estimation. The main contributions of this work include: (1) enhanced ConvNeXt blocks with large-kernel depthwise convolutions optimized for 3D medical imaging, (2) a boundary-aware compound loss combining Dice, cross-entropy, focal, and distance transform terms to improve segmentation precision, (3) attention-guided skip connections preserving fine-grained spatial details, (4) hierarchical multi-scale feature modeling for robust pathology recognition, and (5) a confidence-modulated classification approach integrating segmentation quality metrics for reliable decision-making. Extensive experiments on a large kidney CT dataset comprising 3847 patients demonstrate that DiagNeXt achieves 98.9% classification accuracy, outperforming state-of-the-art approaches by 6.8%. The framework attains near-perfect AUC scores across all pathology classes (Normal: 1.000, Tumor: 1.000, Cyst: 0.999, Stone: 0.994) while offering clinically interpretable uncertainty maps and attention visualizations. The superior diagnostic accuracy, computational efficiency (6.2× faster inference), and interpretability of DiagNeXt make it a strong candidate for real-world integration into clinical kidney disease diagnosis and treatment planning systems.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733990/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
UAV-TIRVis: A Benchmark Dataset for Thermal-Visible Image Registration from Aerial Platforms.
Pub Date: 2025-12-04 | DOI: 10.3390/jimaging11120432
Costin-Emanuel Vasile, Călin Bîră, Radu Hobincu
Registering UAV-based thermal and visible images is a challenging task due to differences in appearance across spectra and the lack of public benchmarks. To address this issue, we introduce UAV-TIRVis, a dataset consisting of 80 accurately and manually registered UAV-based thermal (640 × 512) and visible (4K) image pairs, captured across diverse environments. We benchmark our dataset using well-known registration methods, including feature-based (ORB, SURF, SIFT, KAZE), correlation-based, and intensity-based methods, as well as a custom, heuristic intensity-based method. We evaluate the performance of these methods using four metrics: RMSE, PSNR, SSIM, and NCC, averaged per scenario and across the entire dataset. The results show that conventional methods often fail to generalize across scenes, yielding <0.6 NCC on average, whereas the heuristic method achieves 0.77 SSIM and 0.82 NCC, highlighting the difficulty of cross-spectral UAV alignment and the need for further research to improve optimization in existing registration methods.
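For reference, the four reported metrics are standard and straightforward to reproduce. The sketch below computes them for one registered pair, under the assumption that both images have already been warped to a common grid and scaled to [0, 1]; SSIM comes from scikit-image.

```python
import numpy as np
from skimage.metrics import structural_similarity

def registration_metrics(a, b, data_range=1.0):
    """RMSE, PSNR, SSIM, and zero-mean NCC between a registered thermal
    image `a` and the visible reference `b` (floats in [0, 1], same shape)."""
    a = a.astype(np.float64); b = b.astype(np.float64)
    mse = np.mean((a - b) ** 2)
    rmse = np.sqrt(mse)
    psnr = 10 * np.log10(data_range ** 2 / mse) if mse > 0 else np.inf
    ssim = structural_similarity(a, b, data_range=data_range)
    az, bz = a - a.mean(), b - b.mean()          # zero-mean for NCC
    ncc = float((az * bz).sum() /
                (np.linalg.norm(az) * np.linalg.norm(bz) + 1e-12))
    return {"RMSE": rmse, "PSNR": psnr, "SSIM": ssim, "NCC": ncc}
```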
{"title":"UAV-TIRVis: A Benchmark Dataset for Thermal-Visible Image Registration from Aerial Platforms.","authors":"Costin-Emanuel Vasile, Călin Bîră, Radu Hobincu","doi":"10.3390/jimaging11120432","DOIUrl":"10.3390/jimaging11120432","url":null,"abstract":"<p><p>Registering UAV-based thermal and visible images is a challenging task due to differences in appearance across spectra and the lack of public benchmarks. To address this issue, we introduce UAV-TIRVis, a dataset consisting of 80 accurately and manually registered UAV-based thermal (640 × 512) and visible (4K) image pairs, captured across diverse environments. We benchmark our dataset using well-known registration methods, including feature-based (ORB, SURF, SIFT, KAZE), correlation-based, and intensity-based methods, as well as a custom, heuristic intensity-based method. We evaluate the performance of these methods using four metrics: RMSE, PSNR, SSIM, and NCC, averaged per scenario and across the entire dataset. The results show that conventional methods often fail to generalize across scenes, yielding <0.6 NCC on average, whereas the heuristic method shows that it is possible to achieve 0.77 SSIM and 0.82 NCC, highlighting the difficulty of cross-spectral UAV alignment and the need for further research to improve optimization in existing registration methods.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12734084/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Editorial on the Special Issue "Image and Video Processing for Blind and Visually Impaired".
Pub Date: 2025-12-03 | DOI: 10.3390/jimaging11120430
Zhigang Zhu, John-Ross Rizzo, Hao Tang
Over 2 [...].
{"title":"Editorial on the Special Issue \"Image and Video Processing for Blind and Visually Impaired\".","authors":"Zhigang Zhu, John-Ross Rizzo, Hao Tang","doi":"10.3390/jimaging11120430","DOIUrl":"10.3390/jimaging11120430","url":null,"abstract":"<p><p>Over 2 [...].</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733805/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid AI Pipeline for Laboratory Detection of Internal Potato Defects Using 2D RGB Imaging.
Slim Hamdi, Kais Loukil, Adem Haj Boubaker, Hichem Snoussi, Mohamed Abid
Pub Date: 2025-12-03 | DOI: 10.3390/jimaging11120431
The internal quality assessment of potato tubers is a crucial task in agro-laboratory processing. Traditional methods struggle to detect internal defects such as hollow heart, internal bruises, and insect galleries using only surface features. We present a novel, fully modular hybrid AI architecture designed for defect detection using RGB images of potato slices and suitable for integration into laboratory workflows. Our pipeline combines high-recall multi-threshold YOLO detection, contextual patch validation using ResNet, precise segmentation via the Segment Anything Model (SAM), and skin-contact analysis using VGG16 with a Random Forest classifier. Experimental results on a labeled dataset of over 6000 annotated instances show a recall above 95% and precision near 97.2% for most defect classes. The approach offers both robustness and interpretability, outperforming previous methods that rely on costly hyperspectral or MRI techniques. This system is scalable, explainable, and compatible with existing 2D imaging hardware.
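The paper's detection stage is not reproduced here; as a loose sketch of one plausible reading of "high-recall multi-threshold" detection, the code below pools boxes collected at several confidence thresholds and de-duplicates them with NMS. The `detect(image, conf)` wrapper around the YOLO model is hypothetical; the ResNet patch validation would follow as a second filtering step.

```python
import numpy as np

def iou_one_vs_many(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter + 1e-9)

def multi_threshold_detect(image, detect, thresholds=(0.05, 0.15, 0.3),
                           nms_iou=0.5):
    """Pool boxes found at several confidence thresholds (recall first),
    then remove duplicates with non-maximum suppression.
    `detect(image, conf)` must return (boxes Nx4, scores N) as arrays."""
    all_boxes, all_scores = [], []
    for t in thresholds:
        b, s = detect(image, conf=t)
        all_boxes.append(b); all_scores.append(s)
    boxes = np.concatenate(all_boxes); scores = np.concatenate(all_scores)
    keep, order = [], np.argsort(-scores)        # highest score first
    while order.size:
        i, order = order[0], order[1:]
        keep.append(i)
        order = order[iou_one_vs_many(boxes[i], boxes[order]) < nms_iou]
    return boxes[keep], scores[keep]
```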
{"title":"Hybrid AI Pipeline for Laboratory Detection of Internal Potato Defects Using 2D RGB Imaging.","authors":"Slim Hamdi, Kais Loukil, Adem Haj Boubaker, Hichem Snoussi, Mohamed Abid","doi":"10.3390/jimaging11120431","DOIUrl":"10.3390/jimaging11120431","url":null,"abstract":"<p><p>The internal quality assessment of potato tubers is a crucial task in agro-laboratory processing. Traditional methods struggle to detect internal defects such as hollow heart, internal bruises, and insect galleries using only surface features. We present a novel, fully modular hybrid AI architecture designed for defect detection using RGB images of potato slices, suitable for integration in laboratory. Our pipeline combines high-recall multi-threshold YOLO detection, contextual patch validation using ResNet, precise segmentation via the Segment Anything Model (SAM), and skin-contact analysis using VGG16 with a Random Forest classifier. Experimental results on a labeled dataset of over 6000 annotated instances show a recall above 95% and precision near 97.2% for most defect classes. The approach offers both robustness and interpretability, outperforming previous methods that rely on costly hyperspectral or MRI techniques. This system is scalable, explainable, and compatible with existing 2D imaging hardware.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733781/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Good Is the Machine at the Imitation Game? On Stylistic Characteristics of AI-Generated Images.
Pub Date: 2025-12-02 | DOI: 10.3390/jimaging11120429
Adrien Deliège, Jeanne Marlot, Marc Van Droogenbroeck, Maria Giulia Dondero
Text-to-image generative models can be used to imitate historical artistic styles, but their effectiveness in doing so remains unclear. In this work, we propose an evaluation framework that leverages expert knowledge from art history and visual semiotics and combines it with quantitative analysis to assess stylistic fidelity. Three experts rated both historical artwork production and images generated with Midjourney v6 for five major movements (Abstract Art, Cubism, Expressionism, Impressionism, Surrealism) and ten associated painters (male and female pairs), using nine visual criteria grounded in Greimas's plastic categories and Wölfflin's stylistic oppositions. Ratings were expressed as 95% intervals on continuous 0-100 scales and compared using our Relative Ratings Map (RRMap), which summarizes relative shifts, relative dispersion, and distributional overlap (via the Bhattacharyya coefficient). They were also discretized into four quality ratings (bad, stereotype, fair, excellent). The results show strong inter-expert variability and more moderate intra-expert effects tied to movements, criteria, criterion groups, and modalities. Experts tend to agree that the model sometimes aligns with historical trends but also sometimes produces stereotyped versions of a movement or painter, or even completely misses its target, although no unanimous consensus emerges. We conclude that evaluating generative models requires both expert-driven interpretation and quantitative tools, and that stylistic fidelity is hard to quantify even with a rigorous framework.
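The distributional overlap used by the RRMap is the Bhattacharyya coefficient. For two ratings expressed as 95% intervals, a Gaussian reading of each interval (mean = midpoint, sd = half-width / 1.96) gives a closed form; the sketch below is only that approximation with invented example intervals, not the paper's exact construction.

```python
import math

def bc_from_intervals(lo1, hi1, lo2, hi2):
    """Bhattacharyya coefficient between two ratings given as 95% intervals
    on a 0-100 scale, under a Gaussian approximation of each interval."""
    m1, s1 = (lo1 + hi1) / 2, (hi1 - lo1) / (2 * 1.96)
    m2, s2 = (lo2 + hi2) / 2, (hi2 - lo2) / (2 * 1.96)
    v = s1 ** 2 + s2 ** 2
    # closed form for two univariate Gaussians: BC = exp(-D_B)
    return math.sqrt(2 * s1 * s2 / v) * math.exp(-((m1 - m2) ** 2) / (4 * v))

# hypothetical: historical works rated 40-70 vs. generated images rated 55-85
print(round(bc_from_intervals(40, 70, 55, 85), 3))  # near 1 = strong overlap
```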
{"title":"How Good Is the Machine at the Imitation Game? On Stylistic Characteristics of AI-Generated Images.","authors":"Adrien Deliège, Jeanne Marlot, Marc Van Droogenbroeck, Maria Giulia Dondero","doi":"10.3390/jimaging11120429","DOIUrl":"10.3390/jimaging11120429","url":null,"abstract":"<p><p>Text-to-image generative models can be used to imitate historical artistic styles, but their effectiveness in doing so remains unclear. In this work, we propose an evaluation framework that leverages expert knowledge from art history and visual semiotics and combines it with quantitative analysis to assess stylistic fidelity. Three experts rated both historical artwork production and images generated with Midjourney v6 for five major movements (Abstract Art, Cubism, Expressionism, Impressionism, Surrealism) and ten associated painters (male and female pairs), using nine visual criteria grounded in Greimas's plastic categories and Wölfflin's stylistic oppositions. Ratings were expressed as 95% intervals on continuous 0-100 scales and compared using our Relative Ratings Map (RRMap), which summarizes relative shifts, relative dispersion, and distributional overlap (via the Bhattacharyya coefficient). They were also discretized in four quality ratings (bad, stereotype, fair, excellent). The results show strong inter-expert variability and more moderate intra-expert effects tied to movements, criteria, criterion groups and modalities. Experts tend to agree that the model sometimes aligns with historical trends but also sometimes produces stereotyped versions of a movement or painter, or even completely missed its target, although no unanimous consensus emerges. We conclude that evaluating generative models requires both expert-driven interpretation and quantitative tools, and that stylistic fidelity is hard to quantify even with a rigorous framework.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12734345/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How Safe Are Oxygen-Ozone Therapy Procedures for Spine Disc Herniation? The SIOOT Protocols for Treating Spine Disorders.
Pub Date: 2025-12-01 | DOI: 10.3390/jimaging11120428
Marianno Franzini, Salvatore Chirumbolo, Francesco Vaiano, Luigi Valdenassi, Francesca Giannetti, Marianna Chierchia, Umberto Tirelli, Paolo Bonacina, Gianluca Poggi, Aniello Langella, Edoardo Maria Pieracci, Christian Giannetti, Roberto Antonio Giannetti
Oxygen-ozone (O2-O3) therapy is widely used for treating lumbar disc herniation. However, controversy remains regarding the safest and most effective route of administration. While intradiscal injection is purported to show clinical efficacy, it has also been associated with serious complications. In contrast, the intramuscular route can exhibit a more favourable safety profile and comparable pain outcomes, suggesting its potential as a safer alternative in selected patient populations. This mixed-method study combined computed tomography (CT) imaging, biophysical diffusion modelling, and a meta-analysis of clinical trials to evaluate whether intramuscular O2-O3 therapy can achieve disc penetration and therapeutic efficacy comparable to intradiscal nucleolysis, while minimizing procedural risk. Literature searches across PubMed, Scopus, and Cochrane databases identified seven eligible studies (four randomized controlled trials and three cohort studies), encompassing a total of 120 patients. Statistical analyses included Hedges' g, odds ratios, and number needed to harm (NNH). CT imaging demonstrated gas migration into the intervertebral disc within minutes after intramuscular injection, confirming the plausibility of diffusion through annular micro-fissures. The meta-analysis revealed substantial pain reduction with intramuscular therapy (Hedges' g = -1.55) and very high efficacy with intradiscal treatment (g = 2.87), though the latter was associated with significantly greater heterogeneity and higher complication rates. The relative risk of severe adverse events was 6.57 times higher for intradiscal procedures (NNH ≈ 1180). Intramuscular O2-O3 therapy offers a biologically plausible, safer, and effective alternative to intradiscal injection, supporting its adoption as a first-line, minimally invasive strategy for managing lumbar disc herniation.
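The reported statistics are standard: Hedges' g is a pooled-SD standardized mean difference with a small-sample correction, and NNH is the reciprocal of the absolute risk increase. The worked sketch below uses invented inputs purely to show the arithmetic; it does not reproduce the paper's numbers.

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g: standardized mean difference with small-sample correction J."""
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)      # small-sample correction factor
    return j * (m1 - m2) / sp

def nnh(risk_treated, risk_control):
    """Number needed to harm = 1 / absolute risk increase."""
    return 1 / (risk_treated - risk_control)

# hypothetical pain-score deltas (treated vs. control arm, n = 30 each):
print(round(hedges_g(-3.1, 1.9, 30, -0.4, 1.8, 30), 2))  # negative g = pain reduction
print(round(nnh(0.0059, 0.0009)))                         # fictitious complication risks
```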
{"title":"How Safe Are Oxygen-Ozone Therapy Procedures for Spine Disc Herniation? The SIOOT Protocols for Treating Spine Disorders.","authors":"Marianno Franzini, Salvatore Chirumbolo, Francesco Vaiano, Luigi Valdenassi, Francesca Giannetti, Marianna Chierchia, Umberto Tirelli, Paolo Bonacina, Gianluca Poggi, Aniello Langella, Edoardo Maria Pieracci, Christian Giannetti, Roberto Antonio Giannetti","doi":"10.3390/jimaging11120428","DOIUrl":"10.3390/jimaging11120428","url":null,"abstract":"<p><p>Oxygen-ozone (O<sub>2</sub>-O<sub>3</sub>) therapy is widely used for treating lumbar disc herniation. However, controversy remains regarding the safest and most effective route of administration. While intradiscal injection is purported to show clinical efficacy, it has also been associated with serious complications. In contrast, the intramuscular route can exhibit a more favourable safety profile and comparable pain outcomes, suggesting its potential as a safer alternative in selected patient populations. This mixed-method study combined computed tomography (CT) imaging, biophysical diffusion modelling, and a meta-analysis of clinical trials to evaluate whether intramuscular O<sub>2</sub>-O<sub>3</sub> therapy can achieve disc penetration and therapeutic efficacy comparable to intradiscal nucleolysis, while minimizing procedural risk. Literature searches across PubMed, Scopus, and Cochrane databases identified seven eligible studies (four randomized controlled trials and three cohort studies), encompassing a total of 120 patients. Statistical analyses included Hedges' g, odds ratios, and number needed to harm (NNH). CT imaging demonstrated gas migration into the intervertebral disc within minutes after intramuscular injection, confirming the plausibility of diffusion through annular micro-fissures. The meta-analysis revealed substantial pain reduction with intramuscular therapy (Hedges' g = -1.55) and very high efficacy with intradiscal treatment (g = 2.87), though the latter was associated with significantly greater heterogeneity and higher complication rates. The relative risk of severe adverse events was 6.57 times higher for intradiscal procedures (NNH ≈ 1180). O<sub>2</sub>-O<sub>3</sub> therapy offers a biologically plausible, safer, and effective alternative to intradiscal injection, supporting its adoption as a first-line, minimally invasive strategy for managing lumbar disc herniation.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12734168/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brain Tumour Classification Model Based on Spatial Block-Residual Block Collaborative Architecture with Strip Pooling Feature Fusion.
Pub Date: 2025-11-29 | DOI: 10.3390/jimaging11120427
Meilan Tang, Xinlian Zhou, Zhiyong Li
Precise classification of brain tumors is crucial for early diagnosis and treatment, but obtaining tumor masks is extremely challenging, limiting the application of traditional methods. This paper proposes a brain tumor classification model based on whole-brain images, combining a spatial block-residual block collaborative architecture with strip pooling feature fusion to achieve multi-scale feature representation without requiring tumor masks. The model extracts fine-grained morphological features through three shallow VGG spatial blocks while capturing global contextual information between tumors and surrounding tissues via four deep ResNet residual blocks. Residual connections mitigate gradient vanishing. To effectively fuse multi-level features, strip pooling modules are introduced after the third spatial block and fourth residual block, enabling cross-layer feature integration, particularly optimizing representation of irregular tumor regions. The fused features undergo cross-scale concatenation, integrating both spatial perception and semantic information, and are ultimately classified via an end-to-end Softmax classifier. Experimental results demonstrate that the model achieves an accuracy of 97.29% in brain tumor image classification tasks, significantly outperforming traditional convolutional neural networks. This validates its effectiveness in achieving high-precision, multi-scale feature learning and classification without brain tumor masks, and highlights its potential clinical application value.
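The abstract does not spell out the strip pooling module. A common formulation (after Hou et al., CVPR 2020, which this likely follows) averages the feature map into a 1×W row strip and an H×1 column strip, re-expands and fuses them, and gates the input; elongated pooling windows help with irregular, non-square tumor regions. A minimal PyTorch sketch, with kernel sizes and the gating form assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Strip pooling block: pool to row/column strips, fuse, gate the input."""
    def __init__(self, channels):
        super().__init__()
        self.row = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.col = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        row = self.row(F.adaptive_avg_pool2d(x, (1, w))).expand(n, c, h, w)
        col = self.col(F.adaptive_avg_pool2d(x, (h, 1))).expand(n, c, h, w)
        strip = self.fuse(F.relu(row + col))
        return x * torch.sigmoid(strip)   # modulate long-range structures

# x = torch.randn(2, 64, 32, 32); y = StripPooling(64)(x)  # same shape as x
```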
{"title":"Brain Tumour Classification Model Based on Spatial Block-Residual Block Collaborative Architecture with Strip Pooling Feature Fusion.","authors":"Meilan Tang, Xinlian Zhou, Zhiyong Li","doi":"10.3390/jimaging11120427","DOIUrl":"10.3390/jimaging11120427","url":null,"abstract":"<p><p>Precise classification of brain tumors is crucial for early diagnosis and treatment, but obtaining tumor masks is extremely challenging, limiting the application of traditional methods. This paper proposes a brain tumor classification model based on whole-brain images, combining a spatial block-residual block cooperative architecture with striped pooling feature fusion to achieve multi-scale feature representation without requiring tumor masks. The model extracts fine-grained morphological features through three shallow VGG spatial blocks while capturing global contextual information between tumors and surrounding tissues via four deep ResNet residual blocks. Residual connections mitigate gradient vanishing. To effectively fuse multi-level features, strip pooling modules are introduced after the third spatial block and fourth residual block, enabling cross-layer feature integration-particularly optimizing representation of irregular tumor regions. The fused features undergo cross-scale concatenation, integrating both spatial perception and semantic information, and are ultimately classified via an end-to-end Softmax classifier. Experimental results demonstrate that the model achieves an accuracy of 97.29% in brain tumor image classification tasks, significantly outperforming traditional convolutional neural networks. This validates its effectiveness in achieving high-precision, multi-scale feature learning and classification without brain tumor masks, holding potential clinical application value.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12734159/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimized Hounsfield Units Transformation for Explainable Temporal Stage-Specific Ischemic Stroke Classification in CT Imaging.
Pub Date: 2025-11-28 | DOI: 10.3390/jimaging11120423
Radwan Qasrawi, Suliman Thwib, Ghada Issa, Ibrahem Qdaih, Razan Abu Ghoush, Hamza Arjah
Background: The early and accurate classification of ischemic stroke stages on computed tomography (CT) remains challenging due to subtle attenuation differences and significant scanner variability. This study developed a neural network framework to dynamically optimize Hounsfield Unit (HU) transformations and CLAHE parameters for temporal stage-specific stroke classification.
Methods: We analyzed 1480 CT cases from 68 patients across five stages (hyperacute, acute, subacute, chronic, and normal). The training data were augmented via horizontal flipping and ±7° rotation. A convolutional neural network (CNN) was used to optimize linear transformation and CLAHE parameters through a combined loss function incorporating the effective measure of enhancement (EME), peak signal-to-noise ratio (PSNR), and regularization. The enhanced images were classified using logistic regression (LR), support vector machines (SVMs), and random forests (RFs) with 25-fold cross-validation. Model interpretability was evaluated using Grad-CAM.
Results: Neural network optimization significantly outperformed static parameters across validation metrics. Deep CLAHE achieved the following accuracies versus static CLAHE: hyperacute (0.9838 vs. 0.9754), acute (0.9904 vs. 0.9873), subacute (0.9948 vs. 0.9825), and chronic (near-perfect 0.9979 vs. 0.9808). Qualitative interpretability analysis confirmed that models focused on clinically relevant regions, with optimized enhancement producing more coherent attention patterns than static methods. Parameter analysis revealed stage-aware adaptation: conservative enhancement in early phases (slope: 1.249-1.257), maximized in subacute (slope: 1.290-1.292), and restrained in the chronic phase (slope: 1.240-1.258), reflecting underlying stroke pathophysiology.
Conclusions: A neural network-optimized framework with interpretability validation provides stage-specific stroke classification that achieves superior performance over static methods. Its pathophysiology-aligned parameter adaptation offers a clinically viable and transparent solution for emergency stroke assessment.
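The enhancement pipeline the methods describe (a linear HU transformation followed by CLAHE, with parameters predicted per image) can be sketched with OpenCV. The window center/width and the multiplicative use of the slope below are illustrative assumptions; only the slope magnitudes (roughly 1.24-1.29 across stages) come from the abstract.

```python
import cv2
import numpy as np

def enhance_ct(hu, slope=1.25, center=35.0, width=80.0,
               clip_limit=2.0, grid=(8, 8)):
    """Linear HU transform followed by CLAHE; `hu` is a 2-D float array of
    Hounsfield units. Stage-specific parameters would replace the defaults."""
    lo = center - width / 2.0
    win = np.clip((hu - lo) / width, 0.0, 1.0)            # window the HU range
    img8 = np.clip(slope * win * 255.0, 0, 255).astype(np.uint8)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid)
    return clahe.apply(img8)                              # contrast-limited AHE
```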
{"title":"Optimized Hounsfield Units Transformation for Explainable Temporal Stage-Specific Ischemic Stroke Classification in CT Imaging.","authors":"Radwan Qasrawi, Suliman Thwib, Ghada Issa, Ibrahem Qdaih, Razan Abu Ghoush, Hamza Arjah","doi":"10.3390/jimaging11120423","DOIUrl":"10.3390/jimaging11120423","url":null,"abstract":"<p><strong>Background: </strong>The early and accurate classification of ischemic stroke stages on computed tomography (CT) remains challenging due to subtle attenuation differences and significant scanner variability. This study developed a neural network framework to dynamically optimize Hounsfield Unit (HU) transformations and CLAHE parameters for temporal stage-specific stroke classification.</p><p><strong>Methods: </strong>We analyzed 1480 CT cases from 68 patients across five stages (hyperacute, acute, subacute, chronic, and normal). The training data were augmented via horizontal flipping, ±7° rotation. A convolutional neural network (CNN) was used to optimize linear transformation and CLAHE parameters through a combined loss function incorporating the effective measure of enhancement (EME), peak signal-to-noise ratio (PSNR), and regularization. the enhanced images were classified using logistic regression (LR), support vector machines (SVMs), and random forests (RFs) with 25-fold cross-validation. Model interpretability was evaluated using Grad-CAM.</p><p><strong>Results: </strong>Neural network optimization significantly outperformed static parameters across validation metrics. Deep CLAHE achieved the following accuracies versus static CLAHE: hyperacute (0.9838 vs. 0.9754), acute (0.9904 vs. 0.9873), subacute (0.9948 vs. 0.9825), and chronic (near-perfect 0.9979 vs. 0.9808). Qualitative interpretability analysis confirmed that models focused on clinically relevant regions, with optimized enhancement producing more coherent attention patterns than static methods. Parameter analysis revealed stage-aware adaptation: conservative enhancement in early phases (slope: 1.249-1.257), maximized in subacute (slope: 1.290-1.292), and restrained in the chronic phase (slope: 1.240-1.258), reflecting underlying stroke pathophysiology.</p><p><strong>Conclusions: </strong>A neural network-optimized framework with interpretability validation provides stage-specific stroke classification that achieves superior performance over static methods. Its pathophysiology-aligned parameter adaptation offers a clinically viable and transparent solution for emergency stroke assessment.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733839/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SegClarity: An Attribution-Based XAI Workflow for Evaluating Historical Document Layout Models.
Iheb Brini, Najoua Rahal, Maroua Mehri, Rolf Ingold, Najoua Essoukri Ben Amara
Pub Date: 2025-11-28 | DOI: 10.3390/jimaging11120424
In recent years, deep learning networks have demonstrated remarkable progress in the semantic segmentation of historical documents. Nonetheless, their limited explainability remains a critical concern, as these models frequently operate as black boxes, thereby constraining confidence in the trustworthiness of their outputs. To enhance transparency and reliability in their deployment, increasing attention has been directed toward explainable artificial intelligence (XAI) techniques. These techniques typically produce fine-grained attribution maps in the form of heatmaps, illustrating feature contributions from different blocks and layers within a deep neural network (DNN). However, such maps often closely resemble the segmentation outputs themselves, and there is currently no consensus regarding appropriate explainability metrics for semantic segmentation. To overcome these challenges, we present SegClarity, a novel workflow designed to integrate explainability into the analysis of historical documents. The workflow combines visual and quantitative evaluations specifically tailored to segmentation-based applications. Furthermore, we introduce the Attribution Concordance Score (ACS), a new explainability metric that provides quantitative insights into the consistency and reliability of attribution maps. To evaluate the effectiveness of our approach, we conducted extensive qualitative and quantitative experiments using two datasets of historical document images, two U-Net model variants, and four attribution-based XAI methods. A qualitative assessment involved four XAI methods across multiple U-Net layers, including comparisons at the input level with state-of-the-art perturbation methods RISE and MiSuRe. Quantitatively, five XAI evaluation metrics were employed to benchmark these approaches comprehensively. Beyond historical document analysis, we further validated the workflow's generalization by demonstrating its transferability to the Cityscapes dataset, a challenging benchmark for urban scene segmentation. The results demonstrate that the proposed workflow substantially improves the interpretability and reliability of deep learning models applied to the semantic segmentation of historical documents. To enhance reproducibility, we have released SegClarity's source code along with interactive examples of the proposed workflow.
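The ACS itself is defined in the paper and is not reproduced here. As a generic stand-in for the kind of quantity such a consistency metric aggregates, the sketch below scores agreement between two attribution maps (e.g., the same XAI method at two U-Net layers) with a Pearson correlation; it is an illustration, not the authors' formula.

```python
import numpy as np

def concordance(attr_a, attr_b, eps=1e-12):
    """Pearson correlation between two attribution maps of the same shape:
    standardize each map, then average the elementwise product."""
    a = (attr_a - attr_a.mean()) / (attr_a.std() + eps)
    b = (attr_b - attr_b.mean()) / (attr_b.std() + eps)
    return float((a * b).mean())

# random stand-ins for two layers' attribution maps:
rng = np.random.default_rng(0)
m1 = rng.random((256, 256))
m2 = m1 + 0.1 * rng.random((256, 256))    # a slightly perturbed copy
print(round(concordance(m1, m2), 3))      # near 1 when the maps agree
```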
{"title":"SegClarity: An Attribution-Based XAI Workflow for Evaluating Historical Document Layout Models.","authors":"Iheb Brini, Najoua Rahal, Maroua Mehri, Rolf Ingold, Najoua Essoukri Ben Amara","doi":"10.3390/jimaging11120424","DOIUrl":"10.3390/jimaging11120424","url":null,"abstract":"<p><p>In recent years, deep learning networks have demonstrated remarkable progress in the semantic segmentation of historical documents. Nonetheless, their limited explainability remains a critical concern, as these models frequently operate as black boxes, thereby constraining confidence in the trustworthiness of their outputs. To enhance transparency and reliability in their deployment, increasing attention has been directed toward explainable artificial intelligence (XAI) techniques. These techniques typically produce fine-grained attribution maps in the form of heatmaps, illustrating feature contributions from different blocks and layers within a deep neural network (DNN). However, such maps often closely resemble the segmentation outputs themselves, and there is currently no consensus regarding appropriate explainability metrics for semantic segmentation. To overcome these challenges, we present SegClarity, a novel workflow designed to integrate explainability into the analysis of historical documents. The workflow combines visual and quantitative evaluations specifically tailored to segmentation-based applications. Furthermore, we introduce the Attribution Concordance Score (ACS), a new explainability metric that provides quantitative insights into the consistency and reliability of attribution maps. To evaluate the effectiveness of our approach, we conducted extensive qualitative and quantitative experiments using two datasets of historical document images, two U-Net model variants, and four attribution-based XAI methods. A qualitative assessment involved four XAI methods across multiple U-Net layers, including comparisons at the input level with state-of-the-art perturbation methods RISE and MiSuRe. Quantitatively, five XAI evaluation metrics were employed to benchmark these approaches comprehensively. Beyond historical document analysis, we further validated the workflow's generalization by demonstrating its transferability to the Cityscapes dataset, a challenging benchmark for urban scene segmentation. The results demonstrate that the proposed workflow substantially improves the interpretability and reliability of deep learning models applied to the semantic segmentation of historical documents. To enhance reproducibility, we have released SegClarity's source code along with interactive examples of the proposed workflow.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12734373/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}