Research on Augmentation of Wood Microscopic Image Dataset Based on Generative Adversarial Networks
Pub Date : 2025-12-12 DOI: 10.3390/jimaging11120445
Shuo Xu, Hang Su, Lei Zhao
Microscopic wood images are vital in wood analysis and classification research. However, the high cost of acquiring microscopic images and the limitations of experimental conditions have led to a severe shortage of sample data, which significantly restricts the training performance and generalization ability of deep learning models. This study first used basic image processing techniques to perform preliminary augmentation of the original dataset. The augmented data were then used to train five GAN models: BGAN, DCGAN, WGAN-GP, LSGAN, and StyleGAN2. The quality of the generated images and the performance of each model were assessed by analyzing the fidelity of cellular structures (e.g., earlywood, latewood, and wood rays), image clarity, and image diversity, as well as by the Kernel Inception Distance (KID), Inception Score (IS), and Structural Similarity Index Measure (SSIM). The results showed that images generated by BGAN and WGAN-GP exhibited high quality, with lower KID values and higher IS values, and were visually close to real images. In contrast, DCGAN, LSGAN, and StyleGAN2 experienced mode collapse during training, resulting in lower image clarity and diversity than the other models. Through a comparative analysis of different GAN models, this study demonstrates the feasibility and effectiveness of Generative Adversarial Networks for small-sample image data augmentation and provides a useful reference for further research on wood identification.
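For readers unfamiliar with the WGAN-GP objective named above, the sketch below shows the standard gradient penalty term in PyTorch; the critic network and image tensors are placeholders, and this is a generic illustration rather than the authors' training code.

```python
import torch

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: push the critic's gradient norm toward 1
    on random interpolations between real and generated images."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0].flatten(start_dim=1)
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

# Typical use in the critic step (the usual penalty weight is 10):
# loss_d = critic(fake).mean() - critic(real).mean() + 10.0 * gradient_penalty(critic, real, fake)
```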
{"title":"Research on Augmentation of Wood Microscopic Image Dataset Based on Generative Adversarial Networks.","authors":"Shuo Xu, Hang Su, Lei Zhao","doi":"10.3390/jimaging11120445","DOIUrl":"10.3390/jimaging11120445","url":null,"abstract":"<p><p>Microscopic wood images are vital in wood analysis and classification research. However, the high cost of acquiring microscopic images and the limitations of experimental conditions have led to a severe problem of insufficient sample data, which significantly restricts the training performance and generalization ability of deep learning models. This study first used basic image processing techniques to perform preliminary augmentation of the original dataset. The augmented data were then input into five GAN models, BGAN, DCGAN, WGAN-GP, LSGAN, and StyleGAN2, for training. The quality and model performance of the generated images were assessed by analyzing the degree of fidelity of cellular structure (e.g., earlywood, latewood, and wood rays), image clarity, and diversity of the images for each model-generated image, as well as by using KID, IS, and SSIM. The results showed that images generated by BGAN and WGAN-GP exhibited high quality, with lower KID values and higher IS values, and the generated images were visually close to real images. In contrast, the DCGAN, LSGAN, and StyleGAN2 models experienced mode collapse during training, resulting in lower image clarity and diversity compared to the other models. Through a comparative analysis of different GAN models, this study demonstrates the feasibility and effectiveness of Generative Adversarial Networks in the domain of small-sample image data augmentation, providing an important reference for further research in the field of wood identification.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733676/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI-Driven Clinical Decision Support System for Automated Ventriculomegaly Classification from Fetal Brain MRI
Pub Date : 2025-12-12 DOI: 10.3390/jimaging11120444
Mannam Subbarao, Simi Surendran, Seena Thomas, Hemanth Lakshman, Vinjanampati Goutham, Keshagani Goud, Suhas Udayakumaran
Fetal ventriculomegaly (VM) is a condition characterized by abnormal enlargement of the cerebral ventricles of the fetal brain that often causes developmental disorders in children. Manual segmentation and classification of ventricular structures from brain MRI scans are time-consuming and require clinical expertise. To address this challenge, we develop an automated pipeline for ventricle segmentation, ventricular width estimation, and VM severity classification using a publicly available dataset. An adaptive slice selection strategy converts 3D MRI volumes into the most informative 2D slices, which are then segmented to isolate the lateral ventricles and deep gray matter. Ventricular width is automatically estimated to assign severity levels based on clinical thresholds, generating labeled data for training a deep learning classifier. Finally, an explainability module using a large language model integrates the MRI slices, segmentation masks, and predicted severity to provide interpretable clinical reasoning. Experimental results demonstrate that the proposed decision support system delivers robust performance, achieving Dice scores of 89% and 87.5% for the 2D and 3D segmentation models, respectively. The classification network attains an accuracy of 86% and an F1-score of 0.84 in VM analysis.
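As an illustration of the width-to-severity step described above, the sketch below maps an estimated ventricular (atrial) width to a severity label using commonly cited clinical cutoffs (mild 10-12 mm, moderate 12-15 mm, severe above 15 mm); the abstract does not state the exact thresholds used by the authors, so these values are assumptions.

```python
def vm_severity(width_mm: float) -> str:
    """Map ventricular width in millimetres to a severity label.
    Cutoffs are commonly cited clinical values, assumed here for illustration."""
    if width_mm < 10.0:
        return "normal"
    if width_mm < 12.0:
        return "mild"
    if width_mm <= 15.0:
        return "moderate"
    return "severe"

print(vm_severity(13.2))  # -> moderate
```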
{"title":"AI-Driven Clinical Decision Support System for Automated Ventriculomegaly Classification from Fetal Brain MRI.","authors":"Mannam Subbarao, Simi Surendran, Seena Thomas, Hemanth Lakshman, Vinjanampati Goutham, Keshagani Goud, Suhas Udayakumaran","doi":"10.3390/jimaging11120444","DOIUrl":"10.3390/jimaging11120444","url":null,"abstract":"<p><p>Fetal ventriculomegaly (VM) is a condition characterized by abnormal enlargement of the cerebral ventricles of the fetus brain that often causes developmental disorders in children. Manual segmentation and classification of ventricular structures from brain MRI scans are time-consuming and require clinical expertise. To address this challenge, we develop an automated pipeline for ventricle segmentation, ventricular width estimation, and VM severity classification using a publicly available dataset. An adaptive slice selection strategy converts 3D MRI volumes into the most informative 2D slices, which are then segmented to isolate the lateral ventricles and deep gray matter. Ventricular width is automatically estimated to assign severity levels based on clinical thresholds, generating labeled data for training a deep learning classifier. Finally, an explainability module using a large language model integrates the MRI slices, segmentation masks, and predicted severity to provide interpretable clinical reasoning. Experimental results demonstrate that the proposed decision support system delivers robust performance, achieving dice scores of 89% and 87.5% for the 2D and 3D segmentation models, respectively. Also, the classification network attains an accuracy of 86% and an F1-score of 0.84 in VM analysis.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12734337/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pixel-Wise Sky-Obstacle Segmentation in Fisheye Imagery Using Deep Learning and Gradient Boosting
Pub Date : 2025-12-12 DOI: 10.3390/jimaging11120446
Némo Bouillon, Vincent Boitier
Accurate sky-obstacle segmentation in hemispherical fisheye imagery is essential for solar irradiance forecasting, photovoltaic system design, and environmental monitoring. However, existing methods often rely on expensive all-sky imagers and region-specific training data, produce coarse sky-obstacle boundaries, and ignore the optical properties of fisheye lenses. We propose a low-cost segmentation framework designed for fisheye imagery that combines synthetic data generation, lens-aware augmentation, and a hybrid deep-learning pipeline. Synthetic fisheye training images are created from publicly available street-view panoramas to cover diverse environments without dedicated hardware, and lens-aware augmentations model fisheye projection and photometric effects to improve robustness across devices. On this dataset, we train a convolutional neural network (CNN) and refine its output with gradient-boosted decision trees (GBDT) to sharpen sky-obstacle boundaries. The method is evaluated on real fisheye images captured with smartphones and low-cost clip-on lenses across multiple sites, achieving an Intersection over Union (IoU) of 96.63% and an F1 score of 98.29%, along with high boundary accuracy. An additional evaluation on an external panoramic baseline dataset confirms strong cross-dataset generalization. Together, these results show that the proposed framework enables accurate, low-cost, and widely deployable hemispherical sky segmentation for practical solar and environmental imaging applications.
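For reference, the IoU and F1 figures reported above can be computed from binary sky masks as in the following NumPy sketch, which is a generic metric implementation rather than the authors' evaluation code.

```python
import numpy as np

def iou_f1(pred: np.ndarray, gt: np.ndarray):
    """Pixel-wise IoU and F1 (Dice) for boolean segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / (tp + fp + fn + 1e-9)
    f1 = 2 * tp / (2 * tp + fp + fn + 1e-9)
    return iou, f1
```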
{"title":"Pixel-Wise Sky-Obstacle Segmentation in Fisheye Imagery Using Deep Learning and Gradient Boosting.","authors":"Némo Bouillon, Vincent Boitier","doi":"10.3390/jimaging11120446","DOIUrl":"10.3390/jimaging11120446","url":null,"abstract":"<p><p>Accurate sky-obstacle segmentation in hemispherical fisheye imagery is essential for solar irradiance forecasting, photovoltaic system design, and environmental monitoring. However, existing methods often rely on expensive all-sky imagers and region-specific training data, produce coarse sky-obstacle boundaries, and ignore the optical properties of fisheye lenses. We propose a low-cost segmentation framework designed for fisheye imagery that combines synthetic data generation, lens-aware augmentation, and a hybrid deep-learning pipeline. Synthetic fisheye training images are created from publicly available street-view panoramas to cover diverse environments without dedicated hardware, and lens-aware augmentations model fisheye projection and photometric effects to improve robustness across devices. On this dataset, we train a convolutional neural network (CNN) and refine its output with gradient-boosted decision trees (GBDT) to sharpen sky-obstacle boundaries. The method is evaluated on real fisheye images captured with smartphones and low-cost clip-on lenses across multiple sites, achieving an Intersection over Union (IoU) of 96.63% and an F1 score of 98.29%, along with high boundary accuracy. An additional evaluation on an external panoramic baseline dataset confirms strong cross-dataset generalization. Together, these results show that the proposed framework enables accurate, low-cost, and widely deployable hemispherical sky segmentation for practical solar and environmental imaging applications.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733846/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced Object Detection Algorithms in Complex Environments via Improved CycleGAN Data Augmentation and AS-YOLO Framework
Pub Date : 2025-12-12 DOI: 10.3390/jimaging11120447
Zhen Li, Yuxuan Wang, Lingzhong Meng, Wenjuan Chu, Guang Yang
Object detection in complex environments, such as challenging lighting conditions, adverse weather, and target occlusions, poses significant difficulties for existing algorithms. To address these challenges, this study introduces a collaborative solution integrating improved CycleGAN-based data augmentation and an enhanced object detection framework, AS-YOLO. The improved CycleGAN incorporates a dual self-attention mechanism and spectral normalization to enhance feature capture and training stability. The AS-YOLO framework integrates a channel-spatial parallel attention mechanism, an AFPN structure for improved feature fusion, and the Inner_IoU loss function for better generalization. Experimental results show that, compared with YOLOv8n, AS-YOLO improves mAP@0.5 and mAP@0.95 by 1.5% and 0.6%, respectively; after data augmentation and style transfer, mAP@0.5 and mAP@0.95 increase by 14.6% and 17.8%, respectively, demonstrating the effectiveness of the proposed method in complex scenarios.
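The channel-spatial parallel attention mechanism is not specified in detail in the abstract; the sketch below shows one generic way to run a channel branch and a spatial branch in parallel and fuse their outputs, purely as an illustration of the idea rather than the AS-YOLO module itself.

```python
import torch.nn as nn

class ParallelChannelSpatialAttention(nn.Module):
    """Generic parallel channel + spatial attention (illustrative only)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel = nn.Sequential(            # per-channel gate from global context
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(            # per-pixel gate from local context
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # The two gates are applied in parallel and the gated features are summed.
        return x * self.channel(x) + x * self.spatial(x)
```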
{"title":"Enhanced Object Detection Algorithms in Complex Environments via Improved CycleGAN Data Augmentation and AS-YOLO Framework.","authors":"Zhen Li, Yuxuan Wang, Lingzhong Meng, Wenjuan Chu, Guang Yang","doi":"10.3390/jimaging11120447","DOIUrl":"10.3390/jimaging11120447","url":null,"abstract":"<p><p>Object detection in complex environments, such as challenging lighting conditions, adverse weather, and target occlusions, poses significant difficulties for existing algorithms. To address these challenges, this study introduces a collaborative solution integrating improved CycleGAN-based data augmentation and an enhanced object detection framework, AS-YOLO. The improved CycleGAN incorporates a dual self-attention mechanism and spectral normalization to enhance feature capture and training stability. The AS-YOLO framework integrates a channel-spatial parallel attention mechanism, an AFPN structure for improved feature fusion, and the Inner_IoU loss function for better generalization. The experimental results show that compared with YOLOv8n, mAP@0.5 and mAP@0.95 of the AS-YOLO algorithm have increased by 1.5% and 0.6%, respectively. After data augmentation and style transfer, mAP@0.5 and mAP@0.95 have increased by 14.6% and 17.8%, respectively, demonstrating the effectiveness of the proposed method in improving the performance of the model in complex scenarios.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12734041/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VMPANet: Vision Mamba Skin Lesion Image Segmentation Model Based on Prompt and Attention Mechanism Fusion
Pub Date : 2025-12-11 DOI: 10.3390/jimaging11120443
Zinuo Peng, Shuxian Liu, Chenhao Li
In the realm of medical image processing, the segmentation of dermatological lesions is a pivotal technique for the early detection of skin cancer. However, existing methods for segmenting images of skin lesions often encounter limitations when dealing with intricate boundaries and diverse lesion shapes. To address these challenges, we propose VMPANet, designed to accurately localize critical targets and capture edge structures. VMPANet employs an inverted pyramid convolution to extract multi-scale features while utilizing the visual Mamba module to capture long-range dependencies among image features. Additionally, we leverage previously extracted masks as cues to facilitate efficient feature propagation. Furthermore, VMPANet integrates parallel depthwise separable convolutions to enhance feature extraction and introduces mechanisms for edge enhancement, spatial attention, and channel attention to adaptively extract edge information and complex spatial relationships. Notably, VMPANet refines a novel cross-attention mechanism, which effectively facilitates the interaction between deep semantic cues and shallow texture details, thereby generating comprehensive feature representations while reducing computational load and redundancy. We conducted comparative and ablation experiments on two public skin lesion datasets (ISIC2017 and ISIC2018). The results demonstrate that VMPANet outperforms existing mainstream methods. On the ISIC2017 dataset, its mIoU and DSC metrics are 1.38% and 0.83% higher than those of VM-Unet, respectively; on the ISIC2018 dataset, these metrics are 1.10% and 0.67% higher than those of EMCAD, respectively. Moreover, VMPANet requires only 0.383 M parameters and 1.159 GFLOPs of computation.
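The parallel depthwise separable convolutions mentioned above follow the standard factorization of a k x k convolution into a per-channel (depthwise) step and a 1 x 1 (pointwise) step; a minimal PyTorch sketch of that building block, not VMPANet's actual layer, is shown below.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Standard depthwise + pointwise factorization of a k x k convolution."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```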
{"title":"VMPANet: Vision Mamba Skin Lesion Image Segmentation Model Based on Prompt and Attention Mechanism Fusion.","authors":"Zinuo Peng, Shuxian Liu, Chenhao Li","doi":"10.3390/jimaging11120443","DOIUrl":"10.3390/jimaging11120443","url":null,"abstract":"<p><p>In the realm of medical image processing, the segmentation of dermatological lesions is a pivotal technique for the early detection of skin cancer. However, existing methods for segmenting images of skin lesions often encounter limitations when dealing with intricate boundaries and diverse lesion shapes. To address these challenges, we propose VMPANet, designed to accurately localize critical targets and capture edge structures. VMPANet employs an inverted pyramid convolution to extract multi-scale features while utilizing the visual Mamba module to capture long-range dependencies among image features. Additionally, we leverage previously extracted masks as cues to facilitate efficient feature propagation. Furthermore, VMPANet integrates parallel depthwise separable convolutions to enhance feature extraction and introduces innovative mechanisms for edge enhancement, spatial attention, and channel attention to adaptively extract edge information and complex spatial relationships. Notably, VMPANet refines a novel cross-attention mechanism, which effectively facilitates the interaction between deep semantic cues and shallow texture details, thereby generating comprehensive feature representations while reducing computational load and redundancy. We conducted comparative and ablation experiments on two public skin lesion datasets (ISIC2017 and ISIC2018). The results demonstrate that VMPANet outperforms existing mainstream methods. On the ISIC2017 dataset, its mIoU and DSC metrics are 1.38% and 0.83% higher than those of VM-Unet respectively; on the ISIC2018 dataset, these metrics are 1.10% and 0.67% higher than those of EMCAD, respectively. Moreover, VMPANet boasts a parameter count of only 0.383 M and a computational load of 1.159 GFLOPs.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733779/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HDR Merging of RAW Exposure Series for All-Sky Cameras: A Comparative Study for Circumsolar Radiometry
Pub Date : 2025-12-11 DOI: 10.3390/jimaging11120442
Paul Matteschk, Max Aragón, Jose Gomez, Jacob K Thorning, Stefanie Meilinger, Sebastian Houben
All-sky imagers (ASIs) used in solar energy meteorology face an extreme intra-image dynamic range, with the circumsolar neighborhood orders of magnitude brighter than the diffuse dome. Many operational ASI pipelines address this gap with high-dynamic-range (HDR) bracketing inside the camera's image signal processor (ISP), i.e., after demosaicing and color processing in a nonlinear 8-bit RGB domain. Near the Sun, such ISP-domain HDR can down-weight the shortest exposure, retain clipped or near-clipped samples from longer frames, and compress highlight contrast, thereby increasing circumsolar saturation and flattening aureole gradients. A radiance-linear HDR fusion in the sensor/RAW domain (RAW-HDR) is therefore contrasted with the vendor ISP-based HDR mode (ISP-HDR). Solar-based geometric calibration enables Sun-centered analysis. Paired, interleaved acquisitions under clear-sky and broken-cloud conditions are evaluated using two circumsolar performance criteria per RGB channel: (i) saturated-area fraction in concentric rings and (ii) a median-based radial gradient in defined arcs. All quantitative analyses operate on the radiance-linear HDR result; post-merge tone mapping is only used for visualization. Across conditions, ISP-HDR exhibits roughly double the near-saturation within 0-4° of the Sun and about a three- to fourfold weaker circumsolar radial gradient within 0-6° relative to RAW-HDR. These findings indicate that radiance-linear fusion in the RAW domain better preserves circumsolar structure than the examined ISP-domain HDR mode and thus provides more suitable input for downstream tasks such as cloud-edge detection, aerosol retrieval, and irradiance estimation.
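A minimal NumPy sketch of radiance-linear exposure fusion in the RAW domain is given below, assuming a linear sensor response, known exposure times, and frames normalized to [0, 1]; saturated samples are excluded so the shortest exposure dominates near the Sun. This illustrates the general technique, not the authors' pipeline.

```python
import numpy as np

def merge_raw_hdr(frames, exposure_times, sat_level=0.98):
    """Fuse linear RAW frames (arrays in [0, 1]) into one relative radiance map,
    ignoring saturated or near-saturated samples in each exposure."""
    num = np.zeros_like(frames[0], dtype=np.float64)
    den = np.zeros_like(frames[0], dtype=np.float64)
    for img, t in zip(frames, exposure_times):
        valid = img < sat_level                      # drop clipped samples
        num += np.where(valid, img / t, 0.0)
        den += valid.astype(np.float64)
    # Pixels saturated in every frame fall back to the shortest exposure.
    fallback = frames[int(np.argmin(exposure_times))] / min(exposure_times)
    return np.where(den > 0, num / np.maximum(den, 1.0), fallback)
```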
{"title":"HDR Merging of RAW Exposure Series for All-Sky Cameras: A Comparative Study for Circumsolar Radiometry.","authors":"Paul Matteschk, Max Aragón, Jose Gomez, Jacob K Thorning, Stefanie Meilinger, Sebastian Houben","doi":"10.3390/jimaging11120442","DOIUrl":"10.3390/jimaging11120442","url":null,"abstract":"<p><p>All-sky imagers (ASIs) used in solar energy meteorology face an extreme intra-image dynamic range, with the circumsolar neighborhood orders of magnitude brighter than the diffuse dome. Many operational ASI pipelines address this gap with high-dynamic-range (HDR) bracketing inside the camera's image signal processor (ISP), i.e., after demosaicing and color processing in a nonlinear 8-bit RGB domain. Near the Sun, such ISP-domain HDR can down-weight the shortest exposure, retain clipped or near-clipped samples from longer frames, and compress highlight contrast, thereby increasing circumsolar saturation and flattening aureole gradients. A radiance-linear HDR fusion in the sensor/RAW domain (RAW-HDR) is therefore contrasted with the vendor ISP-based HDR mode (ISP-HDR). Solar-based geometric calibration enables Sun-centered analysis. Paired, interleaved acquisitions under clear-sky and broken-cloud conditions are evaluated using two circumsolar performance criteria per RGB channel: (i) saturated-area fraction in concentric rings and (ii) a median-based radial gradient in defined arcs. All quantitative analyses operate on the radiance-linear HDR result; post-merge tone mapping is only used for visualization. Across conditions, ISP-HDR exhibits roughly double the near-saturation within 0-4° of the Sun and about a three- to fourfold weaker circumsolar radial gradient within 0-6° relative to RAW-HDR. These findings indicate that radiance-linear fusion in the RAW domain better preserves circumsolar structure than the examined ISP-domain HDR mode and thus provides more suitable input for downstream tasks such as cloud-edge detection, aerosol retrieval, and irradiance estimation.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733980/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid Multi-Scale Neural Network with Attention-Based Fusion for Fruit Crop Disease Identification
Pub Date : 2025-12-10 DOI: 10.3390/jimaging11120440
Shakhmaran Seilov, Akniyet Nurzhaubayev, Marat Baideldinov, Bibinur Zhursinbek, Medet Ashimgaliyev, Ainur Zhumadillayeva
Undetected fruit crop diseases are a major threat to agricultural productivity worldwide and frequently cause farmers large financial losses. Disease detection based on manual field inspection is time-consuming, unreliable, and unsuitable for large-scale monitoring. Deep learning approaches, in particular convolutional neural networks, have shown promise for automated plant disease identification, although they still face significant obstacles: poor generalization across complex visual backgrounds, limited resilience to varying symptom sizes, and high computational demands that make deployment on resource-constrained edge devices difficult. To overcome these drawbacks, we propose a Hybrid Multi-Scale Neural Network (HMCT-AF with GSAF) architecture for precise and efficient fruit crop disease identification. To capture long-range dependencies, HMCT-AF combines a Vision Transformer-based structural branch with multi-scale convolutional branches, capturing both high-level contextual patterns and fine-grained local information. These disparate features are adaptively combined by the GSAF module, which enhances model interpretability and classification performance. We conduct evaluations on both PlantVillage (controlled environment) and CLD (real-world in-field conditions), observing consistent performance gains that indicate strong resilience to natural lighting variations and background complexity. With an accuracy of up to 93.79%, HMCT-AF with GSAF outperforms vanilla Transformer models, EfficientNet, and traditional CNNs. These findings demonstrate that the model effectively captures scale-variant disease symptoms and can be used in real-time agricultural applications on edge-compatible hardware. HMCT-AF with GSAF thus provides a viable basis for intelligent, scalable plant disease monitoring systems in modern precision farming.
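The multi-scale convolutional branches described above can be sketched as parallel convolutions with different receptive fields whose outputs are concatenated; the following is a generic illustration in PyTorch, not the HMCT-AF implementation.

```python
import torch
import torch.nn as nn

class MultiScaleConvBranch(nn.Module):
    """Parallel 3x3 / 5x5 / 7x7 convolutions, concatenated along the channel axis."""
    def __init__(self, in_ch: int, branch_ch: int = 32):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (3, 5, 7)]
        )

    def forward(self, x):
        # Each branch sees the same input at a different receptive field.
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```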
{"title":"Hybrid Multi-Scale Neural Network with Attention-Based Fusion for Fruit Crop Disease Identification.","authors":"Shakhmaran Seilov, Akniyet Nurzhaubayev, Marat Baideldinov, Bibinur Zhursinbek, Medet Ashimgaliyev, Ainur Zhumadillayeva","doi":"10.3390/jimaging11120440","DOIUrl":"10.3390/jimaging11120440","url":null,"abstract":"<p><p>Unobserved fruit crop illnesses are a major threat to agricultural productivity worldwide and frequently cause farmers to suffer large financial losses. Manual field inspection-based disease detection techniques are time-consuming, unreliable, and unsuitable for extensive monitoring. Deep learning approaches, in particular convolutional neural networks, have shown promise for automated plant disease identification, although they still face significant obstacles. These include poor generalization across complicated visual backdrops, limited resilience to different illness sizes, and high processing needs that make deployment on resource-constrained edge devices difficult. We suggest a Hybrid Multi-Scale Neural Network (HMCT-AF with GSAF) architecture for precise and effective fruit crop disease identification in order to overcome these drawbacks. In order to extract long-range dependencies, HMCT-AF with GSAF combines a Vision Transformer-based structural branch with multi-scale convolutional branches to capture both high-level contextual patterns and fine-grained local information. These disparate features are adaptively combined using a novel HMCT-AF with a GSAF module, which enhances model interpretability and classification performance. We conduct evaluations on both PlantVillage (controlled environment) and CLD (real-world in-field conditions), observing consistent performance gains that indicate strong resilience to natural lighting variations and background complexity. With an accuracy of up to 93.79%, HMCT-AF with GSAF outperforms vanilla Transformer models, EfficientNet, and traditional CNNs. These findings demonstrate how well the model captures scale-variant disease symptoms and how it may be used in real-time agricultural applications using hardware that is compatible with the edge. According to our research, HMCT-AF with GSAF presents a viable basis for intelligent, scalable plant disease monitoring systems in contemporary precision farming.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12734175/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application of Artificial Intelligence and Computer Vision for Measuring and Counting Oysters
Pub Date : 2025-12-10 DOI: 10.3390/jimaging11120439
Julio Antonio Laria Pino, Jesús David Terán Villanueva, Julio Laria Menchaca, Leobardo Garcia Solorio, Salvador Ibarra Martínez, Mirna Patricia Ponce Flores, Aurelio Alejandro Santiago Pineda
One of the most important activities in any oyster farm is the measurement of oyster size; this activity is time-consuming and conducted manually, generally with a caliper, which leads to high measurement variability. This paper proposes a methodology to count oysters and obtain their average length and width from an image, relying on artificial intelligence (AI), i.e., systems capable of learning and decision-making, and computer vision (CV), which extracts information from digital images. The proposed approach employs the DBSCAN clustering algorithm, an artificial neural network (ANN), and a random forest classifier to enable automatic oyster classification, counting, and size estimation from images. With the proposed methodology, measuring the length and width of the oysters was 86.7 times faster than manual measurement. Regarding counting, the process missed the true total count of oysters in two of the ten images. These results demonstrate the feasibility of using the proposed methodology to measure oyster size and count oysters in oyster farms.
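As context for the clustering step, a minimal scikit-learn DBSCAN call on 2-D point coordinates (for example, pixel centroids of candidate oyster regions) might look like the sketch below; the feature choice and parameter values are assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical (x, y) centroids of candidate oyster regions in an image.
points = np.array([[10, 12], [11, 13], [10, 14], [80, 85], [82, 84], [300, 40]])

labels = DBSCAN(eps=5.0, min_samples=2).fit_predict(points)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)   # label -1 marks noise
print(labels, n_clusters)
```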
{"title":"Application of Artificial Intelligence and Computer Vision for Measuring and Counting Oysters.","authors":"Julio Antonio Laria Pino, Jesús David Terán Villanueva, Julio Laria Menchaca, Leobardo Garcia Solorio, Salvador Ibarra Martínez, Mirna Patricia Ponce Flores, Aurelio Alejandro Santiago Pineda","doi":"10.3390/jimaging11120439","DOIUrl":"10.3390/jimaging11120439","url":null,"abstract":"<p><p>One of the most important activities in any oyster farm is the measurement of oyster size; this activity is time-consuming and conducted manually, generally using a caliper, which leads to high measurement variability. This paper proposes a methodology to count and obtain the length and width averages of a sample of oysters from an image, relying on artificial intelligence (AI), which refers to systems capable of learning and decision-making, and computer vision (CV), which enables the extraction of information from digital images. The proposed approach employs the DBScan clustering algorithm, an artificial neural network (ANN), and a random forest classifier to enable automatic oyster classification, counting, and size estimation from images. As a result of the proposed methodology, the speed in measuring the length and width of the oysters was 86.7 times faster than manual measurement. Regarding the counting, the process missed the total count of oysters in two of the ten images. These results demonstrate the feasibility of using the proposed methodology to measure oyster size and count in oyster farms.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733815/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WaveletHSI: Direct HSI Classification from Compressed Wavelet Coefficients via Sub-Band Feature Extraction and Fusion
Pub Date : 2025-12-10 DOI: 10.3390/jimaging11120441
Xin Li, Baile Sun
A major computational bottleneck in classifying large-scale hyperspectral images (HSI) is the mandatory data decompression prior to processing. Compressed-domain computing offers a solution by enabling deep learning on partially compressed data. However, existing compressed-domain methods are predominantly tailored for the Discrete Cosine Transform (DCT) used in natural images, while HSIs are typically compressed using the Discrete Wavelet Transform (DWT). The fundamental structural mismatch between the block-based DCT and the hierarchical DWT sub-bands presents two core challenges: how to extract features from multiple wavelet sub-bands, and how to fuse these features effectively. To address these issues, we propose a novel framework that extracts and fuses features from different DWT sub-bands directly. We design a multi-branch feature extractor with sub-band feature alignment loss that processes functionally different sub-bands in parallel, preserving the independence of each frequency feature. We then employ a sub-band cross-attention mechanism that inverts the typical attention paradigm by using the sparse, high-frequency detail sub-bands as queries to adaptively select and enhance salient features from the dense, information-rich low-frequency sub-bands. This enables a targeted fusion of global context and fine-grained structural information without data reconstruction. Experiments on three benchmark datasets demonstrate that our method achieves classification accuracy comparable to state-of-the-art spatial-domain approaches while eliminating at least 56% of the decompression overhead.
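For readers unfamiliar with the sub-band structure involved, a single-level 2-D DWT of one spectral band with PyWavelets yields one low-frequency approximation and three high-frequency detail sub-bands; the sketch below only illustrates that decomposition, not the proposed network.

```python
import numpy as np
import pywt

band = np.random.rand(128, 128)              # one hyperspectral band (placeholder data)
cA, (cH, cV, cD) = pywt.dwt2(band, "haar")   # approximation + horizontal/vertical/diagonal details
print(cA.shape, cH.shape)                    # each sub-band is half resolution: (64, 64)
```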
{"title":"WaveletHSI: Direct HSI Classification from Compressed Wavelet Coefficients via Sub-Band Feature Extraction and Fusion.","authors":"Xin Li, Baile Sun","doi":"10.3390/jimaging11120441","DOIUrl":"10.3390/jimaging11120441","url":null,"abstract":"<p><p>A major computational bottleneck in classifying large-scale hyperspectral images (HSI) is the mandatory data decompression prior to processing. Compressed-domain computing offers a solution by enabling deep learning on partially compressed data. However, existing compressed-domain methods are predominantly tailored for the Discrete Cosine Transform (DCT) used in natural images, while HSIs are typically compressed using the Discrete Wavelet Transform (DWT). The fundamental structural mismatch between the block-based DCT and the hierarchical DWT sub-bands presents two core challenges: how to extract features from multiple wavelet sub-bands, and how to fuse these features effectively? To address these issues, we propose a novel framework that extracts and fuses features from different DWT sub-bands directly. We design a multi-branch feature extractor with sub-band feature alignment loss that processes functionally different sub-bands in parallel, preserving the independence of each frequency feature. We then employ a sub-band cross-attention mechanism that inverts the typical attention paradigm by using the sparse, high-frequency detail sub-bands as queries to adaptively select and enhance salient features from the dense, information-rich low-frequency sub-bands. This enables a targeted fusion of global context and fine-grained structural information without data reconstruction. Experiments on three benchmark datasets demonstrate that our method achieves classification accuracy comparable to state-of-the-art spatial-domain approaches while eliminating at least 56% of the decompression overhead.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12733817/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Texture-Based Preprocessing Framework with nnU-Net Model for Accurate Intracranial Artery Segmentation
Pub Date : 2025-12-09 DOI: 10.3390/jimaging11120438
Kyuseok Kim, Ji-Youn Kim
Accurate intracranial artery segmentation from digital subtraction angiography (DSA) is critical for neurovascular diagnosis and intervention planning. Vascular extraction pipelines that combine preprocessing methods with deep learning models achieve strong results, but limited preprocessing constrains further improvement. We propose a texture-based contrast enhancement preprocessing framework integrated with the nnU-Net model to improve vessel segmentation in time-sequential DSA images. The method generates a combined feature mask by fusing local contrast, local entropy, and brightness threshold maps, which is then used as input for deep learning-based segmentation. Segmentation performance was evaluated on the DIAS dataset with standard quantitative metrics. The proposed preprocessing significantly improved segmentation across all metrics compared to both the baseline and contrast-limited adaptive histogram equalization (CLAHE). Using nnU-Net, the method achieved a Dice Similarity Coefficient (DICE) of 0.83 ± 0.20 and an Intersection over Union (IoU) of 0.72 ± 0.14, outperforming CLAHE (DICE 0.79 ± 0.41, IoU 0.70 ± 0.23) and the baseline (DICE 0.65 ± 0.15, IoU 0.47 ± 0.20). Most notably, the vessel connectivity (VC) metric dropped by over 65% relative to unprocessed images, indicating marked improvements in connectivity and topological accuracy. This study demonstrates that combining texture-based preprocessing with nnU-Net delivers robust, noise-tolerant, and clinically interpretable segmentation of intracranial arteries from DSA.
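The texture maps described above (local contrast, local entropy, and brightness thresholding) can be approximated with SciPy and scikit-image as in the sketch below; the window sizes and the fusion rule are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.filters.rank import entropy
from skimage.morphology import disk
from skimage.util import img_as_ubyte

def texture_feature_mask(image: np.ndarray, win: int = 9) -> np.ndarray:
    """Fuse local-entropy, local-contrast (std), and brightness maps into a binary mask."""
    img = image.astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min() + 1e-9)   # normalize to [0, 1]
    ent = entropy(img_as_ubyte(img), disk(win // 2))           # local entropy (rank filter needs uint8)
    mean = uniform_filter(img, win)
    std = np.sqrt(np.clip(uniform_filter(img ** 2, win) - mean ** 2, 0.0, None))  # local contrast
    bright = img > img.mean()                                  # simple brightness threshold
    # Combine the three cues; the mean-based thresholds are placeholders.
    return (ent > ent.mean()) & (std > std.mean()) & bright
```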
{"title":"Texture-Based Preprocessing Framework with nnU-Net Model for Accurate Intracranial Artery Segmentation.","authors":"Kyuseok Kim, Ji-Youn Kim","doi":"10.3390/jimaging11120438","DOIUrl":"10.3390/jimaging11120438","url":null,"abstract":"<p><p>Accurate intracranial artery segmentation from digital subtraction angiography (DSA) is critical for neurovascular diagnosis and intervention planning. Vascular extraction, which combines preprocessing methods and deep learning models, yields a high level of results, but limited preprocessing results constrain the improvement of results. We propose a texture-based contrast enhancement preprocessing framework integrated with the nnU-Net model to improve vessel segmentation in time-sequential DSA images. The method generates a combined feature mask by fusing local contrast, local entropy, and brightness threshold maps, which is then used as input for deep learning-based segmentation. Segmentation performance was evaluated using the DIAS dataset with various standard quantitative metrics. The proposed preprocessing significantly improved segmentation across all metrics compared to both the baseline and contrast-limited adaptive histogram equalization (CLAHE). Using nnU-Net, the method achieved a Dice Similarity Coefficient (DICE) of 0.83 ± 0.20 and an Intersection over Union (IoU) of 0.72 ± 0.14, outperforming CLAHE (DICE 0.79 ± 0.41, IoU 0.70 ± 0.23) and the baseline (DICE 0.65 ± 0.15, IoU 0.47 ± 0.20). Most notably, vessel connectivity (VC) dropped by over 65% relative to unprocessed images, indicating marked improvements in VC and topological accuracy. This study demonstrates that combining texture-based preprocessing with nnU-Net delivers robust, noise-tolerant, and clinically interpretable segmentation of intracranial arteries from DSA.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 12","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12734170/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}