Assessing Change in Stone Burden on Baseline and Follow-Up CT: Radiologist and Radiomics Evaluations.
Pub Date: 2025-12-27 | DOI: 10.3390/jimaging12010013
Parisa Kaviani, Matthias F Froelich, Bernardo Bizzo, Andrew Primak, Giridhar Dasegowda, Emiliano Garza-Frias, Lina Karout, Anushree Burade, Seyedehelaheh Hosseini, Javier Eduardo Contreras Yametti, Keith Dreyer, Sanjay Saini, Mannudeep Kalra
This retrospective diagnostic accuracy study compared radiologist-based qualitative assessments and radiomics-based analyses with an automated artificial intelligence (AI)-based volumetric approach for evaluating changes in kidney stone burden on follow-up CT examinations. With institutional review board approval, 157 patients (mean age, 61 ± 13 years; 99 men, 58 women) who underwent baseline and follow-up non-contrast abdomen-pelvis CT for kidney stone evaluation were included. The index test was an automated AI-based whole-kidney and stone segmentation radiomics prototype (Frontier, Siemens Healthineers), which segmented both kidneys and isolated stone volumes using a fixed threshold of 130 Hounsfield units, providing stone volume and maximum diameter per kidney. The reference standard was a threshold-defined volumetric assessment of stone burden change between baseline and follow-up CTs. Radiologist performance was assessed using (1) interpretations from clinical radiology reports and (2) an independent radiologist's assessment of stone burden change (stable, increased, or decreased). Diagnostic accuracy was evaluated using multivariable logistic regression and receiver operating characteristic (ROC) analysis. Automated volumetric assessment identified stable (n = 44), increased (n = 109), and decreased (n = 108) stone burden across the evaluated kidneys. Qualitative assessments from radiology reports demonstrated weak diagnostic performance for differentiating changes in stone burden (AUC range, 0.55-0.62), as did the independent radiologist's assessments (AUC range, 0.41-0.72). A model incorporating higher-order radiomics features achieved an AUC of 0.71 for distinguishing increased versus decreased stone burden relative to baseline CT (p < 0.001) but did not outperform threshold-based volumetric assessment. Automated threshold-based volumetric quantification of kidney stone burden thus provides higher diagnostic accuracy than qualitative radiologist assessments and radiomics-based analyses for identifying stable, increased, or decreased stone burden on follow-up CT examinations.
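As a concrete illustration of the thresholding step, the sketch below computes a per-kidney stone volume from a CT volume and a kidney mask. Only the 130 HU cutoff comes from the abstract; the function names, voxel spacing, and the 10% tolerance band used to label a burden stable, increased, or decreased are illustrative assumptions, since the study's exact change thresholds are not stated here.

```python
import numpy as np

def stone_volume_mm3(hu_volume, kidney_mask, voxel_spacing_mm=(1.0, 1.0, 1.0),
                     threshold_hu=130.0):
    """Total stone volume (mm^3) inside a segmented kidney.

    hu_volume    : 3D CT volume in Hounsfield units
    kidney_mask  : boolean 3D mask of one kidney
    threshold_hu : fixed 130 HU cutoff reported in the abstract
    """
    stone_voxels = (hu_volume >= threshold_hu) & kidney_mask
    return float(stone_voxels.sum() * np.prod(voxel_spacing_mm))

def classify_change(v_baseline, v_followup, rel_tol=0.10):
    """Label the change between two volumes; the 10% band is illustrative."""
    if v_baseline == 0.0 and v_followup == 0.0:
        return "stable"
    change = (v_followup - v_baseline) / max(v_baseline, 1e-9)
    if change > rel_tol:
        return "increased"
    if change < -rel_tol:
        return "decreased"
    return "stable"
```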
A Hybrid Vision Transformer-BiRNN Architecture for Direct k-Space to Image Reconstruction in Accelerated MRI.
Pub Date: 2025-12-26 | DOI: 10.3390/jimaging12010011
Changheun Oh
Long scan times remain a fundamental challenge in Magnetic Resonance Imaging (MRI). Accelerated MRI, which undersamples k-space, requires robust reconstruction methods to solve the ill-posed inverse problem. Recent methods have shown promise by processing image-domain features to capture global spatial context. However, these approaches are often limited, as they fail to fully leverage the unique, sequential characteristics of the k-space data themselves, which are critical for disentangling aliasing artifacts. This study introduces a novel, hybrid, dual-domain deep learning architecture that combines a ViT-based autoencoder with Bidirectional Recurrent Neural Networks (BiRNNs). The proposed architecture is designed to synergistically process information from both domains: it uses the ViT to learn features from image patches and the BiRNNs to model sequential dependencies directly from k-space data. We conducted a comprehensive comparative analysis against a standard ViT with only an MLP head (Model 1), a ViT autoencoder operating solely in the image domain (Model 2), and a competitive UNet baseline. Evaluations were performed on retrospectively undersampled neuro-MRI data using R = 4 and R = 8 acceleration factors with both regular and random sampling patterns. The proposed architecture demonstrated superior performance and robustness, significantly outperforming all other models in challenging high-acceleration and random-sampling scenarios. The results confirm that integrating sequential k-space processing via BiRNNs is critical for superior artifact suppression, offering a robust solution for accelerated MRI.
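The retrospective undersampling used in the evaluation can be reproduced in a few lines of NumPy. The sketch below masks phase-encode lines at acceleration factor R with either a regular or a random pattern; the fully sampled center fraction and the seed are illustrative assumptions, not values from the paper.

```python
import numpy as np

def undersample_kspace(image, R=4, pattern="regular",
                       center_fraction=0.04, seed=0):
    """Retrospectively undersample the k-space of a 2D image by factor R.

    A small fully sampled center is kept (common practice; the paper's
    exact masks may differ), and phase-encode lines are chosen either
    regularly (every R-th line) or at random.
    """
    ny, _ = image.shape
    kspace = np.fft.fftshift(np.fft.fft2(image))

    mask = np.zeros(ny, dtype=bool)
    n_center = max(int(ny * center_fraction), 1)
    c0 = ny // 2 - n_center // 2
    mask[c0:c0 + n_center] = True                  # fully sampled center
    if pattern == "regular":
        mask[::R] = True
    else:
        rng = np.random.default_rng(seed)
        n_extra = max(ny // R - n_center, 0)
        mask[rng.choice(np.flatnonzero(~mask), n_extra, replace=False)] = True

    kspace_u = kspace * mask[:, None]              # zero out unsampled lines
    zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace_u)))
    return kspace_u, zero_filled, mask
```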
Patched-Based Swin Transformer Hyperprior for Learned Image Compression.
Pub Date: 2025-12-26 | DOI: 10.3390/jimaging12010012
Sibusiso B Buthelezi, Jules R Tapamo
We present a hybrid end-to-end learned image compression framework that combines a CNN-based variational autoencoder (VAE) with an efficient hierarchical Swin Transformer to address the limitations of existing entropy models in capturing global dependencies under computational constraints. Traditional VAE-based codecs typically rely on CNN-based priors with localized receptive fields, which are insufficient for modelling the complex, high-dimensional dependencies of the latent space, thereby limiting compression efficiency. While fully global transformer-based models can capture long-range dependencies, their high computational complexity makes them impractical for high-resolution image compression. To overcome this trade-off, our approach couples a CNN-based VAE with a patch-based hierarchical Swin Transformer hyperprior that employs shifted window self-attention to effectively model both local and global contextual information while maintaining computational efficiency. The proposed framework tightly integrates this expressive entropy model with an end-to-end differentiable quantization module, enabling joint optimization of the complete rate-distortion objective. By learning a more accurate probability distribution of the latent representation, the model achieves improved bitrate estimation and a more compact latent representation, resulting in enhanced compression performance. We validate our approach on the widely used Kodak, JPEG AI, and CLIC datasets, demonstrating that the proposed hybrid architecture achieves superior rate-distortion performance, delivering higher visual quality at lower bitrates compared to methods relying on simpler CNN-based entropy priors. This work demonstrates the effectiveness of integrating efficient transformer architectures into learned image compression and highlights their potential for advancing entropy modelling beyond conventional CNN-based designs.
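The rate-distortion objective that such codecs optimize end to end can be written as L = R + λ·D, with the bitrate R estimated from the likelihoods the entropy model assigns to the quantized latents. A minimal PyTorch sketch, assuming NCHW tensors and per-element likelihoods; the trade-off weight λ is an illustrative value.

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(x, x_hat, likelihoods, lam=0.01):
    """Joint rate-distortion objective L = R + lambda * D.

    x, x_hat    : original and reconstructed images, NCHW tensors
    likelihoods : per-element probabilities assigned to the quantized
                  latents by the entropy model (here, the hyperprior)
    lam         : rate-distortion trade-off weight (illustrative)
    """
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = -torch.log2(likelihoods).sum() / num_pixels   # bits per pixel
    distortion = F.mse_loss(x_hat, x)
    return rate + lam * distortion, rate, distortion
```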
A Terrain-Constrained TIN Approach for High-Precision DEM Reconstruction Using UAV Point Clouds.
Pub Date: 2025-12-25 | DOI: 10.3390/jimaging12010008
Ziye He, Shu Gan, Xiping Yuan
To address the decline in self-consistency and limited spatial adaptability of traditional interpolation methods in complex terrain, this study proposes a terrain-constrained Triangulated Irregular Network (TIN) interpolation method based on UAV point clouds. The method was tested in the southern margin of the Lufeng Dinosaur National Geopark, Yunnan Province, using ground points at different sampling densities (90%, 70%, 50%, 30%, and 10%), and compared with Spline, Kriging, ANUDEM, and IDW methods. Results show that the proposed method maintains the lowest RMSE and MAE across all densities, demonstrating higher stability and self-consistency and better preserving terrain undulations. This provides technical support for high-precision DEM reconstruction from UAV point clouds in complex terrain.
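For reference, a plain TIN baseline (linear barycentric interpolation on the Delaunay triangulation of the ground points, which is what SciPy's LinearNDInterpolator performs) together with the RMSE/MAE evaluation against held-out check points is sketched below; the paper's additional terrain constraints are not reproduced here.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def tin_interpolate(points_xy, z, grid_x, grid_y):
    """Plain TIN interpolation: linear (barycentric) interpolation on the
    Delaunay triangulation of scattered ground points; the paper's extra
    terrain constraints are not reproduced here."""
    interp = LinearNDInterpolator(points_xy, z)
    gx, gy = np.meshgrid(grid_x, grid_y)
    return interp(gx, gy)

def rmse_mae(z_true, z_pred):
    """Accuracy metrics against held-out check points."""
    diff = np.asarray(z_pred) - np.asarray(z_true)
    diff = diff[~np.isnan(diff)]   # drop cells outside the convex hull
    return float(np.sqrt(np.mean(diff ** 2))), float(np.mean(np.abs(diff)))
```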
Accurate Segmentation of Vegetation in UAV Desert Imagery Using HSV-GLCM Features and SVM Classification.
Pub Date: 2025-12-25 | DOI: 10.3390/jimaging12010009
Thani Jintasuttisak, Patompong Chabplan, Sasitorn Issaro, Orawan Saeung, Thamasan Suwanroj
Segmentation of vegetation from images is an important task in precision agriculture applications, particularly in challenging desert environments where sparse vegetation, varying soil colors, and strong shadows pose significant difficulties. In this paper, we present a machine learning approach to robust green-vegetation segmentation in drone imagery captured over desert farmlands. The proposed method combines HSV color-space representation with Gray-Level Co-occurrence Matrix (GLCM) texture features and employs Support Vector Machine (SVM) as the learning algorithm. To enhance robustness, we incorporate comprehensive preprocessing, including Gaussian filtering, illumination normalization, and bilateral filtering, followed by morphological post-processing to improve segmentation quality. The method is evaluated against both traditional spectral index methods (ExG and CIVE) and a modern deep learning baseline using comprehensive metrics including accuracy, precision, recall, F1-score, and Intersection over Union (IoU). Experimental results on 120 high-resolution drone images from UAE desert farmlands demonstrate that the proposed method achieves superior performance with an accuracy of 0.91, F1-score of 0.88, and IoU of 0.82, showing significant improvement over baseline methods in handling challenging desert conditions, including shadows, varying soil colors, and sparse vegetation patterns. The method provides practical computational performance with a processing time of 25 s per image and a training time of 28 min, making it suitable for agricultural applications where accuracy is prioritized over processing speed.
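A minimal sketch of the feature-extraction stage, assuming scikit-image and scikit-learn; the patch statistics, GLCM parameters, and SVM hyperparameters shown are illustrative choices, not the paper's exact settings.

```python
import numpy as np
from skimage.color import rgb2hsv
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def patch_features(rgb_patch):
    """HSV color statistics + GLCM texture features for one image patch."""
    hsv = rgb2hsv(rgb_patch)
    color = [hsv[..., c].mean() for c in range(3)] + \
            [hsv[..., c].std() for c in range(3)]
    gray = (hsv[..., 2] * 255).astype(np.uint8)   # texture on the V channel
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    texture = [graycoprops(glcm, prop).mean()
               for prop in ("contrast", "homogeneity", "energy", "correlation")]
    return np.array(color + texture)

# Training on labeled patches (X: feature rows, y: 1 = vegetation, 0 = soil):
# clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, y)
```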
Long-Term Prognostic Value in Nuclear Cardiology: Expert Scoring Combined with Automated Measurements vs. Angiographic Score.
Pub Date: 2025-12-25 | DOI: 10.3390/jimaging12010006
George Angelidis, Stavroula Giannakou, Varvara Valotassiou, Emmanouil Panagiotidis, Ioannis Tsougos, Chara Tzavara, Dimitrios Psimadas, Evdoxia Theodorou, Charalampos Ziangas, John Skoularigis, Filippos Triposkiadis, Panagiotis Georgoulias
The evaluation of myocardial perfusion imaging (MPI) studies is based on visual interpretation of the reconstructed images, while measurements obtained through software packages may contribute to the investigation, mainly in cases of ambiguous scintigraphic findings. We aimed to investigate the long-term prognostic value of expert reading of the Summed Stress Score (SSS), Summed Rest Score (SRS), and Summed Difference Score (SDS), combined with automated measurements of these parameters, in comparison to the prognostic ability of the angiographic score for soft and hard cardiac events. The study was conducted at the Nuclear Medicine Laboratory of the University of Thessaly, in Larissa, Greece. Overall, 378 consecutive patients with known or suspected coronary artery disease (CAD) were enrolled. Automated measurements of SSS, SRS, and SDS were obtained using the Emory Cardiac Toolbox, Myovation, and Quantitative Perfusion SPECT software packages. Coronary angiographies were scored according to a four-point scoring system (angiographic score). Follow-up data were collected through phone contact and review of hospital records. All participants were followed up for at least 36 months. Soft and hard cardiac events were recorded in 31.7% and 11.6% of the sample, respectively, while any cardiac event was recorded in 36.5%. For hard cardiac events, the prognostic value of expert scoring combined with the automated measurements was significantly greater than the prognostic ability of the angiographic score (p < 0.001). For any cardiac event, the prognostic value of expert scoring combined with the automated analyses was likewise significantly greater (p < 0.001). According to our results, in patients with known or suspected CAD, the combination of expert reading and automated measurements of SSS, SRS, and SDS shows superior prognostic ability compared with the angiographic score.
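SDS is, by definition, the difference between the stress and rest summed scores. A minimal sketch, assuming the standard 17-segment model with 0-4 per-segment severity scores (the abstract does not state the segmentation scheme used):

```python
import numpy as np

def summed_scores(stress_segments, rest_segments):
    """SSS, SRS, and SDS from per-segment perfusion scores.

    Inputs are length-17 sequences of integers 0-4 (0 = normal uptake,
    4 = absent uptake), one per myocardial segment; the 17-segment model
    is an assumption here. SDS = SSS - SRS captures the reversible
    (ischemic) component of the perfusion defect.
    """
    stress = np.asarray(stress_segments, dtype=int)
    rest = np.asarray(rest_segments, dtype=int)
    sss = int(stress.sum())
    srs = int(rest.sum())
    return sss, srs, sss - srs

# Example: a moderate reversible defect confined to three segments
sss, srs, sds = summed_scores([3, 2, 2] + [0] * 14, [1, 0, 1] + [0] * 14)
```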
AKAZE-GMS-PROSAC: A New Progressive Framework for Matching Dynamic Characteristics of Flotation Foam.
Pub Date: 2025-12-25 | DOI: 10.3390/jimaging12010007
Zhen Peng, Zhihong Jiang, Pengcheng Zhu, Gaipin Cai, Xiaoyan Luo
The dynamic characteristics of flotation foam, such as velocity and breakage rate, are critical factors that influence mineral separation efficiency. However, challenges inherent in foam images, including weak textures, severe deformations, and motion blur, present significant technical hurdles for dynamic monitoring. These issues lead to a fundamental conflict between the efficiency and accuracy of traditional feature matching algorithms. This paper introduces a novel progressive framework for dynamic feature matching in flotation foam images, termed "stable extraction, efficient coarse screening, and precise matching." This framework first employs the Accelerated-KAZE (AKAZE) algorithm to extract robust, scale- and rotation-invariant feature points from a non-linear scale-space, effectively addressing the challenge of weak textures. Subsequently, it innovatively incorporates the Grid-based Motion Statistics (GMS) algorithm to perform efficient coarse screening based on motion consistency, rapidly filtering out a large number of obvious mismatches. Finally, the Progressive Sample and Consensus (PROSAC) algorithm is used for precise matching, eliminating the remaining subtle mismatches through progressive sampling and geometric constraints. This framework enables the precise analysis of dynamic foam characteristics, including displacement, velocity, and breakage rate (enhanced by a robust "foam lifetime" mechanism). Comparative experimental results demonstrate that, compared to ORB-GMS-RANSAC (with a Mean Absolute Error, MAE of 1.20 pixels and a Mean Relative Error, MRE of 9.10%) and ORB-RANSAC (MAE: 3.53 pixels, MRE: 27.36%), the proposed framework achieves significantly lower error rates (MAE: 0.23 pixels, MRE: 2.13%). It exhibits exceptional stability and accuracy, particularly in complex scenarios involving low texture and minor displacements. This research provides a high-precision, high-robustness technical solution for the dynamic monitoring and intelligent control of the flotation process.
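The three stages map onto OpenCV primitives, as the hedged sketch below shows; it assumes opencv-contrib-python for matchGMS and OpenCV >= 4.5 for the USAC_PROSAC flag, and the threshold values are illustrative rather than the paper's.

```python
import numpy as np
import cv2

def match_foam_frames(img1, img2):
    """AKAZE extraction -> Hamming brute-force matching -> GMS coarse
    screening -> PROSAC homography refinement (OpenCV's USAC backend).

    Requires opencv-contrib-python for matchGMS and OpenCV >= 4.5 for
    cv2.USAC_PROSAC; parameter values are illustrative.
    """
    akaze = cv2.AKAZE_create()
    kp1, des1 = akaze.detectAndCompute(img1, None)
    kp2, des2 = akaze.detectAndCompute(img2, None)

    raw = cv2.BFMatcher(cv2.NORM_HAMMING).match(des1, des2)

    # GMS keeps matches whose grid neighborhoods move consistently
    h1, w1 = img1.shape[:2]
    h2, w2 = img2.shape[:2]
    gms = cv2.xfeatures2d.matchGMS((w1, h1), (w2, h2), kp1, kp2, raw,
                                   withRotation=True)

    src = np.float32([kp1[m.queryIdx].pt for m in gms]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in gms]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.USAC_PROSAC, 3.0)
    return H, gms, inliers
```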
Render-Rank-Refine: Accurate 6D Indoor Localization via Circular Rendering.
Pub Date: 2025-12-25 | DOI: 10.3390/jimaging12010010
Haya Monawwar, Guoliang Fan
Accurate six-degree-of-freedom (6-DoF) camera pose estimation is essential for augmented reality, robotics navigation, and indoor mapping. Existing pipelines often depend on detailed floorplans, strict Manhattan-world priors, and dense structural annotations, which lead to failures in ambiguous room layouts where multiple rooms appear in a query image and their boundaries may overlap or be partially occluded. We present Render-Rank-Refine, a two-stage framework operating on coarse semantic meshes without requiring textured models or per-scene fine-tuning. First, panoramas rendered from the mesh enable global retrieval of coarse pose hypotheses. Then, perspective views from the top-k candidates are compared to the query via rotation-invariant circular descriptors, which re-ranks the matches before final translation and rotation refinement. Our method increases camera localization accuracy compared to the state-of-the-art SPVLoc baseline, reducing the translation error by 40.4% and the rotation error by 29.7% in ambiguous layouts, as evaluated on the Zillow Indoor Dataset. In terms of inference throughput, our method achieves 25.8-26.4 queries per second (QPS), which is significantly faster than other recent comparable methods, while maintaining accuracy comparable to or better than the SPVLoc baseline. These results demonstrate robust, near-real-time indoor localization that overcomes structural ambiguities and heavy geometric assumptions.
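The rotation-invariant comparison can be illustrated with circular descriptors: a cyclic shift of a panoramic descriptor corresponds to a yaw rotation of the camera, so taking the maximum correlation over all shifts removes the rotation dependence. The FFT-based sketch below is a generic illustration of this idea, not the paper's exact descriptor.

```python
import numpy as np

def circular_similarity(d1, d2):
    """Maximum normalized correlation between two 1D circular descriptors
    over all cyclic shifts, computed in O(n log n) with the FFT.

    Because a cyclic shift corresponds to a yaw rotation of the panorama,
    the maximum over shifts is rotation-invariant.
    """
    d1 = (d1 - d1.mean()) / (d1.std() + 1e-9)
    d2 = (d2 - d2.mean()) / (d2.std() + 1e-9)
    spectrum = np.fft.rfft(d1) * np.conj(np.fft.rfft(d2))
    corr = np.fft.irfft(spectrum, n=len(d1))   # correlation at every shift
    return float(corr.max()) / len(d1)
```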
Bone Changes in Mandibular Condyle of Temporomandibular Dysfunction Patients Recognized on Magnetic Resonance Imaging.
Pub Date: 2025-12-24 | DOI: 10.3390/jimaging12010005
Fumi Mizuhashi, Ichiro Ogura, Ryo Mizuhashi, Yuko Watarai, Tatsuhiro Suzuki, Momoka Kawana, Kotono Nagata, Tomonori Niitsuma, Makoto Oohashi
We aimed to investigate the types of bone changes in temporomandibular disorder patients with disc displacement. The subjects were 117 temporomandibular joints diagnosed with anterior disc displacement using magnetic resonance imaging (MRI). Temporomandibular joint (TMJ) pain and opening dysfunction were examined. Disc displacement with and without reduction, joint effusion, and bone changes in the mandibular condyle were assessed on MRI. The bone changes were classified into erosion, flattening, osteophyte, and atrophy types on the MR images. Fisher's exact test and the χ² test were used for the analyses. Bone changes were found in 30.8% of subjects, comprising erosion, flattening, osteophyte, and atrophy types (p < 0.001). The occurrence of joint effusion (p < 0.001), TMJ pain (p = 0.027), and opening dysfunction (p = 0.002) differed among the types of bone changes. Gender differences were also found among the types of bone changes (p < 0.001). The rate of disc displacement with reduction was significantly lower than that of disc displacement without reduction for flattening and osteophyte (p < 0.001). These results make clear that symptoms, gender, and the presence or absence of disc reduction differ among the types of bone changes.
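Both tests are available in SciPy; the sketch below applies them to a hypothetical reduction-status-by-bone-change-type table (the counts are invented for illustration and are not the study's data).

```python
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x4 table (counts invented for illustration):
# rows = disc displacement with / without reduction,
# cols = erosion, flattening, osteophyte, atrophy.
table = [[8, 3, 2, 4],
         [10, 15, 14, 9]]
chi2, p, dof, expected = chi2_contingency(table)

# Fisher's exact test in SciPy applies to 2x2 tables, e.g. reduction
# status versus a single bone-change type (present / absent):
odds_ratio, p_2x2 = fisher_exact([[3, 14], [15, 24]])
print(f"chi-square p = {p:.3f}, Fisher 2x2 p = {p_2x2:.3f}")
```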
Empirical Mode Decomposition-Based Deep Learning Model Development for Medical Imaging: Feasibility Study for Gastrointestinal Endoscopic Image Classification.
Pub Date: 2025-12-22 | DOI: 10.3390/jimaging12010004
Mou Deb, Mrinal Kanti Dhar, Poonguzhali Elangovan, Keerthy Gopalakrishnan, Divyanshi Sood, Aaftab Sethi, Sabah Afroze, Sourav Bansal, Aastha Goudel, Charmy Parikh, Avneet Kaur, Swetha Rapolu, Gianeshwaree Alias Rachna Panjwani, Rabiah Aslam Ansari, Naghmeh Asadimanesh, Shiva Sankari Karuppiah, Scott A Helgeson, Venkata S Akshintala, Shivaram P Arunachalam
This study proposes a novel two-dimensional Empirical Mode Decomposition (2D EMD)-based deep learning framework to enhance model performance in multi-class image classification tasks, with potential application to early disease detection in medical imaging. To validate this approach, we apply it to gastrointestinal (GI) endoscopic image classification using the publicly available Kvasir dataset, which contains eight GI image classes with 1000 images each. The proposed 2D EMD-based design procedure decomposes images into a full set of intrinsic mode functions (IMFs) to enhance image features beneficial for AI model development. Integrating 2D EMD into a deep learning pipeline, we evaluate its impact on four popular models (ResNet152, VGG19bn, MobileNetV3L, and SwinTransformerV2S). The results demonstrate that subtracting IMFs from the original image consistently improves accuracy, F1-score, and AUC for all models. The study reveals a notable enhancement in model performance, with an approximately 9% increase in accuracy for ResNet152 compared to its counterpart without EMD integration; similarly, there is an increase of around 18% for VGG19bn, 3% for MobileNetV3L, and 8% for SwinTransformerV2S. Additionally, explainable AI (XAI) techniques, such as Grad-CAM, illustrate that the model focuses on GI regions for predictions. This study highlights the efficacy of 2D EMD in enhancing deep learning model performance for GI image classification, with potential applications in other domains.
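The core preprocessing step, subtracting leading IMFs from the input image, can be sketched as follows; the 2D EMD decomposer itself is assumed as a callable (any bidimensional EMD implementation would do, and no specific library is implied), and the rescaling at the end is an illustrative choice.

```python
import numpy as np

def subtract_imfs(image, emd2d, n_imfs=1):
    """Remove the first n intrinsic mode functions from an image.

    `emd2d` is an assumed callable returning a stack of 2D IMFs with
    shape (n_modes, H, W); no specific EMD library is implied here.
    Subtracting the highest-frequency IMFs suppresses fine-grained
    variation before the image is fed to the classifier.
    """
    imfs = emd2d(image)
    enhanced = image.astype(np.float64) - imfs[:n_imfs].sum(axis=0)
    enhanced -= enhanced.min()                   # rescale to [0, 1] for
    return enhanced / (enhanced.max() + 1e-9)    # the downstream network
```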