
Journal of Imaging: Latest Publications

Relationship Between Display Pixel Structure and Gloss Perception.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-02-09 | DOI: 10.3390/jimaging12020071
Kosei Aketagawa, Midori Tanaka, Takahiko Horiuchi

The demand for accurate representation of gloss perception, which significantly contributes to the impression and evaluation of objects, is increasing owing to recent advancements in display technology enabling high-definition visual reproduction. This study experimentally analyzes the influence of display pixel structure on gloss perception. In a visual evaluation experiment using natural images, gloss perception was assessed across six types of stimuli: three subpixel arrays (RGB, RGBW, and PenTile RGBG) combined with two pixel-aperture ratios (100% and 50%). The experimental results statistically confirmed that regardless of pixel-aperture ratio, the RGB subpixel array was perceived as exhibiting the strongest gloss. Furthermore, cluster analysis of observers revealed individual differences in the effect of pixel structure on gloss perception. Additionally, gloss classification and image feature analysis suggested that the magnitude of pixel structure influence varies depending on the frequency components contained in the images. Moreover, analysis using a generalized linear mixed model supported the superiority of the RGB subpixel array even when accounting for variability across observers and natural images.
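
The mixed-model step mentioned at the end of the abstract can be sketched in a few lines. The example below is a minimal stand-in using statsmodels, with random observer intercepts and a variance component for the natural image shown; the column names (gloss, subpixel, aperture, observer, image), the CSV file name, and the use of a linear rather than a generalized mixed model are assumptions for illustration only, not the authors' specification.

```python
# Minimal sketch only: approximates the paper's GLMM analysis with a linear
# mixed-effects model. All column and file names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

ratings = pd.read_csv("gloss_ratings.csv")  # long format: one row per rating

model = smf.mixedlm(
    "gloss ~ C(subpixel) * C(aperture)",    # fixed effects: pixel structure
    data=ratings,
    groups=ratings["observer"],             # random intercept per observer
    vc_formula={"image": "0 + C(image)"},   # variance component per image
)
result = model.fit()
print(result.summary())
```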

Citations: 0
Topic-Modeling Guided Semantic Clustering for Enhancing CNN-Based Image Classification Using Scale-Invariant Feature Transform and Block Gabor Filtering.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-02-09 | DOI: 10.3390/jimaging12020070
Natthaphong Suthamno, Jessada Tanthanuch

This study proposes a topic-modeling guided framework that enhances image classification by introducing semantic clustering prior to CNN training. Images are processed through two key-point extraction pipelines: Scale-Invariant Feature Transform (SIFT) with Sobel edge detection and Block Gabor Filtering (BGF), to obtain local feature descriptors. These descriptors are clustered using K-means to build a visual vocabulary. Bag of Words histograms then represent each image as a visual document. Latent Dirichlet Allocation is applied to uncover latent semantic topics, generating coherent image clusters. Cluster-specific CNN models, including AlexNet, GoogLeNet, and several ResNet variants, are trained under identical conditions to identify the most suitable architecture for each cluster. Two topic guided integration strategies, the Maximum Proportion Topic (MPT) and the Weight Proportion Topic (WPT), are then used to assign test images to the corresponding specialized model. Experimental results show that both the SIFT-based and BGF-based pipelines outperform non-clustered CNN models and a baseline method using Incremental PCA, K-means, Same-Cluster Prediction, and unweighted Ensemble Voting. The SIFT pipeline achieves the highest accuracy of 95.24% with the MPT strategy, while the BGF pipeline achieves 93.76% with the WPT strategy. These findings confirm that semantic structure introduced through topic modeling substantially improves CNN classification performance.
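
The vocabulary-building and topic-modeling stages lend themselves to a short sketch. The code below chains OpenCV SIFT descriptors, a K-means visual vocabulary, Bag-of-Words histograms, and LDA topic assignment; the image list, vocabulary size (256), and topic count (8) are placeholder assumptions, and the Sobel and BGF branches as well as the cluster-specific CNN training are omitted.

```python
# Sketch of the SIFT -> visual vocabulary -> BoW -> LDA stage. Paths and
# hyperparameters are illustrative, not the paper's settings.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

sift = cv2.SIFT_create()

def sift_descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

paths = ["img_000.jpg", "img_001.jpg"]               # placeholder image list
per_image = [sift_descriptors(p) for p in paths]

vocab = KMeans(n_clusters=256, random_state=0).fit(np.vstack(per_image))
bow = np.stack([
    np.bincount(vocab.predict(d), minlength=256) if len(d) else np.zeros(256, int)
    for d in per_image
])

lda = LatentDirichletAllocation(n_components=8, random_state=0)
topic_mix = lda.fit_transform(bow)                   # per-image topic proportions
clusters = topic_mix.argmax(axis=1)                  # MPT-style hard assignment
```

The WPT strategy mentioned in the abstract would presumably keep the full topic_mix row as weights over cluster-specific models rather than taking the argmax.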

Citations: 0
YOLO11s-UAV: An Advanced Algorithm for Small Object Detection in UAV Aerial Imagery.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-02-06 | DOI: 10.3390/jimaging12020069
Qi Mi, Jianshu Chao, Anqi Chen, Kaiyuan Zhang, Jiahua Lai

Unmanned aerial vehicles (UAVs) are now widely used in various applications, including agriculture, urban traffic management, and search and rescue operations. However, several challenges arise, including the small size of objects occupying only a sparse number of pixels in images, complex backgrounds in aerial footage, and limited computational resources onboard. To address these issues, this paper proposes an improved UAV-based small object detection algorithm, YOLO11s-UAV, specifically designed for aerial imagery. Firstly, we introduce a novel FPN, called Content-Aware Reassembly and Interaction Feature Pyramid Network (CARIFPN), which significantly enhances small object feature detection while reducing redundant network structures. Secondly, we apply a new downsampling convolution for small object feature extraction, called Space-to-Depth for Dilation-wise Residual Convolution (S2DResConv), in the model's backbone. This module effectively eliminates information loss caused by strided convolution or pooling operations and facilitates the capture of multi-scale context. Finally, we integrate a simple, parameter-free attention module (SimAM) with C3k2 to form Flexible SimAM (FlexSimAM), which is applied throughout the entire model. This improved module not only reduces the model's complexity but also enables efficient enhancement of small object features in complex scenarios. Experimental results demonstrate that on the VisDrone-DET2019 dataset, our model improves mAP@0.5 by 7.8% on the validation set (reaching 46.0%) and by 5.9% on the test set (increasing to 37.3%) compared to the baseline YOLO11s, while reducing model parameters by 55.3%. Similarly, it achieves a 7.2% improvement on the TinyPerson dataset and a 3.0% increase on UAVDT-DET. Deployment on the NVIDIA Jetson Orin NX SUPER platform shows that our model achieves 33 FPS, which is 21.4% lower than YOLO11s, confirming its feasibility for real-time onboard UAV applications.
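
SimAM, the parameter-free attention module cited above, is published prior work with a compact standard form; the PyTorch sketch below reproduces that common formulation (per-neuron energy weighting with a small lambda stabilizer). The FlexSimAM integration with C3k2, along with CARIFPN and S2DResConv, are the paper's own designs and are not reproduced here.

```python
# Standard SimAM formulation (parameter-free attention), as commonly implemented.
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Weight each activation by an energy-based saliency score."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):                                 # x: (B, C, H, W)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n           # per-channel variance
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5       # inverse energy
        return x * torch.sigmoid(e_inv)

# usage: y = SimAM()(torch.randn(1, 64, 32, 32))
```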

Citations: 0
Automated Radiological Report Generation from Breast Ultrasound Images Using Vision and Language Transformers.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-02-06 | DOI: 10.3390/jimaging12020068
Shaheen Khatoon, Azhar Mahmood

Breast ultrasound imaging is widely used for the detection and characterization of breast abnormalities; however, generating detailed and consistent radiological reports remains a labor-intensive and subjective process. Recent advances in deep learning have demonstrated the potential of automated report generation systems to support clinical workflows, yet most existing approaches focus on chest X-ray imaging and rely on convolutional-recurrent architectures with limited capacity to model long-range dependencies and complex clinical semantics. In this work, we propose a multimodal Transformer-based framework for automatic breast ultrasound report generation that integrates visual and textual information through cross-attention mechanisms. The proposed architecture employs a Vision Transformer (ViT) to extract rich spatial and morphological features from ultrasound images. For textual embedding, pretrained language models (BERT, BioBERT, and GPT-2) are implemented in various encoder-decoder configurations to leverage both general linguistic knowledge and domain-specific biomedical semantics. A multimodal Transformer decoder is implemented to autoregressively generate diagnostic reports by jointly attending to visual features and contextualized textual embeddings. We conducted an extensive quantitative evaluation using standard report generation metrics, including BLEU, ROUGE-L, METEOR, and CIDEr, to assess lexical accuracy, semantic alignment, and clinical relevance. Experimental results demonstrate that BioBERT-based models consistently outperform general domain counterparts in clinical specificity, while GPT-2-based decoders improve linguistic fluency.
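
The image-conditioned decoding described above can be illustrated with PyTorch's built-in Transformer decoder, where report tokens cross-attend to ViT patch embeddings passed in as memory. The vocabulary size, width, head count, and layer count below are illustrative assumptions rather than the paper's configuration, and the pretrained BERT/BioBERT/GPT-2 embeddings are replaced here by a plain embedding layer.

```python
# Minimal sketch of a cross-attention report decoder over ViT features.
# Hyperparameters and the plain nn.Embedding are stand-ins, not the paper's setup.
import torch
import torch.nn as nn

class ReportDecoder(nn.Module):
    def __init__(self, vocab_size=30522, d_model=768, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids, image_feats):
        # token_ids: (B, T); image_feats: (B, num_patches, d_model) from a ViT
        tgt = self.embed(token_ids)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.decoder(tgt, memory=image_feats, tgt_mask=causal)
        return self.lm_head(hidden)              # next-token logits

logits = ReportDecoder()(torch.randint(0, 30522, (2, 16)), torch.randn(2, 197, 768))
```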

Citations: 0
Predicting Nutritional and Morphological Attributes of Fresh Commercial Opuntia Cladodes Using Machine Learning and Imaging.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-02-05 | DOI: 10.3390/jimaging12020067
Juan Arredondo Valdez, Josué Israel García López, Héctor Flores Breceda, Ajay Kumar, Ricardo David Valdez Cepeda, Alejandro Isabel Luna Maldonado

Opuntia ficus-indica L. is a prominent crop in Mexico, requiring advanced non-destructive technologies for the real-time monitoring and quality control of fresh commercial cladodes. The primary research objective of this study was to develop and validate high-precision mathematical models that correlate hyperspectral signatures (400-1000 nm) with the specific nutritional, morphological, and antioxidant attributes of fresh cladodes (cultivar Villanueva) at their peak commercial maturity. By combining hyperspectral imaging (HSI) with machine learning algorithms, including K-Means clustering for image preprocessing and Partial Least Squares Regression (PLSR) for predictive modeling, this study successfully predicted the concentrations of 10 minerals (N, P, K, Ca, Mg, Fe, B, Mn, Zn, and Cu), chlorophylls (a, b, and Total), and antioxidant capacities (ABTS, FRAP, and DPPH). The innovative nature of this work lies in the simultaneous non-destructive quantification of 17 distinct variables from a single scan, achieving coefficients of determination (R²) as high as 0.988 for Phosphorus and Chlorophyll b. The practical applicability of this research provides a viable replacement for time-consuming and destructive laboratory acid digestion, enabling producers to implement automated, high-throughput sorting lines for quality assurance. Furthermore, this study establishes a framework for interdisciplinary collaborations between agricultural engineers, data scientists for algorithm optimization, and food scientists to enhance the functional value chain of Opuntia products.
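
The PLSR stage can be sketched directly with scikit-learn, mapping per-sample mean reflectance spectra to a laboratory reference value. The file names, the choice of 10 latent components, and the 5-fold cross-validation below are assumptions for illustration, not the paper's preprocessing or tuning.

```python
# Hedged sketch of the PLSR prediction step; inputs are hypothetical files
# holding per-cladode mean spectra and reference phosphorus measurements.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

X = np.load("cladode_mean_spectra.npy")   # (n_samples, n_bands), 400-1000 nm
y = np.load("phosphorus_reference.npy")   # (n_samples,), wet-lab values

pls = PLSRegression(n_components=10)
r2 = cross_val_score(pls, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", r2.mean())

pls.fit(X, y)                             # final model for scoring new scans
```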

Citations: 0
Ciphertext-Only Attack on Grayscale-Based EtC Image Encryption via Component Separation and Regularized Single-Channel Compatibility.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-02-05 | DOI: 10.3390/jimaging12020065
Ruifeng Li, Masaaki Fujiyoshi

Grayscale-based Encryption-then-Compression (EtC) systems transform RGB images into the YCbCr color space, concatenate the components into a single grayscale image, and apply block permutation, block rotation/flipping, and block-wise negative-positive inversion. Because this pipeline separates color components and disrupts inter-channel statistics, existing extended jigsaw puzzle solvers (JPSs) have been regarded as ineffective, and grayscale-based EtC systems have been considered resistant to ciphertext-only visual reconstruction. In this paper, we present a practical ciphertext-only attack against grayscale-based EtC. The proposed attack introduces three key components: (i) Texture-Based Component Classification (TBCC) to distinguish luminance (Y) and chrominance (Cb/Cr) blocks and focus reconstruction on structure-rich regions; (ii) Regularized Single-Channel Edge Compatibility (R-SCEC), which applies Tikhonov regularization to a single-channel variant of the Mahalanobis Gradient Compatibility (MGC) measure to alleviate covariance rank-deficiency while maintaining robustness under inversion and geometric transforms; and (iii) Adaptive Pruning based on the TBCC-reduced search space that skips redundant boundary matching computations to further improve reconstruction efficiency. Experiments show that, in settings where existing extended JPS solvers fail, our method can still recover visually recognizable semantic content, revealing a potential vulnerability in grayscale-based EtC and calling for a re-evaluation of its security.
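
To make the edge-compatibility idea concrete, the following is a heavily simplified single-channel sketch in the spirit of MGC with a Tikhonov (ridge) term: it scores how well the gradient across the seam between two candidate neighbours matches the gradient statistics just inside one block, with a small constant added to the variance to avoid degeneracy. The exact gradient model, the scalar-variance simplification, and the constant alpha are assumptions, not the paper's R-SCEC definition.

```python
# Simplified single-channel edge-compatibility score with a ridge term.
# block_a, block_b: 2D grayscale arrays of the same block size.
import numpy as np

def regularized_edge_score(block_a, block_b, alpha=1e-3):
    # Gradient statistics just inside block A's right boundary.
    grad_within = block_a[:, -1].astype(float) - block_a[:, -2].astype(float)
    # Gradient actually observed across the seam to block B's left boundary.
    grad_seam = block_b[:, 0].astype(float) - block_a[:, -1].astype(float)

    mu = grad_within.mean()
    var = grad_within.var() + alpha          # Tikhonov/ridge term vs. rank deficiency
    diff = grad_seam - mu
    return float(np.sum(diff * diff) / var)  # lower score = more compatible edge
```

A puzzle-style solver would typically evaluate such a score for all four boundary orientations and under the candidate negative-positive inversion of each block.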

Citations: 0
A Survey of Crop Disease Recognition Methods Based on Spectral and RGB Images.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-02-05 | DOI: 10.3390/jimaging12020066
Haoze Zheng, Heran Wang, Hualong Dong, Yurong Qian

Major crops worldwide are affected by various diseases yearly, leading to crop losses in different regions. The primary methods for addressing crop disease losses include manual inspection and chemical control. However, traditional manual inspection methods are time-consuming, labor-intensive, and require specialized knowledge. The preemptive use of chemicals also poses a risk of soil pollution, which may cause irreversible damage. With the advancement of computer hardware, photographic technology, and artificial intelligence, crop disease recognition methods based on spectral and red-green-blue (RGB) images not only recognize diseases without damaging the crops but also offer high accuracy and speed of recognition, essentially solving the problems associated with manual inspection and chemical control. This paper summarizes the research on disease recognition methods based on spectral and RGB images, with the literature spanning from 2020 through early 2025. Unlike previous surveys, this paper reviews recent advances involving emerging paradigms such as State Space Models (e.g., Mamba) and Generative AI in the context of crop disease recognition. In addition, it introduces public datasets and commonly used evaluation metrics for crop disease identification. Finally, the paper discusses potential issues and solutions encountered during research, including the use of diffusion models for data augmentation. Hopefully, this survey will help readers understand the current methods and effectiveness of crop disease detection, inspiring the development of more effective methods to assist farmers in identifying crop diseases.

Citations: 0
SIFT-SNN for Traffic-Flow Infrastructure Safety: A Real-Time Context-Aware Anomaly Detection Framework.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-01-31 | DOI: 10.3390/jimaging12020064
Munish Rathee, Boris Bačić, Maryam Doborjeh

Automated anomaly detection in transportation infrastructure is essential for enhancing safety and reducing the operational costs associated with manual inspection protocols. This study presents an improved neuromorphic vision system, which extends the prior SIFT-SNN (scale-invariant feature transform-spiking neural network) proof-of-concept by incorporating temporal feature aggregation for context-aware and sequence-stable detection. Analysis of classical stitching-based pipelines exposed sensitivity to motion and lighting variations, motivating the proposed temporally smoothed neuromorphic design. SIFT keypoints are encoded into latency-based spike trains and classified using a leaky integrate-and-fire (LIF) spiking neural network implemented in PyTorch. Evaluated across three hardware configurations (an NVIDIA RTX 4060 GPU, an Intel i7 CPU, and a simulated Jetson Nano), the system achieved 92.3% accuracy and a macro F1 score of 91.0% under five-fold cross-validation. Inference latencies were measured at 9.5 ms, 26.1 ms, and ~48.3 ms per frame, respectively. Memory footprints were under 290 MB, and power consumption was estimated to be between 5 and 65 W. The classifier distinguishes between safe, partially dislodged, and fully dislodged barrier pins, which are critical failure modes for the Auckland Harbour Bridge's Movable Concrete Barrier (MCB) system. Temporal smoothing further improves recall for ambiguous cases. By achieving a compact model size (2.9 MB), low-latency inference, and minimal power demands, the proposed framework offers a deployable, interpretable, and energy-efficient alternative to conventional CNN-based inspection tools. Future work will focus on exploring the generalisability and transferability of the work presented, additional input sources, and human-computer interaction paradigms for various deployment infrastructures and advancements.
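
Two ingredients named in the abstract, latency coding and the leaky integrate-and-fire neuron, are compact enough to sketch in NumPy. The time horizon, membrane time constant, and firing threshold below are illustrative assumptions; the actual PyTorch SNN classifier and its training are not reproduced.

```python
# Sketch of latency (time-to-first-spike) encoding and a discrete-time LIF
# neuron. All constants are illustrative assumptions.
import numpy as np

def latency_encode(descriptor, t_max=20):
    """Map feature values to spike times: stronger responses fire earlier."""
    d = descriptor / (descriptor.max() + 1e-8)
    return np.round((1.0 - d) * t_max).astype(int)

def lif_run(spike_times, weights, t_max=20, tau=5.0, v_th=1.0):
    """Integrate weighted input spikes over time; return output spike times."""
    v, out = 0.0, []
    for t in range(t_max + 1):
        v *= np.exp(-1.0 / tau)                   # membrane leak
        v += weights[spike_times == t].sum()      # integrate arriving spikes
        if v >= v_th:                             # fire and reset
            out.append(t)
            v = 0.0
    return out

# usage: lif_run(latency_encode(np.random.rand(128)), np.random.rand(128) * 0.1)
```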

Citations: 0
A Cross-Domain Benchmark of Intrinsic and Post Hoc Explainability for 3D Deep Learning Models.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-01-30 | DOI: 10.3390/jimaging12020063
Asmita Chakraborty, Gizem Karagoz, Nirvana Meratnia

Deep learning models for three-dimensional (3D) data are increasingly used in domains such as medical imaging, object recognition, and robotics. At the same time, the use of AI in these domains is increasing, while, due to their black-box nature, the need for explainability has grown significantly. However, the lack of standardized and quantitative benchmarks for explainable artificial intelligence (XAI) in 3D data limits the reliable comparison of explanation quality. In this paper, we present a unified benchmarking framework to evaluate both intrinsic and post hoc XAI methods across three representative 3D datasets: volumetric CT scans (MosMed), voxelized CAD models (ModelNet40), and real-world point clouds (ScanObjectNN). The evaluated methods include Grad-CAM, Integrated Gradients, Saliency, Occlusion, and the intrinsic ResAttNet-3D model. We quantitatively assess explanations using the Correctness (AOPC), Completeness (AUPC), and Compactness metrics, consistently applied across all datasets. Our results show that explanation quality significantly varies across methods and domains, demonstrating that Grad-CAM and intrinsic attention performed best on medical CT scans, while gradient-based methods excelled on voxelized and point-based data. Statistical tests (Kruskal-Wallis and Mann-Whitney U) confirmed significant performance differences between methods. No single approach achieved superior results across all domains, highlighting the importance of multi-metric evaluation. This work provides a reproducible framework for standardized assessment of 3D explainability and comparative insights to guide future XAI method selection.
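
As an illustration of how the Correctness metric is commonly computed, the sketch below implements an AOPC-style loop for a 3D volume: patches are occluded in decreasing order of attributed relevance and the average drop in the model's class score is recorded. The patch size, step count, and zero-fill perturbation are assumptions for illustration and may differ from the benchmark's exact protocol.

```python
# AOPC-style sketch for a 3D input. model_score is any callable mapping a
# volume to a scalar class score; relevance is an attribution map of the
# same shape as the volume. Patch size and step count are assumptions.
import numpy as np

def aopc(model_score, volume, relevance, patch=8, steps=20):
    base = model_score(volume)
    x = volume.copy()
    d, h, w = volume.shape
    coords = [(i, j, k) for i in range(0, d, patch)
                        for j in range(0, h, patch)
                        for k in range(0, w, patch)]
    # Rank coarse patches by mean attributed relevance, most relevant first.
    coords.sort(key=lambda c: -relevance[c[0]:c[0]+patch,
                                         c[1]:c[1]+patch,
                                         c[2]:c[2]+patch].mean())
    drops = []
    for (i, j, k) in coords[:steps]:
        x[i:i+patch, j:j+patch, k:k+patch] = 0.0     # occlude next patch
        drops.append(base - model_score(x))
    return float(np.mean(drops))                     # higher AOPC = better explanation
```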

Citations: 0
AACNN-ViT: Adaptive Attention-Augmented Convolutional and Vision Transformer Fusion for Lung Cancer Detection.
IF 2.7 | Q3 | IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY | Pub Date: 2026-01-30 | DOI: 10.3390/jimaging12020062
Mohammad Ishtiaque Rahman, Amrina Rahman

Lung cancer remains a leading cause of cancer-related mortality. Although reliable multiclass classification of lung lesions from CT imaging is essential for early diagnosis, it remains challenging due to subtle inter-class differences, limited sample sizes, and class imbalance. We propose an Adaptive Attention-Augmented Convolutional Neural Network with Vision Transformer (AACNN-ViT), a hybrid framework that integrates local convolutional representations with global transformer embeddings through an adaptive attention-based fusion module. The CNN branch captures fine-grained spatial patterns, the ViT branch encodes long-range contextual dependencies, and the adaptive fusion mechanism learns to weight cross-representation interactions to improve discriminability. To reduce the impact of imbalance, a hybrid objective that combines focal loss with categorical cross-entropy is incorporated during training. Experiments on the IQ-OTH/NCCD dataset (benign, malignant, and normal) show consistent performance progression in an ablation-style evaluation: CNN-only, ViT-only, CNN-ViT concatenation, and AACNN-ViT. The proposed AACNN-ViT achieved 96.97% accuracy on the validation set with macro-averaged precision/recall/F1 of 0.9588/0.9352/0.9458 and weighted F1 of 0.9693, substantially improving minority-class recognition (Benign recall 0.8333) compared with CNN-ViT (accuracy 89.09%, macro-F1 0.7680). One-vs.-rest ROC analysis further indicates strong separability across all classes (micro-average AUC 0.992). These results suggest that adaptive attention-based fusion offers a robust and clinically relevant approach for computer-aided lung cancer screening and decision support.
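
The hybrid objective described above fits in a few lines of PyTorch; the focal exponent (gamma = 2) and the equal mixing weight below are illustrative defaults rather than the values used in the paper.

```python
# Sketch of the hybrid loss: focal loss (down-weights easy examples under
# class imbalance) mixed with standard cross-entropy. Constants are assumptions.
import torch
import torch.nn.functional as F

def hybrid_loss(logits, targets, gamma=2.0, mix=0.5):
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                       # probability of the true class
    focal = ((1.0 - pt) ** gamma) * ce
    return mix * focal.mean() + (1.0 - mix) * ce.mean()

# usage: loss = hybrid_loss(model(images), labels)
```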

Citations: 0