
Journal of Visual Communication and Image Representation: Latest Articles

ADcFNet-deep learning based facial expression identification using FER vision transformer
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-09 | DOI: 10.1016/j.jvcir.2025.104637
M. Anand, S. Babu
This paper presents a facial expression recognition pipeline. A stacked Gaussian blur edge detection filter (S-Gbed) is used for filtering, and Dynamic Histogram Equalization (DHE) improves image contrast. Then, a triple attention-assisted FER vision transformer (T-FERViT) extracts features, and optimal features are selected using the Honey Badger chaotic optimization (HbcOa) algorithm. Finally, facial expressions are classified by emotion using an African vulture-assisted Depth convolutional stacked Long Short-Term Memory (LSTM) Frame Attention network (ADcFNet), where the African Vulture Optimization Algorithm optimizes the network's loss function. The Karolinska Directed Emotional Faces (KDEF) dataset, the Face Expression Recognition 2013 (FER-2013) dataset, and a facial emotion dataset are used to evaluate the ADcFNet model, and its overall performance is compared with existing models to demonstrate its superiority. The ADcFNet model attained 99.17%, 91.6%, and 95.9% accuracy on the KDEF, FER-2013, and facial emotion datasets, respectively.
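As a rough illustration of the preprocessing front end described in the abstract (not the authors' implementation), the following Python sketch approximates S-Gbed by stacking Gaussian blurs at several scales and detecting edges on each, and stands in for DHE with OpenCV's CLAHE; the input path face.jpg is hypothetical:

import cv2
import numpy as np

def stacked_gaussian_blur_edges(gray, sigmas=(1.0, 2.0, 4.0)):
    # Blur at several scales, detect edges on each, fuse by per-pixel max.
    edge_stack = []
    for sigma in sigmas:
        blurred = cv2.GaussianBlur(gray, (0, 0), sigmaX=sigma)
        edge_stack.append(cv2.Canny(blurred, 50, 150))
    return np.max(np.stack(edge_stack), axis=0).astype(np.uint8)

def enhance_contrast(gray):
    # CLAHE as a standard stand-in for dynamic histogram equalization (DHE).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)

img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
edges = stacked_gaussian_blur_edges(img)
enhanced = enhance_contrast(img)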
Citations: 0
Dual-optimized two-stage Camouflaged Object Detection
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-07 | DOI: 10.1016/j.jvcir.2025.104631
Sanxin Jiang, Hongliang Zhang, Changde Ding
To address the current issues of inaccurate object localization and insufficient edge information extraction in Camouflaged Object Detection (COD), and inspired by how humans detect camouflaged objects (first identifying their general outline, then focusing on finer details), we propose DONet, a novel two-stage network for COD. In the first stage, the network leverages an Edge Exploration Module (EEM) to locate object boundaries, refining this boundary information through Retrieve Attention. Subsequently, the Object Position Recognition Module (OPRM) detects the horizontal and vertical locations of camouflaged objects by integrating boundary information with high-level features; this information is further enhanced by combining multi-dilation channels and neighboring features. In the second stage, a Context Aggregation Module (CAM) aggregates contextual information to improve detection accuracy. Extensive experiments demonstrate that DONet surpasses 16 state-of-the-art methods across three challenging datasets, highlighting its effectiveness and superior performance. DONet also performs strongly on medical polyp segmentation.
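A minimal structural sketch of the two-stage flow (edge exploration, position recognition, context aggregation) in PyTorch; the single-convolution modules below are placeholders, not the paper's actual EEM/OPRM/CAM designs:

import torch
import torch.nn as nn

class TwoStageCOD(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.eem = nn.Conv2d(channels, 1, 3, padding=1)       # placeholder edge module
        self.oprm = nn.Conv2d(channels + 1, 1, 3, padding=1)  # placeholder position module
        self.cam = nn.Conv2d(channels + 1, 1, 3, padding=1)   # placeholder context module

    def forward(self, feats):
        edges = torch.sigmoid(self.eem(feats))                           # stage 1: boundaries
        coarse = torch.sigmoid(self.oprm(torch.cat([feats, edges], 1)))  # stage 1: location
        return self.cam(torch.cat([feats, coarse], 1))                   # stage 2: refinement

mask_logits = TwoStageCOD()(torch.randn(1, 64, 88, 88))  # backbone features assumed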
Citations: 0
GAN semantics for personalized facial beauty synthesis and enhancement
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-07 | DOI: 10.1016/j.jvcir.2025.104640
Irina Lebedeva , Fangli Ying , Yi Guo , Taihao Li
Generative adversarial networks (GANs), whose popularity and scope of applications continue to grow, have already demonstrated impressive results in human face image processing. Face aging, completion, attribute transfer, and synthesis are only some examples of successful GAN applications. Although beauty enhancement and face generation conditioned on attractiveness level are also among the applications of GANs, they have been investigated only from a universal or generic point of view, and no studies address the personalized aspect of these tasks. This work fills that gap by introducing a generative framework that synthesizes a realistic human face based on an individual's beauty preferences. To this end, StyleGAN's properties and the capacity for semantic face manipulation in its latent space are studied and utilized. Beyond face generation, the proposed framework can also enhance the beauty level of a real face according to personal beauty preferences. Extensive experiments are conducted on two publicly available facial beauty datasets with different image and rater properties, SCUT-FBP5500 and the multi-ethnic MEBeauty. Quantitative evaluations demonstrate the effectiveness of the proposed framework and its advantages over the state of the art, while qualitative evaluations reveal and illustrate interesting social and cultural patterns in personal beauty preferences.
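The latent-space editing the abstract relies on can be sketched as moving a latent code along a semantic direction; the direction vector and generator below are assumptions, since the paper learns its directions from personal beauty-preference data:

import torch

def edit_latent(w, direction, strength):
    # w' = w + alpha * d, with d unit-normalized so strengths are comparable.
    d = direction / direction.norm()
    return w + strength * d

w = torch.randn(1, 512)        # latent code, e.g. from GAN inversion (assumed)
beauty_dir = torch.randn(512)  # hypothetical learned preference direction
w_edited = edit_latent(w, beauty_dir, strength=2.0)
# face = generator.synthesis(w_edited)  # generator omitted in this sketch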
Citations: 0
Copy-move forgery detection of social media images using tendency sparsity filtering and variable cluster spectral clustering
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-07 | DOI: 10.1016/j.jvcir.2025.104635
Cong Lin , Hai Yang , Ke Huang , Daqiang Long , Yuke Zhong , Yuqiao Deng , Yamin Wen
Copy-move forgery is a common form of image tampering, and in practice most images encountered have been compressed by social media. Based on this, a copy-move forgery detection method for social media images based on tendency sparsity (TS) filtering and variable cluster spectral clustering (VCS clustering) is proposed. First, the image scale is normalized to obtain a sufficient number of keypoints, and a hierarchical matching method is adopted to accelerate matching. Next, TS filtering removes preference set (PS) vectors that do not meet the condition. To estimate a good affine transformation, the PS vectors are clustered using VCS clustering. Finally, the tampering localization result is output. Comparative experiments on several public uncompressed datasets, as well as datasets compressed by social media, show that the proposed method is robust in detecting social media images and outperforms state-of-the-art methods.
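A rough Python skeleton of this style of detection pipeline using standard stand-ins: SIFT keypoints, a ratio test in place of the paper's hierarchical matching and TS filtering, and scikit-learn spectral clustering in place of VCS clustering; the file name and thresholds are illustrative:

import cv2
import numpy as np
from sklearn.cluster import SpectralClustering

img = cv2.imread("suspect.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
kps, desc = cv2.SIFT_create().detectAndCompute(img, None)

pairs = []
for trio in cv2.BFMatcher().knnMatch(desc, desc, k=3):  # rank 0 is the self-match
    if len(trio) < 3:
        continue
    _, m1, m2 = trio
    if m1.distance < 0.6 * m2.distance:  # ratio test instead of TS filtering
        pairs.append(kps[m1.queryIdx].pt + kps[m1.trainIdx].pt)  # (x1, y1, x2, y2)

if len(pairs) >= 20:
    labels = SpectralClustering(n_clusters=2,
                                affinity="nearest_neighbors").fit_predict(np.array(pairs))
    # cv2.estimateAffine2D can then fit one affine transform per cluster.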
Citations: 0
Multi-layer graph constraint dictionary pair learning for image classification
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-06 | DOI: 10.1016/j.jvcir.2025.104638
Yulin Sun , Guangming Shi , Weisheng Dong , Xuemei Xie
Multi-layer dictionary learning (MDL) has demonstrated significantly improved performance for image classification. However, most existing MDL methods simply adopt an overall shared dictionary learning architecture, which weakens the discriminative ability of the dictionaries. To address this, we propose a powerful framework called Multi-layer Graph Constraint Dictionary Pair Learning (MGDPL). MGDPL integrates multi-layer dictionary pair learning, a structure graph constraint, and discriminative sparse representations into a unified framework. First, a multi-layer structured dictionary learning mechanism is applied to dictionary pairs, enhancing discrimination by having each layer rebuild the reconstruction error of the previous layer. Second, a structure graph constraint on the sub-sparse representations preserves the discriminative capability of the near-neighbor graph. Third, a multi-layer discriminant graph regularization term ensures high intra-class tightness and inter-class dispersion of dictionary atoms in the reconstruction space. Extensive experiments show that MGDPL achieves excellent performance compared with other state-of-the-art methods.
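The graph-constraint idea can be made concrete with a small numeric example: a graph Laplacian regularizer tr(A L A^T) that is small when sparse codes connected in the neighbor graph stay close; the names and toy data below are illustrative, not the MGDPL solver:

import numpy as np

def laplacian_regularizer(A, W):
    # A: d x n sparse-code matrix, W: n x n symmetric affinity graph.
    # tr(A L A^T) = 0.5 * sum_ij W_ij * ||a_i - a_j||^2, so the penalty is
    # small when graph-connected codes are similar.
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(A @ L @ A.T))

rng = np.random.default_rng(0)
codes = rng.normal(size=(8, 5))          # 8-dim codes for 5 samples
affinity = np.ones((5, 5)) - np.eye(5)   # fully connected toy graph
print(laplacian_regularizer(codes, affinity))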
Citations: 0
Exploring a Non-Parametric Uncertain Adaptive training method for facial expression recognition
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-06 | DOI: 10.1016/j.jvcir.2025.104636
Renhao Sun , Chaoqun Wang , Yujian Wang
In facial expression recognition, the uncertainty introduced by ambiguous facial expressions and the subjectivity of annotators leads to inter-class similarity and intra-class diversity among annotated samples, which in turn degrades recognition results. To mitigate the performance loss caused by such uncertainty, we explore a Non-Parametric Uncertain Adaptive (NoPUA) method that suppresses ambiguous samples during training for facial expression recognition. Specifically, we first propose a self-paced feature bank module over mini-batches to compute the top-K similarity rank for each training sample, and then design a sample-to-class weighting score module based on that rank to grade the candidate categories with respect to the similarity classes of the samples themselves. Finally, we modify the labels of each uncertain sample using a self-adaptive relabeling module driven by the multi-category scores described above. Our method is non-parametric, easy to implement, and model-agnostic. Extensive experiments on three public benchmarks (RAF-DB, FERPlus, AffectNet) validate the effectiveness of NoPUA embedded in a variety of algorithms (baseline, SCN, RUL, EAC, DAN, POSTER++), achieving better performance.
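A sketch, under assumed L2-normalized features, of the two scoring steps: top-K cosine similarity against a feature bank, then a similarity-weighted per-class score from the retrieved neighbors' labels; the exact NoPUA formulas differ and this only shows the mechanics:

import torch
import torch.nn.functional as F

def class_weighting_scores(feats, bank_feats, bank_labels, num_classes, k=5):
    feats = F.normalize(feats, dim=1)
    bank_feats = F.normalize(bank_feats, dim=1)
    sims, idx = (feats @ bank_feats.T).topk(k, dim=1)  # top-K similarity rank
    sims = sims.clamp_min(0)                           # keep only positive evidence
    scores = torch.zeros(feats.size(0), num_classes)
    for c in range(num_classes):
        mask = (bank_labels[idx] == c).float()         # retrieved neighbors of class c
        scores[:, c] = (sims * mask).sum(dim=1)        # similarity-weighted votes
    return scores / scores.sum(dim=1, keepdim=True).clamp_min(1e-8)

scores = class_weighting_scores(torch.randn(4, 128), torch.randn(100, 128),
                                torch.randint(0, 7, (100,)), num_classes=7)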
Citations: 0
Fast adaptive QTMT partitioning for intra 360° video coding based on gradient boosted trees
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-06 | DOI: 10.1016/j.jvcir.2025.104629
Jose N. Filipe , Luis M.N. Tavora , Sergio M.M. Faria , Antonio Navarro , Pedro A.A. Assuncao
The rising demand for UHD and 360° content has driven the creation of advanced compression tools with enhanced coding efficiency. Versatile Video Coding (VVC) improves coding efficiency over previous standards but introduces significantly higher computational complexity. To address this, this paper presents a novel intra-coding method for 360° video in the Equirectangular Projection (ERP) format that reduces complexity with minimal impact on coding efficiency. It shows that the North, Equator, and South regions of ERP images exhibit distinct complexity and spatial characteristics. A region-based approach uses multiple Gradient Boosted Trees models for each region to determine whether a partition type can be skipped. Additionally, an adaptive decision threshold scheme is introduced to optimise vertical partitioning in the polar regions. The paper also presents an optimisation solution for the complexity/BD-Rate trade-off parameters. Experimental results demonstrate a 50% complexity gain with only a 0.37% BD-Rate loss, outperforming current state-of-the-art methods.
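The region-based skip decision can be sketched as one gradient-boosted model per ERP latitude band; the three-band split, toy features, and labels below are assumptions for illustration only:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def erp_region(block_top, frame_height):
    # Assumed three-band split of the equirectangular frame by latitude.
    y = block_top / frame_height
    return "north" if y < 1 / 3 else ("equator" if y < 2 / 3 else "south")

rng = np.random.default_rng(0)
models = {}
for region in ("north", "equator", "south"):
    X = rng.normal(size=(200, 4))  # toy block features (variance, gradients, QP, ...)
    y = rng.integers(0, 2, 200)    # toy label: 1 = this partition type can be skipped
    models[region] = GradientBoostingClassifier(n_estimators=50).fit(X, y)

feats = rng.normal(size=(1, 4))
skip = models[erp_region(block_top=100, frame_height=960)].predict(feats)[0]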
Citations: 0
Variable-rate learned image compression with integer-arithmetic-only inference
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-04 | DOI: 10.1016/j.jvcir.2025.104634
Fan Ye, Li Li, Dong Liu
Learned image compression (LIC) achieves superior rate–distortion performance over traditional codecs but faces deployment challenges due to floating-point inconsistencies and high computational cost. Existing quantized LIC models are typically single-rate and lack support for variable-rate compression, limiting their adaptability. We propose a fully quantized variable-rate LIC framework that enables integer-only inference across all components. Our method introduces bitrate-specific quantization parameters to address rate-dependent activation variations. All computations — including weights, biases, activations, and nonlinearities — are performed using 8-bit integer operations such as multiplications, bit-shifts, and lookup tables. To further enhance hardware efficiency, we adopt per-layer quantization and reduce intermediate precision from 32-bit to 16-bit. Experiments show that our fully 8-bit quantized model reduces bitrate by 19.2% compared to VTM-17.2 intra coding on standard test sets. It also achieves 50.5% and 52.2% speedup in encoding and decoding, respectively, over its floating-point counterpart.
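A minimal sketch of the integer-only arithmetic pattern such a model relies on: an int8 multiply accumulated in int32, then requantized to int8 with a fixed-point multiplier and a bit-shift instead of any floating-point rescale; the scale constants are illustrative:

import numpy as np

def requantize(acc_int32, multiplier, shift):
    # y = round(acc * M / 2^shift), computed entirely in integer arithmetic.
    rounded = (acc_int32.astype(np.int64) * multiplier + (1 << (shift - 1))) >> shift
    return np.clip(rounded, -128, 127).astype(np.int8)

x = np.array([23, -7, 100], dtype=np.int8)
w = np.array([12, 56, -3], dtype=np.int8)
acc = x.astype(np.int32) * w.astype(np.int32)  # int8 x int8 accumulated in int32
y = requantize(acc, multiplier=77, shift=12)   # fixed-point scale of about 77/4096
print(y)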
Citations: 0
SSUFormer: Spatial–spectral UnetFormer for improving hyperspectral image classification
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-04 | DOI: 10.1016/j.jvcir.2025.104633
Thuan Minh Nguyen , Khoi Anh Bui , Myungsik Yoo
For hyperspectral image (HSI) classification, convolutional neural networks with local kernels neglect global HSI properties, and transformer networks often predict only the central pixel. This study proposes a spatial–spectral UnetFormer network that extracts full local and global spatial similarities, together with long- and short-range spectral dependencies, for HSI classification. The approach fuses a spectral transformer subnetwork and a spatial attention U-net subnetwork to produce its outputs. In the spectral subnetwork, the transformer's embedding and head layers are tailored to generate predictions for all input pixels. In the spatial attention U-net subnetwork, a local–global spatial feature model is built on the U-net structure with a singular value decomposition-aided spatial self-attention module to emphasize useful details, mitigate the impact of noise, and learn global spatial features. The proposed model obtains results competitive with state-of-the-art HSI classification methods on various public datasets.
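One way to picture the SVD-aided attention step: flatten the spatial map, keep the top-k singular components to suppress noise, and attend over the denoised features; dimensions and k below are illustrative, not the paper's configuration:

import torch

def svd_denoise(feat, k=8):
    # feat: (C, H*W); keep the k largest singular components.
    U, S, Vh = torch.linalg.svd(feat, full_matrices=False)
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

feat = torch.randn(64, 81)                     # C=64 channels, 9x9 spatial grid
clean = svd_denoise(feat)
attn = torch.softmax(feat.T @ clean, dim=-1)   # (H*W, H*W) spatial self-attention weights
out = (attn @ clean.T).T                       # re-weighted features, (C, H*W)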
Citations: 0
Retrieval augmented generation for smart calorie estimation in complex food scenarios
IF 3.1 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-03 | DOI: 10.1016/j.jvcir.2025.104632
Mayank Sah, Saurya Suman, Jimson Mathew
Accurate food recognition and calorie estimation are critical for managing diet-related health issues such as obesity and diabetes. Traditional food logging relies on manual input, leading to inaccurate nutritional records. Although recent advances in computer vision and deep learning offer automated solutions, existing models struggle to generalize due to homogeneous datasets and limited representation of complex cuisines such as Indian food. This paper introduces a dataset containing over 15,000 images of 56 popular Indian food items. Curated from diverse sources, including social media and real-world photography, the dataset aims to capture the complexity of Indian meals, where multiple food items often appear together in a single image, and offers greater variability in lighting, presentation, and image quality than existing datasets. We evaluated the dataset with various YOLO-based models, from YOLOv5 through YOLOv12, and enhanced the backbone with omni-scale feature learning from OSNet, improving detection accuracy. In addition, we integrate a Retrieval-Augmented Generation (RAG) module with YOLO, which refines food identification by associating fine-grained food categories with nutritional information, ingredients, and recipes. Our approach demonstrates improved performance in recognizing complex meals and addresses key challenges in food recognition, offering a scalable solution for accurate calorie estimation, especially for culturally diverse cuisines such as Indian food.
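The retrieval step after detection can be sketched as a label-to-nutrition lookup; the table and per-serving values below are made up for illustration, while a real system would retrieve from a curated nutrition database (optionally via embedding search):

NUTRITION_DB = {
    "samosa": {"kcal_per_serving": 260, "serving": "1 piece"},
    "dal": {"kcal_per_serving": 180, "serving": "1 bowl"},
    "naan": {"kcal_per_serving": 300, "serving": "1 piece"},
}

def estimate_calories(detections):
    # detections: (label, count) pairs produced by the detector.
    total = 0
    for label, count in detections:
        record = NUTRITION_DB.get(label)
        if record is not None:
            total += record["kcal_per_serving"] * count
    return total

print(estimate_calories([("samosa", 2), ("dal", 1)]))  # -> 700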
Citations: 0