
Journal of Visual Communication and Image Representation: Latest Articles

Iterative decoupling deconvolution network for image restoration
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-12 | DOI: 10.1016/j.jvcir.2024.104288

The iterative decoupled deblurring BM3D (IDDBM3D) (Danielyan et al., 2011) combines the analysis representation and the synthesis representation, decoupling the deblurring and denoising operations so that both sub-problems can be solved easily. However, the IDDBM3D has several limitations. First, the analysis and synthesis transformations are analytical and thus have limited representation ability. Second, the threshold-based transformation struggles to remove image noise effectively. Third, there are hyper-parameters that must be tuned manually, which is difficult and time-consuming. In this work, we propose an iterative decoupling deconvolution network (IDDNet) by unrolling the iterative decoupling algorithm of the IDDBM3D. In the proposed IDDNet, the analysis/synthesis transformations are implemented by encoder/decoder modules, the denoising is performed by a convolutional neural network based denoiser, and the hyper-parameters are estimated by a hyper-parameter module. We apply our model to image deblurring and super-resolution. Experimental results show that the IDDNet significantly outperforms state-of-the-art unfolding networks.
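To make the unrolling concrete, the following is a minimal PyTorch sketch of one decoupled stage: a closed-form Fourier-domain deconvolution step whose regularization weight is predicted by a small hyper-parameter network, followed by a CNN denoiser standing in for the encoder/decoder modules. The module names, the Wiener-style update, and the single-channel (grayscale) assumption are illustrative; this is not the authors' implementation.

# One unrolled decoupling stage: deblurring (data) step + learned denoising step.
import torch
import torch.nn as nn
import torch.fft

class HyperNet(nn.Module):
    # Predicts a positive per-stage regularization weight from the noise level.
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Softplus())
    def forward(self, sigma):              # sigma: (batch, 1)
        return self.mlp(sigma)

class Denoiser(nn.Module):
    # Small residual CNN denoiser (stand-in for the encoder/decoder modules).
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1))
    def forward(self, x):
        return x - self.net(x)

class Stage(nn.Module):
    def __init__(self):
        super().__init__()
        self.denoiser = Denoiser()
        self.hyper = HyperNet()
    def forward(self, x, y, k_fft, sigma):
        # x: current estimate, y: blurred image, k_fft: FFT of the blur kernel
        # (zero-padded to the image size), all shaped (batch, 1, H, W).
        lam = self.hyper(sigma).view(-1, 1, 1, 1)
        # Deblurring step: closed-form deconvolution in the Fourier domain.
        num = torch.conj(k_fft) * torch.fft.fft2(y) + lam * torch.fft.fft2(x)
        den = torch.abs(k_fft) ** 2 + lam
        z = torch.real(torch.fft.ifft2(num / den))
        # Denoising step, decoupled from the deconvolution above.
        return self.denoiser(z)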

Citations: 0
LG-AKD: Application of a lightweight GCN model based on adversarial knowledge distillation to skeleton action recognition
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-12 | DOI: 10.1016/j.jvcir.2024.104286

Human action recognition, a pivotal topic in computer vision, is a highly complex and challenging task. It requires analyzing not only the spatial dependencies of targets but also their temporal changes. In recent decades, the advancement of deep learning has produced numerous action recognition methods based on deep neural networks. Since the skeleton points of the human body can be treated as a graph structure, graph neural networks (GNNs) have emerged as an effective tool for modeling such data, garnering significant interest from researchers. This paper addresses the low test speed caused by over-complicated deep graph convolutional models. To achieve this, we compress the network structure using knowledge distillation within a teacher-student architecture, yielding a compact and lightweight student GNN. To enhance the model's robustness and generalization capabilities, we introduce a data augmentation mechanism that generates diverse action sequences while keeping behavior labels consistent, thereby providing a more comprehensive learning basis for the model. The proposed model integrates three distinct knowledge learning paths: teacher networks, original datasets, and derived data. The fusion of knowledge distillation and data augmentation enables the lightweight student network to outperform its teacher network in both performance and efficiency. Experimental results demonstrate the efficacy of our approach for skeleton-based human action recognition, highlighting its potential to simplify state-of-the-art models while enhancing their performance.
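The distillation component can be illustrated with a standard soft-target objective. The sketch below (plain PyTorch) shows only the teacher-student loss with an assumed temperature T and weight alpha; the adversarial branch and the data augmentation described above are omitted.

# Standard knowledge-distillation loss: softened KL term + hard-label cross-entropy.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets from the (frozen) teacher GCN, softened by temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)
    # Hard-label supervision from the original action labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce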

Citations: 0
High-capacity reversible data hiding in encrypted images based on adaptive block coding selection
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-12 | DOI: 10.1016/j.jvcir.2024.104291

Recently, data hiding techniques have flourished and addressed various challenges. However, reversible data hiding in encrypted images (RDHEI) under the vacating-room-after-encryption (VRAE) framework often falls short in data embedding performance. To address this issue, this paper proposes a novel, high-capacity data hiding method based on adaptive block coding selection. Specifically, iterative encryption and block permutation are applied during image encryption to maintain high pixel correlation within blocks. For each block of the encrypted image, both entropy coding and zero-valued high bit-plane compression coding are pre-applied, and the coding method that vacates the most space is selected, leveraging the strengths of both techniques to maximize the effective embeddable room of each encrypted block. This adaptive block coding selection mechanism suits images with varying characteristics. Extensive experiments demonstrate that the proposed VRAE-based method outperforms several state-of-the-art RDHEI methods in data embedding capacity. The average embedding rates (ERs) of the proposed method on three public datasets, BOSSbase, BOWS-2, and UCID, are 4.041 bpp, 3.929 bpp, and 3.181 bpp, respectively.
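A rough sketch of the per-block selection logic follows, assuming 8x8 blocks of 8-bit pixels. The two size estimators below (a Shannon-entropy bound and zero-valued high bit-plane counting) are simplified stand-ins for the paper's entropy coding and bit-plane compression coding; the function names are illustrative.

# Pick, per block, the coding that vacates the most embeddable room.
import numpy as np

def entropy_coded_bits(block):
    # Lower bound on entropy-coded size: H(X) bits per pixel.
    vals, counts = np.unique(block, return_counts=True)
    p = counts / counts.sum()
    return int(np.ceil(block.size * -(p * np.log2(p)).sum()))

def zero_bitplane_coded_bits(block):
    # Bits kept after dropping high bit-planes that are entirely zero.
    planes_kept = 8
    for b in range(7, -1, -1):
        if np.any((block >> b) & 1):
            break
        planes_kept -= 1
    return block.size * planes_kept

def select_coding(block):
    raw_bits = block.size * 8
    candidates = {
        "entropy": entropy_coded_bits(block),
        "bitplane": zero_bitplane_coded_bits(block),
    }
    method, coded = min(candidates.items(), key=lambda kv: kv[1])
    vacated = max(raw_bits - coded, 0)   # embeddable room in this block
    return method, vacated

block = np.random.randint(0, 16, size=(8, 8), dtype=np.uint8)  # toy low-range block
print(select_coding(block))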

Citations: 0
Improved semantic-guided network for skeleton-based action recognition
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-02 | DOI: 10.1016/j.jvcir.2024.104281

A fundamental issue in skeleton-based action recognition is the extraction of useful features from skeleton joints. Unfortunately, the current state-of-the-art models for this task tend to be overly complex and heavily parameterized, which results in low training and inference efficiency on large-scale datasets. In this work, we develop a simple yet efficient baseline for skeleton-based Human Action Recognition (HAR). The architecture is based on adaptive GCNs (Graph Convolutional Networks), which capture the complex interconnections within skeletal structures automatically, without the need for a predefined topology. The GCNs are followed by an attention mechanism that learns more informative representations. This paper reports strong accuracy on the large-scale NTU-RGB+D 60 dataset: 89.7% and 95.0% on the Cross-Subject and Cross-View benchmarks, respectively. On NTU-RGB+D 120, it reaches 84.6% and 85.8% under the Cross-Subject and Cross-Setup settings, respectively. This work improves on the existing SGN (Semantic-Guided Neural Network) model by extracting more discriminative spatial and temporal features.
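As an illustration of the adaptive-GCN-plus-attention idea, here is a compact PyTorch sketch of one graph-convolution block with a learnable adjacency matrix and channel attention. The layer sizes and the squeeze-and-excitation style attention are assumptions for illustration, not the paper's architecture.

# Adaptive graph convolution over skeleton joints, followed by channel attention.
import torch
import torch.nn as nn

class AdaptiveGCNBlock(nn.Module):
    def __init__(self, in_ch, out_ch, num_joints=25):
        super().__init__()
        # Adjacency learned from data instead of a predefined skeleton topology.
        self.A = nn.Parameter(torch.eye(num_joints) + 0.01 * torch.randn(num_joints, num_joints))
        self.theta = nn.Linear(in_ch, out_ch)
        self.attn = nn.Sequential(                      # channel attention
            nn.Linear(out_ch, out_ch // 4), nn.ReLU(),
            nn.Linear(out_ch // 4, out_ch), nn.Sigmoid())
    def forward(self, x):
        # x: (batch, num_joints, in_ch) joint features for one frame.
        h = torch.einsum("vu,buc->bvc", torch.softmax(self.A, dim=-1), x)
        h = torch.relu(self.theta(h))
        w = self.attn(h.mean(dim=1))                    # (batch, out_ch)
        return h * w.unsqueeze(1)

x = torch.randn(8, 25, 64)                  # 8 sequences, 25 joints, 64-dim features
print(AdaptiveGCNBlock(64, 128)(x).shape)   # torch.Size([8, 25, 128])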

Citations: 0
A novel approach for long-term secure storage of domain independent videos
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-02 | DOI: 10.1016/j.jvcir.2024.104279

Long-term protection of multimedia content is a complex task, especially when the video contains critical elements, and it demands sophisticated technology to ensure confidentiality. In this paper, we propose a blended approach that uses a proactive visual cryptography scheme along with video summarization techniques to circumvent these issues. Proactive visual cryptography protects digital data by periodically updating or renewing the shares, which are stored on different servers, while video summarization is useful in scenarios where memory is a major concern. We use a domain-independent scheme for summarizing videos that is applicable to both edited and unedited videos. In our scheme, the visual continuity of the raw video is preserved even after summarization. The original video can be reconstructed from the shares using auxiliary data generated during the video summarization phase. Mathematical analysis and experimental results demonstrate the applicability of the proposed method.
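The "proactive" renewal idea can be shown with a toy XOR-based two-share example: the shares stored on different servers are periodically re-randomized without ever reconstructing the secret, yet their XOR still recovers the original frame. This sketch illustrates only share refresh, not the full scheme or the summarization step.

# Proactive refresh of a two-share XOR sharing of a toy frame.
import numpy as np

rng = np.random.default_rng(0)
secret = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)   # stand-in video frame

# Initial sharing: s1 random, s2 = secret XOR s1.
s1 = rng.integers(0, 256, size=secret.shape, dtype=np.uint8)
s2 = secret ^ s1

def refresh(s1, s2, rng):
    # Both servers apply the same fresh mask; the secret s1 ^ s2 is unchanged.
    mask = rng.integers(0, 256, size=s1.shape, dtype=np.uint8)
    return s1 ^ mask, s2 ^ mask

for _ in range(3):                       # periodic renewal rounds
    s1, s2 = refresh(s1, s2, rng)

assert np.array_equal(s1 ^ s2, secret)   # reconstruction still works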

Citations: 0
VTPL: Visual and text prompt learning for visual-language models
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-02 | DOI: 10.1016/j.jvcir.2024.104280

Visual-language (V-L) models have achieved remarkable success in learning joint visual-textual representations from large web datasets. Prompt learning, as a solution for downstream tasks, can address the forgetting of knowledge associated with fine-tuning. However, current methods focus on a single modality and fail to fully exploit multimodal information. This paper addresses these limitations with a novel approach called visual and text prompt learning (VTPL), which trains the model while enhancing both visual and text prompts. Visual prompts align visual features with text features, whereas text prompts enrich the semantic information of the text. Additionally, this paper introduces a Poly-1 information noise contrastive estimation (InfoNCE) loss and a center loss to increase the inter-class distance and decrease the intra-class distance. Experiments on 11 image datasets show that VTPL outperforms state-of-the-art methods, achieving 1.61%, 1.63%, 1.99%, 2.42%, and 2.87% performance boosts over CoOp for 1, 2, 4, 8, and 16 shots, respectively.
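A minimal sketch of the Poly-1 InfoNCE term described above: the standard InfoNCE cross-entropy over image-text similarities plus an epsilon-weighted (1 - p_target) polynomial term. The temperature and epsilon values are assumptions, and the center loss is not shown.

# Poly-1 InfoNCE over a batch of paired image/text features.
import torch
import torch.nn.functional as F

def poly1_infonce(image_feats, text_feats, temperature=0.07, epsilon=1.0):
    # Features are L2-normalized; positives lie on the diagonal of the logits.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    ce = F.cross_entropy(logits, targets)
    pt = F.softmax(logits, dim=-1)[targets, targets]     # prob. of the matched pair
    return ce + epsilon * (1.0 - pt).mean()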

Citations: 0
SAFLFusionGait: Gait recognition network with separate attention and different granularity feature learnability fusion
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-01 | DOI: 10.1016/j.jvcir.2024.104284

Gait recognition, an essential branch of biometric identification, uses walking patterns to identify individuals. Despite its effectiveness, gait recognition remains vulnerable to changes in appearance caused by factors such as viewing angle and clothing. Recent progress in deep learning has greatly advanced gait recognition, especially through deep convolutional neural networks, which demonstrate impressive performance. However, current approaches often overlook the connection between coarse-grained and fine-grained features, restricting their overall effectiveness. To address this limitation, we propose a new gait recognition framework that combines deep-supervised fine-grained separation with coarse-grained feature learnability. Our framework includes the LFF module, which consists of the SSeg module for fine-grained information extraction and a mechanism for fusing coarse-grained features. Furthermore, we introduce the F-LCM module to extract local disparity features more effectively with learnable weights. Evaluation on the CASIA-B and OU-MVLP datasets shows superior performance compared with classical networks.
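The coarse/fine fusion idea can be pictured with a schematic PyTorch module that mixes the two feature granularities with learnable weights. The layout below is a guess for illustration only and is not the paper's LFF or F-LCM implementation.

# Learnable-weight fusion of coarse- and fine-grained gait features.
import torch
import torch.nn as nn

class GranularityFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(2))      # learnable fusion weights
        self.proj = nn.Linear(dim, dim)
    def forward(self, coarse, fine):
        # coarse, fine: (batch, dim) features from the two branches.
        a = torch.softmax(self.w, dim=0)
        return self.proj(a[0] * coarse + a[1] * fine)

fused = GranularityFusion(256)(torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape)   # torch.Size([4, 256])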

Citations: 0
Blind image deblurring with a difference of the mixed anisotropic and mixed isotropic total variation regularization
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-31 | DOI: 10.1016/j.jvcir.2024.104285

This paper proposes a simple image deblurring model with a new total variation regularization. Classically, the L1-21 regularizer is a difference of the anisotropic (L1) and isotropic (L21) total variation. We define a new regularizer, Le-2e, as the weighted difference of a mixed anisotropic term (L0 + L1 = Le) and a mixed isotropic term (L0 + L21 = L2e); it promotes sparsity and is robust for image deblurring. We then merge the L0-gradient into the model for edge preservation and detail removal. The union of the Le-2e regularization and the L0-gradient improves deblurring performance and yields high-quality blur kernel estimates. Finally, we design a new solution scheme that alternately iterates the difference-of-convex algorithm, the split Bregman method, and half-quadratic splitting to optimize the proposed model. Experimental results on quantitative datasets and real-world images show that the proposed method obtains results comparable to state-of-the-art works.
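Reading the notation above literally, the proposed regularizer can be written as the following LaTeX snippet; the trade-off weight alpha on the mixed isotropic term is an assumption, since the text only states "weighted difference".

% Requires amsmath. \alpha is an assumed trade-off weight; \nabla u denotes the image gradient.
R_{e\text{-}2e}(u) \;=\;
  \underbrace{\|\nabla u\|_{0} + \|\nabla u\|_{1}}_{L_{e}\;(\text{mixed anisotropic})}
  \;-\; \alpha \underbrace{\left( \|\nabla u\|_{0} + \|\nabla u\|_{2,1} \right)}_{L_{2e}\;(\text{mixed isotropic})}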

Citations: 0
Secret image sharing with distinct covers based on improved Cycling-XOR
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-31 | DOI: 10.1016/j.jvcir.2024.104282

Secret image sharing (SIS) is a technique for distributing confidential data by dividing it into multiple image shadows. Most existing approaches protect confidential data by encrypting it with secret keys. This paper proposes a novel SIS scheme that requires no secret key. The secret images are first quantized and self-encrypted into noise-like images. The encrypted images are then mixed into secret shares by cross-encryption. The image shadows are generated by replacing the lower bit-planes of the cover images with the secret shares. In the extraction phase, the receiver can restore the quantized secret images through combinatorial operations on the extracted secret shares. Experimental results show that our method delivers a large data payload with satisfactory cover image quality. Moreover, the computational load is very low, since the whole scheme is mostly based on cycling-XOR operations.
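The embedding step described above (replacing the lower bit-planes of a cover image with share bits) can be sketched in a few lines of NumPy; the quantization, self-encryption, and cross-encryption steps that actually produce the shares are not shown, and the choice of k = 2 bit-planes is an assumption.

# Replace the lowest k bit-planes of the cover with share bits, then extract them back.
import numpy as np

def embed_share(cover, share_bits, k=2):
    # cover: uint8 image; share_bits: uint8 array with values in [0, 2**k).
    return (cover & ~np.uint8(2**k - 1)) | share_bits

def extract_share(shadow, k=2):
    return shadow & np.uint8(2**k - 1)

rng = np.random.default_rng(1)
cover = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
share = rng.integers(0, 4, size=(4, 4), dtype=np.uint8)      # 2-bit share data
shadow = embed_share(cover, share)
assert np.array_equal(extract_share(shadow), share)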

Citations: 0
Background adaptive PosMarker based on online generation and detection for locating watermarked regions in photographs
IF 2.6 | Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-28 | DOI: 10.1016/j.jvcir.2024.104269

Robust watermarking technology can embed invisible messages in screens to trace the source of unauthorized screen photographs. Locating the four vertices of the embedded region in the photograph is necessary, because existing watermarking methods require geometric correction of the embedded region before the message can be revealed. Existing localization methods suffer from a performance trade-off: they either degrade visual quality by embedding visible markers or achieve poor localization precision, leading to message extraction failure. To address this issue, we propose a background-adaptive position marker, PosMarker, based on the gray-level co-occurrence matrix and the noise visibility function. In addition, we propose an online generation scheme that employs a learnable generator cooperating with the detector, allowing joint optimization between the two and simultaneously improving visual quality and detection precision. Extensive experiments demonstrate the superior localization precision of our PosMarker-based method compared with others.
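One plausible ingredient of background adaptivity is scoring candidate regions by GLCM texture so that markers land where they are least noticeable. The NumPy sketch below computes GLCM contrast for the horizontal neighbor offset with an assumed 8-level quantization; it illustrates the idea only and is not the paper's method.

# Score a patch by the contrast of its gray-level co-occurrence matrix.
import numpy as np

def glcm_contrast(patch, levels=8):
    q = (patch.astype(np.uint16) * levels // 256).astype(np.int64)   # quantize gray levels
    glcm = np.zeros((levels, levels), dtype=np.float64)
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()                # horizontal pixel pairs
    np.add.at(glcm, (left, right), 1.0)
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    return float(((i - j) ** 2 * glcm).sum())     # higher = more textured region

rng = np.random.default_rng(2)
flat = np.full((32, 32), 128, dtype=np.uint8)
noisy = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
print(glcm_contrast(flat), glcm_contrast(noisy))   # flat ~0, noisy much larger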

Citations: 0