In pursuit of an immersive virtual experience within Cyber-Physical Metaverse Systems (CPMS), the construction of Avatars often requires a significant amount of real-world data. Mobile Crowd Sensing (MCS) has emerged as an efficient method for collecting such data. While progress has been made in protecting the privacy of workers, little attention has been paid to safeguarding task privacy, which can expose the intentions of applications and pose risks to the development of the Metaverse. Additionally, existing privacy protection schemes hinder the exchange of information among entities, inadvertently compromising the quality of the collected data. To this end, we propose a Quality-aware and Obfuscation-based Task Privacy-Preserving (QOTPP) scheme, which protects task privacy and enhances data quality without third-party involvement. The QOTPP scheme first applies the insight of “showing the fake and hiding the real,” using differential privacy techniques to create fake tasks and conceal genuine ones. We then introduce a two-tier truth discovery mechanism based on Deep Matrix Factorization (DMF) to efficiently identify high-quality workers, and a Combinatorial Multi-Armed Bandit (CMAB)-based worker incentive and selection mechanism to improve the quality of data collection. Theoretical analysis confirms that QOTPP satisfies essential properties such as truthfulness, individual rationality, and ϵ-differential privacy. Extensive simulation experiments show that QOTPP achieves state-of-the-art performance.
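The abstract does not detail the CMAB mechanism, so as a rough, generic illustration of how a combinatorial bandit trades off exploring unknown workers against exploiting known high-quality ones, here is a minimal combinatorial-UCB sketch in Python. The reward model, the function names (`select_workers`, `run_cmab`), and all parameters are illustrative assumptions, not the QOTPP algorithm.

```python
import numpy as np

def select_workers(means, counts, t, k):
    """Pick k workers by upper confidence bound (CUCB-style)."""
    # Unplayed arms get an infinite index so each worker is tried at least once.
    ucb = np.where(counts > 0,
                   means + np.sqrt(2.0 * np.log(max(t, 1)) / np.maximum(counts, 1)),
                   np.inf)
    return np.argsort(ucb)[-k:]          # indices of the k largest UCB scores

def run_cmab(true_quality, rounds=2000, k=3, seed=0):
    rng = np.random.default_rng(seed)
    n = len(true_quality)
    means, counts = np.zeros(n), np.zeros(n)
    for t in range(1, rounds + 1):
        chosen = select_workers(means, counts, t, k)
        # Observed data quality is a noisy Bernoulli draw around each worker's true quality.
        rewards = rng.random(len(chosen)) < true_quality[chosen]
        counts[chosen] += 1
        means[chosen] += (rewards - means[chosen]) / counts[chosen]
    return means

if __name__ == "__main__":
    est = run_cmab(np.array([0.9, 0.8, 0.6, 0.4, 0.3]))
    print("estimated worker quality:", np.round(est, 2))
```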
{"title":"A Quality-Aware and Obfuscation-Based Data Collection Scheme for Cyber-Physical Metaverse Systems","authors":"Jianheng Tang, Kejia Fan, Wenjie Yin, Shihao Yang, Yajiang Huang, Anfeng Liu, Neal N. Xiong, Mianxiong Dong, Tian Wang, Shaobo Zhang","doi":"10.1145/3659582","DOIUrl":"https://doi.org/10.1145/3659582","url":null,"abstract":"<p>In pursuit of an immersive virtual experience within the Cyber-Physical Metaverse Systems (CPMS), the construction of Avatars often requires a significant amount of real-world data. Mobile Crowd Sensing (MCS) has emerged as an efficient method for collecting data for CPMS. While progress has been made in protecting the privacy of workers, little attention has been given to safeguarding task privacy, potentially exposing the intentions of applications and posing risks to the development of the Metaverse. Additionally, existing privacy protection schemes hinder the exchange of information among entities, inadvertently compromising the quality of the collected data. To this end, we propose a Quality-aware and Obfuscation-based Task Privacy-Preserving (QOTPP) scheme, which protects task privacy and enhances data quality without third-party involvement. The QOTPP scheme initially employs the insight of “showing the fake, and hiding the real” by employing differential privacy techniques to create fake tasks and conceal genuine ones. Additionally, we introduce a two-tier truth discovery mechanism using Deep Matrix Factorization (DMF) to efficiently identify high-quality workers. Furthermore, we propose a Combinatorial Multi-Armed Bandit (CMAB)-based worker incentive and selection mechanism to improve the quality of data collection. Theoretical analysis confirms that our QOTPP scheme satisfies essential properties such as truthfulness, individual rationality, and ϵ-differential privacy. Extensive simulation experiments validate the state-of-the-art performance achieved by QOTPP.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"63 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140590527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The large and growing amount of digital data creates a pressing need for approaches capable of indexing and retrieving multimedia content. A traditional and fundamental challenge consists of effectively and efficiently performing nearest-neighbor searches. After decades of research, several different methods are available, including trees, hashing, and graph-based approaches. Most current methods exploit learning-to-hash approaches based on deep learning. Despite their effective results and compact codes, such methods often require a significant amount of labeled data for training. Unsupervised approaches also rely on expensive training procedures, usually based on a huge amount of data. In this work, we propose an unsupervised, data-independent approach for nearest neighbor searches, which can be used with different features, including deep features trained by transfer learning. The method uses a rank-based formulation and exploits a hashing approach for efficient ranked list computation at query time. A comprehensive experimental evaluation was conducted on seven public datasets, considering deep features based on CNNs and Transformers. Both effectiveness and efficiency aspects were evaluated. The proposed approach achieves remarkable results in comparison to traditional and state-of-the-art methods. Hence, it is an attractive and innovative solution, especially when costly training procedures need to be avoided.
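As a generic illustration of hashing-plus-ranking retrieval (bucket candidates by a hash code, then compute a ranked list over the candidates at query time), here is a small random-hyperplane LSH sketch. It is not the paper's rank-based formulation; the class name, bit width, and fallback behavior are assumptions.

```python
import numpy as np

class HashedIndex:
    """Random-hyperplane LSH with exact re-ranking of bucket candidates."""
    def __init__(self, features, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.features = features / np.linalg.norm(features, axis=1, keepdims=True)
        self.planes = rng.standard_normal((features.shape[1], n_bits))
        self.codes = self._hash(self.features)
        self.buckets = {}
        for i, c in enumerate(self.codes):
            self.buckets.setdefault(c, []).append(i)

    def _hash(self, x):
        bits = (x @ self.planes) > 0
        # Pack each bit vector into a single integer bucket key.
        return [int("".join("1" if b else "0" for b in row), 2) for row in bits]

    def query(self, q, top_k=10):
        q = q / np.linalg.norm(q)
        cand = self.buckets.get(self._hash(q[None, :])[0], [])
        if not cand:                      # fall back to exhaustive search on an empty bucket
            cand = list(range(len(self.features)))
        sims = self.features[cand] @ q    # cosine similarity (unit-norm features)
        order = np.argsort(-sims)[:top_k]
        return [cand[i] for i in order]

feats = np.random.default_rng(1).standard_normal((1000, 128)).astype(np.float32)
index = HashedIndex(feats)
print(index.query(feats[42], top_k=5))   # the query item itself should rank first
```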
{"title":"Rank-based Hashing for Effective and Efficient Nearest Neighbor Search for Image Retrieval","authors":"Vinicius Sato Kawai, Lucas Pascotti Valem, Alexandro Baldassin, Edson Borin, Daniel Carlos Guimarães Pedronette, Longin Jan Latecki","doi":"10.1145/3659580","DOIUrl":"https://doi.org/10.1145/3659580","url":null,"abstract":"<p>The large and growing amount of digital data creates a pressing need for approaches capable of indexing and retrieving multimedia content. A traditional and fundamental challenge consists of effectively and efficiently performing nearest-neighbor searches. After decades of research, several different methods are available, including trees, hashing, and graph-based approaches. Most of the current methods exploit learning to hash approaches based on deep learning. In spite of effective results and compact codes obtained, such methods often require a significant amount of labeled data for training. Unsupervised approaches also rely on expensive training procedures usually based on a huge amount of data. In this work, we propose an unsupervised data-independent approach for nearest neighbor searches, which can be used with different features, including deep features trained by transfer learning. The method uses a rank-based formulation and exploits a hashing approach for efficient ranked list computation at query time. A comprehensive experimental evaluation was conducted on 7 public datasets, considering deep features based on CNNs and Transformers. Both effectiveness and efficiency aspects were evaluated. The proposed approach achieves remarkable results in comparison to traditional and state-of-the-art methods. Hence, it is an attractive and innovative solution, especially when costly training procedures need to be avoided.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"17 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140590537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-supervised action recognition is a challenging yet promising task due to its low reliance on costly labeled videos. One high-profile solution is to explore frame-level weak/strong augmentations for learning abundant representations, inspired by the FixMatch framework that dominates semi-supervised image classification. However, such a solution mainly introduces perturbations in texture and scale, which limits the learning of action representations in videos with spatiotemporal redundancy and complexity. Therefore, we revisit the weak/strong augmentation trick of FixMatch and propose a novel Frame- and Feature-level augmentation FixMatch (dubbed F2-FixMatch) framework to learn more abundant action representations that are robust to complex and dynamic video scenarios. Specifically, we design a new Progressive Augmentation (P-Aug) mechanism that applies the weak/strong augmentations first at the frame level and then perturbs the features at the feature level, yielding four types of augmented features in broader perturbation spaces. Moreover, we present an evolved Multihead Pseudo-Labeling (MPL) scheme to promote the consistency of features across different augmented versions based on the pseudo labels. We conduct extensive experiments on several public datasets to demonstrate that F2-FixMatch achieves performance gains over current state-of-the-art methods. The source code of F2-FixMatch is publicly available at https://github.com/zwtu/F2FixMatch.
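Since the framework builds on FixMatch, a short numpy sketch of the standard FixMatch consistency objective (pseudo-labels from the weakly augmented view, confidence-thresholded cross-entropy on the strongly augmented view) may help. It covers only the baseline trick, not the P-Aug or MPL modules, and the threshold value is an assumption.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fixmatch_unlabeled_loss(logits_weak, logits_strong, tau=0.95):
    """FixMatch-style consistency loss on unlabeled clips.

    Pseudo-labels come from the weakly augmented view; only confident ones
    (max probability >= tau) contribute cross-entropy on the strong view.
    """
    p_weak = softmax(logits_weak)
    pseudo = p_weak.argmax(axis=1)
    mask = p_weak.max(axis=1) >= tau
    if not mask.any():
        return 0.0
    p_strong = softmax(logits_strong)
    ce = -np.log(p_strong[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float((ce * mask).sum() / mask.sum())

# Toy check: confident, consistent predictions give a small loss.
rng = np.random.default_rng(0)
weak = rng.standard_normal((8, 5)) * 5.0
strong = weak + rng.standard_normal((8, 5)) * 0.1
print(fixmatch_unlabeled_loss(weak, strong))
```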
{"title":"Leveraging Frame- and Feature-Level Progressive Augmentation for Semi-supervised Action Recognition","authors":"Zhewei Tu, Xiangbo Shu, Peng Huang, Rui Yan, Zhenxing Liu, Jiachao Zhang","doi":"10.1145/3655025","DOIUrl":"https://doi.org/10.1145/3655025","url":null,"abstract":"<p>Semi-supervised action recognition is a challenging yet prospective task due to its low reliance on costly labeled videos. One high-profile solution is to explore frame-level weak/strong augmentations for learning abundant representations, inspired by the FixMatch framework dominating the semi-supervised image classification task. However, such a solution mainly brings perturbations in terms of texture and scale, leading to the limitation in learning action representations in videos with spatiotemporal redundancy and complexity. Therefore, we revisit the creative trick of weak/strong augmentations in FixMatch, and then propose a novel Frame- and Feature-level augmentation FixMatch (dubbed as F<sup>2</sup>-FixMatch) framework to learn more abundant action representations for being robust to complex and dynamic video scenarios. Specifically, we design a new Progressive Augmentation (P-Aug) mechanism that implements the weak/strong augmentations first at the frame level, and further implements the perturbation at the feature level, to obtain abundant four types of augmented features in broader perturbation spaces. Moreover, we present an evolved Multihead Pseudo-Labeling (MPL) scheme to promote the consistency of features across different augmented versions based on the pseudo labels. We conduct extensive experiments on several public datasets to demonstrate that our F<sup>2</sup>-FixMatch achieves the performance gain compared with current state-of-the-art methods. The source codes of F<sup>2</sup>-FixMatch are publicly available at https://github.com/zwtu/F2FixMatch.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"36 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140602711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper focuses on the creation and evaluation of synthetic data to address the challenges of imbalanced datasets in machine learning (ML) applications, using fake news detection as a case study. We conducted a thorough literature review on generative adversarial networks (GANs) for tabular data, synthetic data generation methods, and synthetic data quality assessment. By augmenting a public news dataset with synthetic data generated by different GAN architectures, we demonstrate the potential of synthetic data to improve ML models’ performance in fake news detection. Our results show a significant improvement in classification performance, especially in the underrepresented class. We also modify and extend a data usage approach to evaluate the quality of synthetic data and investigate the relationship between synthetic data quality and data augmentation performance in classification tasks. We found a positive correlation between synthetic data quality and performance in the underrepresented class, highlighting the importance of high-quality synthetic data for effective data augmentation.
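As a hedged sketch of the augment-then-evaluate workflow the abstract describes, the snippet below stands in a Gaussian stub for GAN-generated minority-class rows (a real pipeline would draw them from a tabular GAN such as CTGAN) and compares minority-class F1 with and without augmentation. All data shapes and model choices are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: class 1 (fake news) is underrepresented.
n_maj, n_min = 1800, 200
X_maj = rng.standard_normal((n_maj, 20))
X_min = rng.standard_normal((n_min, 20)) + 0.8
X_real = np.vstack([X_maj, X_min])
y_real = np.concatenate([np.zeros(n_maj, dtype=int), np.ones(n_min, dtype=int)])

# Placeholder for GAN output: in practice these rows would come from a tabular
# GAN trained on the minority class, not from this shifted-Gaussian stub.
X_syn = rng.standard_normal((600, 20)) + 0.8
y_syn = np.ones(600, dtype=int)

X_tr, X_te, y_tr, y_te = train_test_split(X_real, y_real, test_size=0.3,
                                          stratify=y_real, random_state=0)

base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
aug = LogisticRegression(max_iter=1000).fit(np.vstack([X_tr, X_syn]),
                                            np.concatenate([y_tr, y_syn]))

print("minority-class F1 without augmentation:",
      f1_score(y_te, base.predict(X_te), pos_label=1))
print("minority-class F1 with synthetic augmentation:",
      f1_score(y_te, aug.predict(X_te), pos_label=1))
```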
{"title":"GANs in the Panorama of Synthetic Data Generation Methods: Application and Evaluation: Enhancing Fake News Detection with GAN-Generated Synthetic Data: ACM Transactions on Multimedia Computing, Communications, and Applications: Vol 0, No ja","authors":"Bruno Vaz, Álvaro Figueira","doi":"10.1145/3657294","DOIUrl":"https://doi.org/10.1145/3657294","url":null,"abstract":"<p>This paper focuses on the creation and evaluation of synthetic data to address the challenges of imbalanced datasets in machine learning applications (ML), using fake news detection as a case study. We conducted a thorough literature review on generative adversarial networks (GANs) for tabular data, synthetic data generation methods, and synthetic data quality assessment. By augmenting a public news dataset with synthetic data generated by different GAN architectures, we demonstrate the potential of synthetic data to improve ML models’ performance in fake news detection. Our results show a significant improvement in classification performance, especially in the underrepresented class. We also modify and extend a data usage approach to evaluate the quality of synthetic data and investigate the relationship between synthetic data quality and data augmentation performance in classification tasks. We found a positive correlation between synthetic data quality and performance in the underrepresented class, highlighting the importance of high-quality synthetic data for effective data augmentation.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"215 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140590525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing magnetic resonance imaging (MRI) translation models rely on Generative Adversarial Networks, primarily employing simple convolutional neural networks. Unfortunately, these networks struggle to capture global representations and contextual relationships within MRI images. While the advent of Transformers enables capturing long-range feature dependencies, they often compromise the preservation of local feature details. To address these limitations and enhance both local and global representations, we introduce a novel Dual-Branch Generative Adversarial Network (DBGAN). In this framework, the Transformer branch comprises sparse attention blocks and dense self-attention blocks, allowing a wider receptive field while simultaneously capturing local and global information. The CNN branch, built with integrated residual convolutional layers, enhances local modeling capabilities. Additionally, we propose a fusion module that integrates the features extracted from both branches. Extensive experimentation on two public datasets and one clinical dataset validates significant performance improvements with DBGAN. On BraTS2018, it achieves a 10% improvement in MAE, 3.2% in PSNR, and 4.8% in SSIM for image generation tasks compared to RegGAN. Notably, the generated MRIs receive positive feedback from radiologists, underscoring the potential of our proposed method as a valuable tool in clinical settings.
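To make the reported numbers concrete, here is a small sketch of how MAE and PSNR are typically computed for generated images; the [0, 1] intensity range and the toy inputs are assumptions, and SSIM is omitted for brevity.

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error between two images scaled to [0, 1]."""
    return float(np.mean(np.abs(pred - target)))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(0)
target = rng.random((256, 256))
pred = np.clip(target + rng.normal(0, 0.05, target.shape), 0.0, 1.0)
print(f"MAE:  {mae(pred, target):.4f}")
print(f"PSNR: {psnr(pred, target):.2f} dB")
```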
{"title":"DBGAN: Dual Branch Generative Adversarial Network for Multi-modal MRI Translation","authors":"Jun Lyu, Shouang Yan, M. Shamim Hossain","doi":"10.1145/3657298","DOIUrl":"https://doi.org/10.1145/3657298","url":null,"abstract":"<p>Existing Magnetic resonance imaging (MRI) translation models rely on Generative Adversarial Networks, primarily employing simple convolutional neural networks. Unfortunately, these networks struggle to capture global representations and contextual relationships within MRI images. While the advent of Transformers enables capturing long-range feature dependencies, they often compromise the preservation of local feature details. To address these limitations and enhance both local and global representations, we introduce a novel Dual-Branch Generative Adversarial Network (DBGAN). In this framework, the Transformer branch comprises sparse attention blocks and dense self-attention blocks, allowing for a wider receptive field while simultaneously capturing local and global information. The CNN branch, built with integrated residual convolutional layers, enhances local modeling capabilities. Additionally, we propose a fusion module that cleverly integrates features extracted from both branches. Extensive experimentation on two public datasets and one clinical dataset validates significant performance improvements with DBGAN. On Brats2018, it achieves a 10%improvement in MAE, 3.2% in PSNR, and 4.8% in SSIM for image generation tasks compablack to RegGAN. Notably, the generated MRIs receive positive feedback from radiologists, underscoring the potential of our proposed method as a valuable tool in clinical settings.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"68 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140590526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-modal human action segmentation is a critical and challenging task with a wide range of applications. Most current approaches concentrate on the fusion of dense signals (i.e., RGB, optical flow, and depth maps). However, the potential contribution of sparse IoT sensor signals, which can be crucial for achieving accurate recognition, has not been fully explored. To address this, we introduce a Sparse signal-guided Transformer (SigFormer) that combines both dense and sparse signals. We employ mask attention to fuse localized features by constraining cross-attention to the regions where sparse signals are valid. However, since sparse signals are discrete, they lack sufficient information about temporal action boundaries. Therefore, in SigFormer, we propose to emphasize boundary information at two stages to alleviate this problem. In the first feature extraction stage, we introduce an intermediate bottleneck module to jointly learn both category and boundary features of each dense modality through inner loss functions. After the fusion of dense modalities and sparse signals, we then devise a two-branch architecture that explicitly models the interrelationship between action category and temporal boundary. Experimental results demonstrate that SigFormer outperforms state-of-the-art approaches on a multi-modal action segmentation dataset from real industrial environments, reaching an outstanding F1 score of 0.958. The code and pre-trained models are available at https://github.com/LIUQI-creat/SigFormer.
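As a rough illustration of mask attention (cross-attention whose scores are suppressed wherever the sparse signal is invalid), here is a minimal numpy sketch; the tensor shapes, the single-head formulation, and the validity pattern are assumptions rather than SigFormer's actual implementation.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, valid):
    """Scaled dot-product cross-attention restricted to valid key positions.

    `valid` is a boolean vector over key timesteps (e.g., where a sparse IoT
    signal exists); invalid positions receive -inf scores before the softmax.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (Tq, Tk)
    scores = np.where(valid[None, :], scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values                         # (Tq, d_v)

rng = np.random.default_rng(0)
Tq, Tk, d = 6, 10, 16
q, k, v = (rng.standard_normal(s) for s in [(Tq, d), (Tk, d), (Tk, d)])
valid = np.zeros(Tk, dtype=bool)
valid[[2, 3, 7]] = True                             # sparse signal present at three timesteps
out = masked_cross_attention(q, k, v, valid)
print(out.shape)                                    # (6, 16)
```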
{"title":"SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Action Segmentation","authors":"Qi Liu, Xinchen Liu, Kun Liu, Xiaoyan Gu, Wu Liu","doi":"10.1145/3657296","DOIUrl":"https://doi.org/10.1145/3657296","url":null,"abstract":"<p>Multi-modal human action segmentation is a critical and challenging task with a wide range of applications. Nowadays, the majority of approaches concentrate on the fusion of dense signals (i.e., RGB, optical flow, and depth maps). However, the potential contributions of sparse IoT sensor signals, which can be crucial for achieving accurate recognition, have not been fully explored. To make up for this, we introduce a <b>S</b>parse s<b>i</b>gnal-<b>g</b>uided Transformer (<b>SigFormer</b>) to combine both dense and sparse signals. We employ mask attention to fuse localized features by constraining cross-attention within the regions where sparse signals are valid. However, since sparse signals are discrete, they lack sufficient information about the temporal action boundaries. Therefore, in SigFormer, we propose to emphasize the boundary information at two stages to alleviate this problem. In the first feature extraction stage, we introduce an intermediate bottleneck module to jointly learn both category and boundary features of each dense modality through the inner loss functions. After the fusion of dense modalities and sparse signals, we then devise a two-branch architecture that explicitly models the interrelationship between action category and temporal boundary. Experimental results demonstrate that SigFormer outperforms the state-of-the-art approaches on a multi-modal action segmentation dataset from real industrial environments, reaching an outstanding F1 score of 0.958. The codes and pre-trained models have been available at https://github.com/LIUQI-creat/SigFormer.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"66 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140590586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Social interaction is a common phenomenon in human societies. Unlike discovering groups based on the similarity of individuals’ actions, social interaction focuses on the mutual influence between people. Although people can easily judge whether social interactions are present in a real-world scene, it is difficult for an intelligent system to discover them. Initiating and concluding social interactions are greatly influenced by an individual’s social cognition and the surrounding environment, which are closely related to psychology. Converting the psychological factors that affect social interactions into quantifiable visual representations and modeling the interaction relationships therefore poses a significant challenge. To this end, we propose a Psychology-Guided Environment Aware Network (PEAN) that models social interaction among people in videos using supervised learning. Specifically, we divide the surrounding environment into scene-aware and human-aware visual descriptions. For the scene-aware visual clue, we utilize 3D features as global visual representations. For the human-aware visual clue, we consider instance-based location and behaviour-related visual representations to map human-centered interaction elements from social psychology: distance, openness, and orientation. In addition, we design an environment aware mechanism to integrate features from these visual clues, with a Transformer to explore the relations between individuals and construct pairwise interaction strength features. The interaction intensity matrix, which reflects the mutual nature of interaction, is obtained by processing the interaction strength features with the interaction discovery module. An interaction-constrained loss function, composed of an interaction critical loss and a smooth Fβ loss, is proposed to optimize the whole framework, improving the distinctiveness of the interaction matrix and alleviating the class imbalance caused by pairwise interaction sparsity. Given the diversity of real-world interactions, we collect a new dataset named the Social Basketball Activity Dataset (Social-BAD), covering complex social interactions. Our method achieves the best performance on Social-CAD, Social-BAD, and their combined dataset, the Video Social Interaction Dataset (VSID).
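The smooth Fβ term can be illustrated with a soft (differentiable) Fβ computed on a pairwise interaction matrix, as in the hedged numpy sketch below; the β value, matrix sizes, and the loss form 1 − Fβ are assumptions, not PEAN's exact loss.

```python
import numpy as np

def soft_f_beta(pred, target, beta=1.0, eps=1e-8):
    """Soft F-beta on a pairwise interaction matrix.

    `pred` holds interaction probabilities in [0, 1]; `target` is the binary
    ground-truth matrix. Soft TP/FP/FN keep the score differentiable, which is
    what a smooth F-beta objective needs when pairwise positives are sparse.
    """
    tp = (pred * target).sum()
    fp = (pred * (1.0 - target)).sum()
    fn = ((1.0 - pred) * target).sum()
    b2 = beta ** 2
    return float((1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps))

rng = np.random.default_rng(0)
n = 8                                   # people in the scene
target = np.zeros((n, n))
target[1, 4] = target[4, 1] = 1.0       # one interacting pair
pred = np.clip(target + rng.normal(0, 0.1, (n, n)), 0, 1)
print("soft F1:", round(soft_f_beta(pred, target), 3))
print("loss   :", round(1.0 - soft_f_beta(pred, target), 3))
```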
{"title":"Psychology-Guided Environment Aware Network for Discovering Social Interaction Groups from Videos","authors":"Jiaqi Yu, Jinhai Yang, Hua Yang, Renjie Pan, Pingrui Lai, Guangtao Zhai","doi":"10.1145/3657295","DOIUrl":"https://doi.org/10.1145/3657295","url":null,"abstract":"<p>Social interaction is a common phenomenon in human societies. Different from discovering groups based on the similarity of individuals’ actions, social interaction focuses more on the mutual influence between people. Although people can easily judge whether or not there are social interactions in a real-world scene, it is difficult for an intelligent system to discover social interactions. Initiating and concluding social interactions are greatly influenced by an individual’s social cognition and the surrounding environment, which are closely related to psychology. Thus, converting the psychological factors that impact social interactions into quantifiable visual representations and creating a model for interaction relationships poses a significant challenge. To this end, we propose a Psychology-Guided Environment Aware Network (PEAN) that models social interaction among people in videos using supervised learning. Specifically, we divide the surrounding environment into scene-aware visual-based and human-aware visual-based descriptions. For the scene-aware visual clue, we utilize 3D features as global visual representations. For the human-aware visual clue, we consider instance-based location and behaviour-related visual representations to map human-centered interaction elements in social psychology: distance, openness and orientation. In addition, we design an environment aware mechanism to integrate features from visual clues, with a Transformer to explore the relation between individuals and construct pairwise interaction strength features. The interaction intensity matrix reflecting the mutual nature of the interaction is obtained by processing the interaction strength features with the interaction discovery module. An interaction constrained loss function composed of interaction critical loss function and smooth <i>F<sub>β</sub></i> loss function is proposed to optimize the whole framework to improve the distinction of the interaction matrix and alleviate class imbalance caused by pairwise interaction sparsity. Given the diversity of real-world interactions, we collect a new dataset named Social Basketball Activity Dataset (Soical-BAD), covering complex social interactions. Our method achieves the best performance among social-CAD, social-BAD, and their combined dataset named Video Social Interaction Dataset (VSID).</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"44 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140590533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose to synthesize realistic underwater images with a novel image formation model that considers both downwelling depth and line-of-sight (LOS) distance as cues, and we call it the Realistic Synthetic Underwater Image Generation Model (RSUIGM). Light interaction in the ocean is a complex process and demands specific modeling of the direct and backscattering phenomena to capture the degradations. Most image formation models rely on complex radiative transfer models and in-situ measurements for synthesizing and restoring underwater images. Typical image formation models consider only the line-of-sight distance z and ignore the downwelling depth d when estimating the effect of direct light scattering. Unlike state-of-the-art image formation models, we derive the dependency of direct light estimation on downwelling irradiance for generating synthetic underwater images. We incorporate the derived downwelling irradiance into the estimation of direct light scattering to model the image formation process, generate realistic synthetic underwater images with the proposed RSUIGM, and name the result the RSUIGM dataset. We demonstrate the effectiveness of the proposed RSUIGM by using the RSUIGM dataset to train deep-learning-based restoration methods. We compare the quality of restored images with state-of-the-art methods on benchmark real underwater image datasets and achieve improved results. In addition, we validate the distribution of realistic synthetic underwater images against real underwater images both qualitatively and quantitatively. The proposed RSUIGM dataset is available here.
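As a hedged illustration of the kind of formation model the abstract refers to, the sketch below degrades a clean image with a simplified single-scattering model in which direct transmission is attenuated by both line-of-sight distance z and downwelling depth d, plus a backscatter term. The equation form, coefficient values, and function name are assumptions and do not reproduce the paper's derived RSUIGM model.

```python
import numpy as np

def synthesize_underwater(J, z, d, beta, K_d, B_inf):
    """Degrade a clean image J with a simplified underwater formation model.

    I_c = J_c * exp(-beta_c * z) * exp(-K_d,c * d) + B_inf,c * (1 - exp(-beta_c * z))

    J      : clean image, H x W x 3 in [0, 1]
    z      : line-of-sight distance map, H x W (meters)
    d      : downwelling depth in meters (scalar here for simplicity)
    beta   : per-channel attenuation along the line of sight
    K_d    : per-channel downwelling attenuation
    B_inf  : per-channel veiling-light (background) color
    All coefficients are illustrative; the paper derives its own formulation.
    """
    t_los = np.exp(-beta[None, None, :] * z[..., None])       # direct transmission
    t_down = np.exp(-K_d * d)                                  # depth-dependent irradiance
    direct = J * t_los * t_down[None, None, :]
    backscatter = B_inf[None, None, :] * (1.0 - t_los)
    return np.clip(direct + backscatter, 0.0, 1.0)

rng = np.random.default_rng(0)
J = rng.random((240, 320, 3))
z = np.full((240, 320), 4.0)                                   # 4 m line of sight
I = synthesize_underwater(
    J, z, d=8.0,
    beta=np.array([0.20, 0.08, 0.05]),                         # red attenuates fastest
    K_d=np.array([0.30, 0.12, 0.08]),
    B_inf=np.array([0.05, 0.35, 0.45]),                        # greenish-blue veiling light
)
print(I.shape, I.min(), I.max())
```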
{"title":"RSUIGM: Realistic Synthetic Underwater Image Generation with Image Formation Model","authors":"Chaitra Desai, Sujay Benur, Ujwala Patil, Uma Mudenagudi","doi":"10.1145/3656473","DOIUrl":"https://doi.org/10.1145/3656473","url":null,"abstract":"<p>In this paper, we propose to synthesize realistic underwater images with a novel image formation model, considering both downwelling depth and line of sight (LOS) distance as cue and call it as Realistic Synthetic Underwater Image Generation Model, RSUIGM. The light interaction in the ocean is a complex process and demands specific modeling of direct and backscattering phenomenon to capture the degradations. Most of the image formation models rely on complex radiative transfer models and in-situ measurements for synthesizing and restoration of underwater images. Typical image formation models consider only line of sight distance <i>z</i> and ignore downwelling depth <i>d</i> in the estimation of effect of direct light scattering. We derive the dependencies of downwelling irradiance in direct light estimation for generation of synthetic underwater images unlike state-of-the-art image formation models. We propose to incorporate the derived downwelling irradiance in estimation of direct light scattering for modeling the image formation process and generate realistic synthetic underwater images with the proposed RSUIGM, and name it as <i>RSUIGM dataset</i>. We demonstrate the effectiveness of the proposed RSUIGM by using RSUIGM dataset in training deep learning based restoration methods. We compare the quality of restored images with state-of-the-art methods using benchmark real underwater image datasets and achieve improved results. In addition, we validate the distribution of realistic synthetic underwater images versus real underwater images both qualitatively and quantitatively. The proposed RSUIGM dataset is available here.\u0000</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"1 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140590438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we address the challenging makeup transfer task, aiming to transfer makeup from a reference image to a source image while preserving facial geometry and background consistency. Existing deep neural network-based methods have shown promising results in aligning facial parts and transferring makeup textures. However, they often neglect the facial geometry of the source image, leading to two adverse effects: (1) alterations in geometrically relevant facial features, causing face flattening and loss of personality, and (2) difficulties in maintaining background consistency, as networks cannot clearly determine the face-background boundary. To jointly tackle these issues, we propose the High Fidelity Makeup via 2D and 3D Identity Preservation Network (IP23-Net), a novel framework that leverages facial geometry information to generate more realistic results. Our method comprises a 3D Shape Identity Encoder, which extracts identity and 3D shape features. We incorporate a 3D face reconstruction model to ensure the three-dimensional effect of face makeup, thereby preserving the characters’ depth and natural appearance. To preserve background consistency, our Background Correction Decoder automatically predicts an adaptive mask for the source image, distinguishing the foreground and background. In addition to popular benchmarks, we introduce a new large-scale High Resolution Synthetic Makeup Dataset containing 335,230 diverse high-resolution face images, to evaluate our method’s generalization ability. Experiments demonstrate that IP23-Net achieves high-fidelity makeup transfer while effectively preserving background consistency. The code will be made publicly available.
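A tiny sketch of the background-preservation idea, assuming a predicted soft foreground mask: the generated face is composited over the source so background pixels are untouched. The function name and shapes are illustrative; this is not IP23-Net's Background Correction Decoder.

```python
import numpy as np

def compose_with_mask(generated, source, mask):
    """Blend a generated (makeup-transferred) image with the source background.

    `mask` is a predicted soft foreground (face) mask in [0, 1]; background
    pixels keep the source values so the scene outside the face is untouched.
    """
    m = mask[..., None]                       # broadcast over the color channels
    return m * generated + (1.0 - m) * source

rng = np.random.default_rng(0)
source = rng.random((128, 128, 3))
generated = rng.random((128, 128, 3))
mask = np.zeros((128, 128))
mask[32:96, 32:96] = 1.0                      # pretend the face occupies the center
out = compose_with_mask(generated, source, mask)
assert np.allclose(out[0, 0], source[0, 0])   # a background pixel comes from the source
```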
{"title":"High Fidelity Makeup via 2D and 3D Identity Preservation Net","authors":"Jinliang Liu, Zhedong Zheng, Zongxin Yang, Yi Yang","doi":"10.1145/3656475","DOIUrl":"https://doi.org/10.1145/3656475","url":null,"abstract":"<p>In this paper, we address the challenging makeup transfer task, aiming to transfer makeup from a reference image to a source image while preserving facial geometry and background consistency. Existing deep neural network-based methods have shown promising results in aligning facial parts and transferring makeup textures. However, they often neglect the facial geometry of the source image, leading to two adverse effects: (1) alterations in geometrically relevant facial features, causing face flattening and loss of personality, and (2) difficulties in maintaining background consistency, as networks cannot clearly determine the face-background boundary. To jointly tackle these issues, we propose the High Fidelity Makeup via 2D and 3D Identity Preservation Network (IP23-Net), a novel framework that leverages facial geometry information to generate more realistic results. Our method comprises a 3D Shape Identity Encoder, which extracts identity and 3D shape features. We incorporate a 3D face reconstruction model to ensure the three-dimensional effect of face makeup, thereby preserving the characters’ depth and natural appearance. To preserve background consistency, our Background Correction Decoder automatically predicts an adaptive mask for the source image, distinguishing the foreground and background. In addition to popular benchmarks, we introduce a new large-scale High Resolution Synthetic Makeup Dataset containing 335,230 diverse high-resolution face images, to evaluate our method’s generalization ability. Experiments demonstrate that IP23-Net achieves high-fidelity makeup transfer while effectively preserving background consistency. The code will be made publicly available.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"56 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140603157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Non-fungible tokens (NFTs) have become a fundamental part of the metaverse ecosystem due to their uniqueness and immutability. However, existing copyright protection schemes for NFT image art rely on the NFT itself, minted by third-party platforms. A minted NFT only tracks and verifies the transaction process; the legitimacy of the source and the ownership of the digital image art it maps to cannot be determined. The original author or authorized publisher lacks an active defense mechanism to prove ownership of the digital image art mapped by an unauthorized NFT. Therefore, in this paper we propose a self-defense copyright protection scheme for NFT image art based on information embedding, called SDCP-IE. The original author or authorized publisher can embed copyright information into the digital image art before publication without damaging its visual quality. Unlike existing information embedding works, the proposed SDCP-IE generally enhances the invisibility of the copyright information across different embedding capacities. Furthermore, considering the scenario in which the copyright information is discovered or even destroyed by unauthorized parties, SDCP-IE can efficiently generate enhanced digital image art to improve the security of the embedded image, resisting multiple known and unknown detection models simultaneously. Experimental results show that the PSNR values of the enhanced embedded images exceed 57 dB on three datasets: BOSSBase, BOWS2, and ALASKA#2. Moreover, compared with existing information embedding works, the enhanced embedded images generated by SDCP-IE achieve the best transferability performance against advanced CNN-based detection models. When the target detector is the pre-trained SRNet at 0.4 bpp, the test error rate of SDCP-IE at 0.4 bpp on the evaluated detection model YeNet reaches 53.38%, which is 4.92%, 28.62%, and 7.05% higher than that of UTGAN, SPS-ENH, and Xie-Model, respectively.
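To make the embedding-quality numbers (PSNR above 57 dB at 0.4 bpp) concrete, here is a generic least-significant-bit embedding sketch with a PSNR check. LSB embedding is only a stand-in and is easily detectable, unlike the learned, detection-resistant SDCP-IE scheme; all sizes and payload rates here are assumptions.

```python
import numpy as np

def embed_lsb(cover, bits):
    """Write a bit string into the least significant bits of a grayscale image."""
    flat = cover.flatten().astype(np.uint8)   # flatten() copies, so the cover is untouched
    if len(bits) > flat.size:
        raise ValueError("message longer than cover capacity")
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.array(bits, dtype=np.uint8)
    return flat.reshape(cover.shape)

def extract_lsb(stego, n_bits):
    return (stego.flatten()[:n_bits] & 1).astype(np.uint8)

def psnr(a, b, peak=255.0):
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)
message = rng.integers(0, 2, size=int(0.4 * cover.size))      # 0.4 bpp payload
stego = embed_lsb(cover, message)

assert np.array_equal(extract_lsb(stego, len(message)), message)
print(f"PSNR of stego vs. cover: {psnr(cover, stego):.2f} dB")
```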
{"title":"A Self-Defense Copyright Protection Scheme for NFT Image Art Based on Information Embedding","authors":"Fan Wang, Zhangjie Fu, Xiang Zhang","doi":"10.1145/3652519","DOIUrl":"https://doi.org/10.1145/3652519","url":null,"abstract":"<p>Non-convertible tokens (NFTs) have become a fundamental part of the metaverse ecosystem due to its uniqueness and immutability. However, existing copyright protection schemes of NFT image art relied on the NFTs itself minted by third-party platforms. A minted NFT image art only tracks and verifies the entire transaction process, but the legitimacy of the source and ownership of its mapped digital image art cannot be determined. The original author or authorized publisher lack an active defense mechanism to prove ownership of the digital image art mapped by the unauthorized NFT. Therefore, we propose a self-defense copyright protection scheme for NFT image art based on information embedding in this paper, called SDCP-IE. The original author or authorized publisher can embed the copyright information into the published digital image art without damaging its visual effect in advance. Different from the existing information embedding works, the proposed SDCP-IE can generally enhance the invisibility of copyright information with different embedding capacity. Furthermore, considering the scenario of copyright information being discovered or even destroyed by unauthorized parties, the designed SDCP-IE can efficiently generate enhanced digital image art to improve the security performance of embedded image, thus resisting the detection of multiple known and unknown detection models simultaneously. The experimental results have also shown that the PSNR values of enhanced embedded image are all over 57db on three datasets BOSSBase, BOWS2 and ALASKA#2. Moreover, compared with existing information embedding works, the enhanced embedded images generated by SDCP-IE reaches the best transferability performance on the advanced CNN-based detection models. When the target detector is the pre-trained SRNet at 0.4bpp, the test error rate of SDCP-IE at 0.4bpp on the evaluated detection model YeNet reaches 53.38%, which is 4.92%, 28.62% and 7.05% higher than that of the UTGAN, SPS-ENH and Xie-Model, respectively.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"82 1","pages":""},"PeriodicalIF":5.1,"publicationDate":"2024-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}