
Latest publications in Computer Vision and Image Understanding

A large corpus for the recognition of Greek Sign Language gestures
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-29 · DOI: 10.1016/j.cviu.2024.104212
Katerina Papadimitriou , Galini Sapountzaki , Kyriaki Vasilaki , Eleni Efthimiou , Stavroula-Evita Fotinea , Gerasimos Potamianos
Sign language recognition (SLR) from videos constitutes a captivating problem in gesture recognition, requiring the interpretation of hand movements, facial expressions, and body postures. The complexity of sign formation, signing variability among signers, and the technical hurdles of visual detection and tracking render SLR a challenging task. At the same time, the scarcity of large-scale SLR datasets, which are critical for developing robust data-intensive deep-learning SLR models, exacerbates these issues. In this article, we introduce a multi-signer video corpus of Greek Sign Language (GSL), which is the largest GSL database to date, serving as a valuable resource for SLR research. This corpus comprises an extensive RGB+D video collection that conveys rich lexical content in a multi-modal fashion, encompassing three subsets: (i) isolated signs; (ii) continuous signing; and (iii) continuous alphabet fingerspelling of words. Moreover, we introduce a comprehensive experimental setup that paves the way for more accurate and robust SLR solutions. In particular, in addition to the multi-signer (MS) and signer-independent (SI) settings, we employ a signer-adapted (SA) experimental paradigm, facilitating a comprehensive evaluation of system performance across various scenarios. Further, we provide three baseline SLR systems for isolated signs, continuous signing, and continuous fingerspelling. These systems leverage cutting-edge methods in deep learning and sequence modeling to capture the intricate temporal dynamics inherent in sign gestures. The models are evaluated on the three corpus subsets, establishing state-of-the-art recognition benchmarks on them. The SL-ReDu GSL corpus, including its recommended experimental frameworks, is publicly available at https://sl-redu.e-ce.uth.gr/corpus.
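To make the three evaluation protocols concrete, the sketch below derives multi-signer (MS), signer-independent (SI), and signer-adapted (SA) partitions from (video_id, signer_id) pairs. It is a minimal Python illustration under assumed conventions: the 20% held-out signer ratio, the adaptation fraction, and the field names are hypothetical, not the corpus specification.

```python
import random
from collections import defaultdict

def make_splits(samples, adapt_fraction=0.2, seed=0):
    """Derive MS/SI/SA partitions from (video_id, signer_id) pairs."""
    rng = random.Random(seed)
    by_signer = defaultdict(list)
    for video_id, signer_id in samples:
        by_signer[signer_id].append(video_id)

    signers = sorted(by_signer)
    held_out = set(rng.sample(signers, max(1, len(signers) // 5)))

    # Training pool from the non-held-out signers; SI tests on unseen signers.
    ms_train = [v for s in signers if s not in held_out for v in by_signer[s]]
    si_test = [v for s in held_out for v in by_signer[s]]

    # SA: a small adaptation subset of each held-out signer joins training.
    sa_adapt, sa_test = [], []
    for s in sorted(held_out):
        vids = by_signer[s][:]
        rng.shuffle(vids)
        k = max(1, int(adapt_fraction * len(vids)))
        sa_adapt.extend(vids[:k])
        sa_test.extend(vids[k:])
    return ms_train, si_test, sa_adapt, sa_test
```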
Citations: 0
Image compressive sensing reconstruction via nonlocal low-rank residual-based ADMM framework
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-28 · DOI: 10.1016/j.cviu.2024.104204
Junhao Zhang , Kim-Hui Yap , Lap-Pui Chau , Ce Zhu
The nonlocal low-rank (LR) modeling has proven to be an effective approach in image compressive sensing (CS) reconstruction, which starts by clustering similar patches into nonlocal image groups using the nonlocal self-similarity (NSS) prior and then imposes an LR penalty on each nonlocal image group. However, most existing methods only approximate the LR matrix directly from the degraded nonlocal image group, which may lead to suboptimal LR matrix approximation and thus obtain unsatisfactory reconstruction results. In this paper, we propose a novel nonlocal low-rank residual (NLRR) approach for image CS reconstruction, which progressively approximates the underlying LR matrix by minimizing the LR residual. To do this, we first use the NSS prior to obtain a good estimate of the original nonlocal image group, and then the LR residual between the degraded nonlocal image group and the estimated nonlocal image group is minimized to derive a more accurate LR matrix. To ensure the optimization is both feasible and reliable, we employ the alternating direction method of multipliers (ADMM) to solve the NLRR-based image CS reconstruction problem. Our experimental results show that the proposed NLRR algorithm achieves superior performance against many popular or state-of-the-art image CS reconstruction methods, both in objective metrics and subjective perceptual quality.
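The LR-residual update has a closed-form proximal solution via singular value thresholding (SVT): with degraded group Y and NSS estimate X̂, minimizing ½‖X−Y‖²_F + λ‖X−X̂‖_* gives X = X̂ + SVT(Y−X̂, λ). A minimal NumPy sketch under simplifying assumptions follows; it works on a matrix-form nonlocal group and omits the CS measurement operator and the full ADMM splitting that the paper handles.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def lr_residual_step(Y, X_est, lam):
    """argmin_X 0.5*||X - Y||_F^2 + lam*||X - X_est||_*  (closed form).

    Shifting by the NSS estimate X_est makes the nuclear-norm penalty
    act on the residual, so only the low-rank residual is thresholded.
    """
    return X_est + svt(Y - X_est, lam)
```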
Citations: 0
A MLP architecture fusing RGB and CASSI for computational spectral imaging
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-25 · DOI: 10.1016/j.cviu.2024.104214
Zeyu Cai , Ru Hong , Xun Lin , Jiming Yang , YouLiang Ni , Zhen Liu , Chengqian Jin , Feipeng Da
The Coded Aperture Snapshot Spectral Imaging (CASSI) system offers significant advantages in dynamically acquiring hyper-spectral images compared to traditional measurement methods. However, it faces the following challenges: (1) Traditional masks rely on random patterns or analytical design, limiting CASSI’s performance improvement. (2) Existing CASSI reconstruction algorithms do not fully utilize RGB information. (3) High-quality reconstruction algorithms are often slow and limited to offline scene reconstruction. To address these issues, this paper proposes a new MLP architecture, Spectral–Spatial MLP (SSMLP), which replaces the transformer structure with a network using CASSI measurements and RGB as multimodal inputs. This maintains reconstruction quality while significantly improving reconstruction speed. Additionally, we construct a teacher-student network (SSMLP with a teacher, SSMLP-WT) to transfer the knowledge learned by a large model to a smaller network, further enhancing the smaller network’s accuracy. Extensive experiments show that SSMLP matches the performance of transformer-based structures in spectral image reconstruction while improving inference speed by at least 50%. The reconstruction quality of SSMLP-WT is further improved by knowledge transfer without changing the network, and the teacher boosts the performance by 0.92 dB (44.73 dB vs. 43.81 dB).
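The teacher-to-student transfer can be illustrated with a simple distillation loss: the student fits the ground-truth spectral cube while also matching the teacher's reconstruction. A minimal PyTorch sketch, assuming plain MSE terms and an equal-weight mixing coefficient; the paper's exact distillation objective may differ.

```python
import torch.nn.functional as F

def distill_loss(student_out, teacher_out, target, alpha=0.5):
    """Blend a supervised term with a teacher-mimicking term."""
    loss_gt = F.mse_loss(student_out, target)                 # fit ground truth
    loss_kd = F.mse_loss(student_out, teacher_out.detach())   # mimic frozen teacher
    return (1 - alpha) * loss_gt + alpha * loss_kd
```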
Citations: 0
A GCN and Transformer complementary network for skeleton-based action recognition
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-22 · DOI: 10.1016/j.cviu.2024.104213
Xuezhi Xiang , Xiaoheng Li , Xuzhao Liu , Yulong Qiao , Abdulmotaleb El Saddik
Graph Convolution Networks (GCNs) have been widely used in skeleton-based action recognition. Although there has been significant progress, an inherent limitation remains: the restricted receptive field of GCN hinders its ability to extract global dependencies effectively. Moreover, joints that are structurally separated can still be strongly correlated. Previous works rarely explore both local and global correlations of joints, and thus insufficiently model the complex dynamics of skeleton sequences. To address this issue, we propose a GCN and Transformer complementary network (GTC-Net) that allows parallel communication between the GCN and Transformer domains. Specifically, we introduce a graph convolution and self-attention combined module (GAM), which effectively leverages the complementarity of graph convolution and self-attention to perceive local and global dependencies of joints of the human body. Furthermore, to address the problems of long-term sequence ordering and position detection, we design a position-aware module (PAM), which explicitly captures the ordering information and unique identity information of the body joints in a skeleton sequence. Extensive experiments on the NTU RGB+D 60 and NTU RGB+D 120 datasets are conducted to evaluate our proposed method. The results demonstrate that our method achieves competitive results on both datasets.
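A minimal PyTorch sketch of the GAM idea: a graph-convolution branch aggregates structurally adjacent joints through a normalized adjacency matrix, while a self-attention branch captures global joint dependencies, and the two are fused. The additive fusion with a residual LayerNorm is an assumption for illustration, not the paper's exact block.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Parallel graph convolution + self-attention over body joints."""
    def __init__(self, channels, adjacency, heads=4):
        super().__init__()
        self.register_buffer("A", adjacency)          # (J, J), normalized
        self.gcn = nn.Linear(channels, channels)      # per-joint transform
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                             # x: (B, J, C)
        local = self.gcn(torch.einsum("jk,bkc->bjc", self.A, x))  # neighbors
        glob, _ = self.attn(x, x, x)                  # all-pairs dependencies
        return self.norm(x + local + glob)            # assumed additive fusion

# e.g. gam = GAM(64, torch.eye(25)); y = gam(torch.randn(2, 25, 64))
```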
Citations: 0
Reverse Stable Diffusion: What prompt was used to generate this image?
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-19 · DOI: 10.1016/j.cviu.2024.104210
Florinel-Alin Croitoru , Vlad Hondru , Radu Tudor Ionescu , Mubarak Shah
Text-to-image diffusion models have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to obtain the desired images. To this end, we study the task of predicting the prompt embedding given an image generated by a generative diffusion model. We consider a series of white-box and black-box models (with and without access to the weights of the diffusion network) to deal with the proposed task. We propose a novel learning framework comprising a joint prompt regression and multi-label vocabulary classification objective that generates improved prompts. To further improve our method, we employ a curriculum learning procedure that promotes the learning of image-prompt pairs with lower labeling noise (i.e. that are better aligned). We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion. In addition, we make an interesting discovery: training a diffusion model on the prompt generation task can make the model generate images that are much better aligned with the input prompts, when the model is directly reused for text-to-image generation. Our code is publicly available for download at https://github.com/CroitoruAlin/Reverse-Stable-Diffusion.
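A minimal PyTorch sketch of the joint objective: one head regresses the prompt embedding from image features while another predicts a multi-label bag of vocabulary words, and the two losses are summed. The cosine regression term, BCE classification term, and equal weighting are assumptions, not necessarily the paper's exact formulation.

```python
import torch.nn as nn
import torch.nn.functional as F

class PromptHead(nn.Module):
    """Joint prompt-embedding regression + multi-label vocabulary classification."""
    def __init__(self, feat_dim, embed_dim, vocab_size):
        super().__init__()
        self.reg = nn.Linear(feat_dim, embed_dim)   # prompt-embedding regressor
        self.cls = nn.Linear(feat_dim, vocab_size)  # multi-label word classifier

    def loss(self, feats, target_embed, vocab_targets):
        reg = 1 - F.cosine_similarity(self.reg(feats), target_embed, dim=-1).mean()
        cls = F.binary_cross_entropy_with_logits(self.cls(feats), vocab_targets)
        return reg + cls
```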
Citations: 0
Invisible backdoor attack with attention and steganography
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-19 · DOI: 10.1016/j.cviu.2024.104208
Wenmin Chen, Xiaowei Xu, Xiaodong Wang, Huasong Zhou, Zewen Li, Yangming Chen
Recently, with the development and widespread application of deep neural networks (DNNs), backdoor attacks have posed new security threats to the training process of DNNs. Backdoor attacks on neural networks undermine the security and trustworthiness of DNNs by implanting hidden, unauthorized triggers, leading to benign behavior on clean samples while exhibiting malicious behavior on samples containing backdoor triggers. Existing backdoor attacks typically employ sample-agnostic triggers that are identical for every sample, resulting in poisoned images that lack naturalness and are ineffective against existing backdoor defenses. To address these issues, this paper proposes a novel stealthy backdoor attack in which the backdoor trigger is dynamic and specific to each sample. Specifically, we leverage spatial attention on images and pre-trained models to obtain dynamic triggers, which are then injected using an encoder–decoder network. The design of the injection network benefits from recent advances in steganography research. To demonstrate the effectiveness of the proposed steganographic network, we design two backdoor attack modes named ASBA and ATBA, where ASBA utilizes the steganographic network for the attack, while ATBA is a backdoor attack without steganography. Subsequently, we conduct attacks on DNNs using four standard datasets. Our extensive experiments show that ASBA surpasses ATBA in terms of stealthiness and resilience against current defensive measures. Furthermore, both ASBA and ATBA demonstrate superior attack efficiency.
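The first ingredient, a per-sample spatial attention map from a pre-trained model's features, can be sketched generically: pool the feature map across channels and min-max normalize per sample. This is one common formulation rather than the paper's exact operator, and the encoder-decoder trigger injection is deliberately omitted.

```python
import torch

def spatial_attention(features):
    """(B, C, H, W) backbone activations -> (B, 1, H, W) map in [0, 1]."""
    attn = features.abs().mean(dim=1, keepdim=True)   # pool over channels
    flat = attn.flatten(1)
    lo = flat.min(dim=1, keepdim=True).values
    hi = flat.max(dim=1, keepdim=True).values
    flat = (flat - lo) / (hi - lo + 1e-8)             # per-sample min-max norm
    return flat.view_as(attn)
```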
Citations: 0
NeRFtrinsic Four: An end-to-end trainable NeRF jointly optimizing diverse intrinsic and extrinsic camera parameters
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-18 · DOI: 10.1016/j.cviu.2024.104206
Hannah Schieber , Fabian Deuser , Bernhard Egger , Norbert Oswald , Daniel Roth
Novel view synthesis using neural radiance fields (NeRF) is the state-of-the-art technique for generating high-quality images from novel viewpoints. Existing methods require a priori knowledge about extrinsic and intrinsic camera parameters. This limits their applicability to synthetic scenes or to real-world scenarios that require a preprocessing step. Current research on the joint optimization of camera parameters and NeRF focuses on refining noisy extrinsic camera parameters and often relies on preprocessing of the intrinsic camera parameters. Other approaches are limited to a single camera intrinsic. To address these limitations, we propose a novel end-to-end trainable approach called NeRFtrinsic Four. We utilize Gaussian Fourier features to estimate extrinsic camera parameters and dynamically predict varying intrinsic camera parameters through the supervision of the projection error. Our approach outperforms existing joint optimization methods on LLFF and BLEFF. In addition to these existing datasets, we introduce a new dataset called iFF with varying intrinsic camera parameters. NeRFtrinsic Four is a step forward in jointly optimized NeRF-based view synthesis and enables more realistic and flexible rendering in real-world scenarios with varying camera parameters.
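Gaussian Fourier features have a standard closed form, γ(v) = [cos(2πBv), sin(2πBv)] with the entries of B drawn from N(0, σ²). A minimal PyTorch sketch follows; the feature count and scale σ are tunable assumptions, and the wiring into camera-parameter estimation is not shown.

```python
import torch

class GaussianFourierFeatures(torch.nn.Module):
    """gamma(v) = [cos(2*pi*Bv), sin(2*pi*Bv)], B ~ N(0, sigma^2)."""
    def __init__(self, in_dim, num_features, sigma=10.0):
        super().__init__()
        self.register_buffer("B", torch.randn(in_dim, num_features) * sigma)

    def forward(self, v):                    # v: (..., in_dim), e.g. 3D points
        proj = 2 * torch.pi * v @ self.B     # (..., num_features)
        return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)
```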
Citations: 0
Lightweight cross-modal transformer for RGB-D salient object detection
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-17 · DOI: 10.1016/j.cviu.2024.104194
Nianchang Huang , Yang Yang , Qiang Zhang , Jungong Han , Jin Huang
Recently, Transformer-based RGB-D salient object detection (SOD) models have pushed performance to a new level. However, they consume substantial resources, including memory and power, which hinders their real-life applications. To remedy this situation, this paper presents a novel lightweight cross-modal Transformer (LCT) for RGB-D SOD. Specifically, LCT first reduces its parameters and computational costs by employing a middle-level feature fusion structure and taking a lightweight Transformer as the backbone. Then, with the aid of Transformers, it compensates for the resulting performance degradation by effectively capturing the cross-modal and cross-level complementary information from the multi-modal input images. To this end, a cross-modal enhancement and fusion module (CEFM) with a lightweight channel-wise cross attention block (LCCAB) is designed to capture the cross-modal complementary information effectively at low cost. A bi-directional multi-level feature interaction module (Bi-MFIM) with a lightweight spatial-wise cross attention block (LSCAB) is designed to capture the cross-level complementary context information. By virtue of CEFM and Bi-MFIM, the performance degradation caused by parameter reduction is well compensated, boosting performance. As a result, our proposed model has only 2.8M parameters with 7.6G FLOPs and runs at 66 FPS. Furthermore, experimental results on several benchmark datasets show that our proposed model achieves competitive or even better results than other models. Our code will be released on https://github.com/nexiakele/lightweight-cross-modal-Transformer-LCT-for-RGB-D-SOD.
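To convey the flavor of a lightweight channel-wise cross attention, the sketch below lets globally pooled depth statistics re-weight the RGB feature channels through a small bottleneck MLP, in the style of squeeze-and-excitation gating. This is an assumed simplification of LCCAB, not the paper's exact block.

```python
import torch.nn as nn

class ChannelCrossAttention(nn.Module):
    """One modality's pooled statistics gate the other's channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, rgb, depth):              # both (B, C, H, W)
        w = self.mlp(depth.mean(dim=(2, 3)))    # channel weights from depth
        return rgb * w[:, :, None, None]        # re-weight RGB channels
```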
Citations: 0
PMGNet: Disentanglement and entanglement benefit mutually for compositional zero-shot learning
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-16 · DOI: 10.1016/j.cviu.2024.104197
Yu Liu , Jianghao Li , Yanyi Zhang , Qi Jia , Weimin Wang , Nan Pu , Nicu Sebe
Compositional zero-shot learning (CZSL) aims to model compositions of two primitives (i.e., attributes and objects) to classify unseen attribute-object pairs. Most studies are devoted to integrating disentanglement and entanglement strategies to circumvent the trade-off between contextuality and generalizability. Indeed, the two strategies can mutually benefit when used together. Nevertheless, prior work neglects the value of developing mutual guidance between the two strategies. In this work, we take full advantage of guidance from disentanglement to entanglement and vice versa. Additionally, we propose exploring multi-scale feature learning to achieve fine-grained mutual guidance in a progressive framework. Our approach, termed Progressive Mutual Guidance Network (PMGNet), unifies disentanglement–entanglement representation learning, allowing the two to learn from and teach each other progressively in one unified model. Furthermore, to alleviate overfitting to seen pairs, we adopt a relaxed cross-entropy loss to train PMGNet, without increasing time or memory cost. Extensive experiments on three benchmarks demonstrate that our method achieves distinct improvements, reaching state-of-the-art performance. Moreover, PMGNet exhibits promising performance under the most challenging open-world CZSL setting, especially for unseen pairs.
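One simple way to "relax" cross-entropy is label smoothing, which spreads a small probability mass ε over the non-target classes so the model is not pushed toward maximal confidence on seen pairs. A minimal PyTorch sketch under that assumption; PMGNet's exact relaxation may differ.

```python
import torch
import torch.nn.functional as F

def relaxed_cross_entropy(logits, target, eps=0.1):
    """Cross-entropy against smoothed targets (label smoothing)."""
    n = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_p, eps / (n - 1))      # mass on non-targets
    smooth.scatter_(-1, target.unsqueeze(-1), 1 - eps)  # mass on the target
    return -(smooth * log_p).sum(dim=-1).mean()
```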
Citations: 0
FTM: The Face Truth Machine—Hand-crafted features from micro-expressions to support lie detection
IF 4.3 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-16 · DOI: 10.1016/j.cviu.2024.104188
Maria De Marsico, Giordano Dionisi, Donato Francesco Pio Stanco
This work deals with the delicate task of lie detection from facial dynamics. The proposed Face Truth Machine (FTM) is an intelligent system able to support a human operator without any special equipment. It can be embedded in existing infrastructures for forensic investigation, or wherever the trustworthiness of responses during an interview must be assessed. Due to its flexibility and non-invasiveness, it can overcome some limitations of present solutions. Of course, privacy issues may arise from the use of such systems, as is often underlined nowadays; however, it is up to the user to take these into account and make fair use of tools of this kind. The paper discusses particular aspects of the dynamic analysis of face landmarks to detect lies. In particular, it delves into the behavior of the features used for detection and how they influence the system’s final decision. The novel detection system underlying the Face Truth Machine is able to analyze the subject’s expressions in a wide range of poses. The experimental results testify to the potential of the proposed approach and highlight the very good results obtained in cross-dataset testing, which usually represents a challenge for other approaches.
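As an illustration of hand-crafted dynamic features over face landmarks, the sketch below computes per-landmark speed and jerkiness statistics from a tracked landmark sequence. The (T, K, 2) layout and the chosen statistics are hypothetical, shown only to make "dynamic analysis of face landmarks" concrete; they are not the FTM feature set.

```python
import numpy as np

def landmark_dynamics(landmarks, fps=30.0):
    """(T, K, 2) landmark track -> per-landmark motion descriptors."""
    vel = np.diff(landmarks, axis=0) * fps            # (T-1, K, 2) velocity
    acc = np.diff(vel, axis=0) * fps                  # (T-2, K, 2) acceleration
    speed = np.linalg.norm(vel, axis=-1)              # (T-1, K)
    jerkiness = np.linalg.norm(acc, axis=-1)          # (T-2, K)
    return speed.mean(axis=0), jerkiness.max(axis=0)  # (K,), (K,)
```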
Citations: 0