
Computer Vision and Image Understanding: Latest Publications

Dual cross-enhancement network for highly accurate dichotomous image segmentation
IF 4.3 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-02 | DOI: 10.1016/j.cviu.2024.104122

Existing image segmentation tasks mainly focus on segmenting objects with specific characteristics, such as salient, camouflaged, and meticulous objects. However, research on highly accurate Dichotomous Image Segmentation (DIS), which combines these tasks, has only just begun and still faces problems such as insufficient information interaction between layers and incomplete integration of high-level semantic information with low-level detailed features. In this paper, a new dual cross-enhancement network (DCENet) for highly accurate DIS is proposed, which mainly consists of two new modules: a cross-scaling guidance (CSG) module and a semantic cross-transplantation (SCT) module. Specifically, the CSG module adopts an adjacent-layer cross-scaling guidance method, which efficiently exchanges the multi-scale features extracted from adjacent layers, while the SCT module uses dual-branch features that complement each other. In this transplantation, the high-level semantic information of the low-resolution branch guides the low-level detail features of the high-resolution branch, and the features of the different resolution branches are effectively fused. Finally, experimental results on the challenging DIS5K benchmark dataset show that the proposed network outperforms 9 state-of-the-art (SOTA) networks on 5 widely used evaluation metrics. In addition, ablation experiments demonstrate the effectiveness of the cross-scaling guidance and semantic cross-transplantation modules.
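The transplantation idea in the SCT module, where the high-level semantics of a low-resolution branch guide the low-level details of a high-resolution branch, can be pictured with a minimal PyTorch sketch. This is an illustrative assumption rather than the authors' implementation; the module name, the sigmoid gating, and the channel sizes are all invented for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGuidance(nn.Module):
    """Toy 'transplantation': low-resolution semantic features are upsampled into a
    gate that modulates high-resolution detail features before fusion."""
    def __init__(self, sem_channels, det_channels):
        super().__init__()
        self.gate = nn.Conv2d(sem_channels, det_channels, kernel_size=1)
        self.fuse = nn.Conv2d(det_channels * 2, det_channels, kernel_size=3, padding=1)

    def forward(self, sem_lowres, det_highres):
        # Upsample the semantics to the detail resolution and turn them into a gate.
        sem_up = F.interpolate(sem_lowres, size=det_highres.shape[-2:],
                               mode="bilinear", align_corners=False)
        gate = torch.sigmoid(self.gate(sem_up))
        guided = det_highres * gate                # semantics guide the details
        return self.fuse(torch.cat([guided, det_highres], dim=1))

# Example: 1/8-resolution semantics guiding 1/2-resolution details.
sem = torch.randn(1, 256, 16, 16)
det = torch.randn(1, 64, 64, 64)
print(SemanticGuidance(256, 64)(sem, det).shape)   # torch.Size([1, 64, 64, 64])
```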

Citations: 0
VADS: Visuo-Adaptive DualStrike attack on visual question answer
IF 4.3 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-31 | DOI: 10.1016/j.cviu.2024.104137

Visual Question Answering (VQA) is a fundamental task in the computer vision and natural language processing fields. The adversarial vulnerability of VQA models is crucial for their reliability in real-world applications. However, current VQA attacks mainly focus on white-box and transfer-based settings, which require the attacker to have full or partial prior knowledge of the victim VQA model. Besides that, query-based VQA attacks require a massive number of queries, which the victim model may detect. In this paper, we propose the Visuo-Adaptive DualStrike (VADS) attack, a novel adversarial attack method combining transfer-based and query-based strategies to exploit vulnerabilities in VQA systems. Unlike current VQA attacks that focus on either approach, VADS leverages a momentum-like ensemble method to search for potential attack targets and compress the perturbation. After that, our method employs a query-based strategy to dynamically adjust the weight of the perturbation per surrogate model. We evaluate the effectiveness of VADS across 8 VQA models and two datasets. The results demonstrate that VADS outperforms existing adversarial techniques in both efficiency and success rate. Our code is available at: https://github.com/stevenzhang9577/VADS.
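The combination of a transfer-based ensemble step with query-based re-weighting of the surrogate models can be sketched roughly as below. This is not the VADS algorithm: the surrogate models, the sign-gradient step, and the softmax weight update are placeholders assumed purely for illustration, and `victim_query_fn` stands for whatever black-box access the attacker has.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weighted_ensemble_attack(image, label, surrogates, victim_query_fn,
                             steps=10, eps=8 / 255, alpha=2 / 255):
    """Rough sketch: craft a perturbation from a weighted surrogate ensemble and
    re-weight the surrogates with feedback from a small number of victim queries."""
    weights = torch.ones(len(surrogates)) / len(surrogates)
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = sum(w * F.cross_entropy(m(image + delta), label)
                   for w, m in zip(weights, surrogates))
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()     # transfer-based ensemble step
            delta.clamp_(-eps, eps)
            # Query feedback (simplified, assumed): surrogates whose loss tracks the
            # victim's loss on the current adversarial example get larger weights.
            victim_loss = victim_query_fn(image + delta, label)
            per_model = torch.tensor([F.cross_entropy(m(image + delta), label).item()
                                      for m in surrogates])
            weights = F.softmax(per_model * victim_loss, dim=0)
        delta.grad.zero_()
    return (image + delta).detach()

# Toy usage with two linear surrogates and a black-box victim stub.
surrogates = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)) for _ in range(2)]
victim = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
adv = weighted_ensemble_attack(torch.rand(1, 3, 32, 32), torch.tensor([3]), surrogates,
                               lambda x, y: F.cross_entropy(victim(x), y).item())
```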

Citations: 0
Symmetrical Siamese Network for pose-guided person synthesis
IF 4.3 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-28 | DOI: 10.1016/j.cviu.2024.104134

Pose-Guided Person Image Synthesis (PGPIS) aims to generate a realistic person image that preserves the appearance of the source person while adopting the target pose. Varied appearances and drastic pose changes make this task highly challenging. Due to insufficient utilization of paired data, existing models have difficulty accurately preserving the source appearance details and high-frequency textures in the generated images. Meanwhile, although the currently popular AdaIN-based methods are advantageous in handling drastic pose changes, they struggle to capture diverse clothing shapes owing to the limitation of global feature statistics. To address these issues, we propose a novel Symmetrical Siamese Network (SSNet) for PGPIS, which consists of two synergistic symmetrical generative branches that leverage prior knowledge of paired data to comprehensively exploit appearance details. For feature integration, we propose a Style Matching Module (SMM) to transfer multi-level region appearance styles and gradient information to the desired pose, enriching the high-frequency textures. Furthermore, to overcome the limitation of global feature statistics, a Spatial Attention Module (SAM) is introduced to complement the SMM in capturing clothing shapes. Extensive experiments show the effectiveness of our SSNet, which achieves state-of-the-art results on public datasets. Moreover, SSNet can also edit the source appearance attributes, making it versatile in wider application scenarios.
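The limitation the abstract attributes to AdaIN-based methods is visible directly in the standard AdaIN operation, which transfers only per-channel mean and standard deviation, i.e. global feature statistics with no spatial layout. The snippet below shows that background operation (not SSNet's own SMM or SAM).

```python
import torch

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization: re-style the content features using only the
    per-channel mean/std of the style features. Because only these global statistics
    are transferred, spatial structure such as clothing shape is not captured."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

content = torch.randn(2, 128, 32, 32)   # features posed like the target
style = torch.randn(2, 128, 32, 32)     # features carrying the source appearance
print(adain(content, style).shape)      # torch.Size([2, 128, 32, 32])
```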

Citations: 0
An egocentric video and eye-tracking dataset for visual search in convenience stores
IF 4.3 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-28 | DOI: 10.1016/j.cviu.2024.104129

We introduce an egocentric video and eye-tracking dataset comprising 108 first-person videos of 36 shoppers searching for three different products (orange juice, KitKat chocolate bars, and canned tuna) in a convenience store, along with the frame-centered eye fixation locations for each video frame. The dataset also includes demographic information about each participant in the form of an 11-question survey. The paper describes two applications of the dataset: an analysis of eye fixations during search in the store, and the training of a clustered saliency model for predicting the saliency of viewers engaged in product search in the store. The fixation analysis shows that fixation duration statistics are very similar to those found in image and video viewing, suggesting that similar visual processing is employed during search in 3D environments and during viewing of imagery on computer screens. A clustering technique was applied to the questionnaire data, resulting in two detected clusters. Based on these clusters, personalized saliency prediction models were trained on the store fixation data, which provided improved performance in predicting saliency on the store video data compared to state-of-the-art universal saliency prediction methods.
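The clustering step that splits the 36 participants into two groups from their 11-question surveys is not specified further in the abstract; a generic sketch with k-means on standardized answers is shown below. The encoding, the random survey data, and the choice of k-means itself are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical encoded survey: 36 participants x 11 questions with ordinal answers.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(36, 11)).astype(float)

# Standardize each question so no single one dominates the distance metric.
responses = (responses - responses.mean(axis=0)) / (responses.std(axis=0) + 1e-8)

# Two clusters, matching the number of participant groups reported in the abstract.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(responses)
print(np.bincount(labels))   # participants per cluster
```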

Citations: 0
URINet: Unsupervised point cloud rotation invariant representation learning via semantic and structural reasoning
IF 4.3 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-28 | DOI: 10.1016/j.cviu.2024.104136

In recent years, many rotation-invariant networks have been proposed to alleviate the interference caused by arbitrary rotations of point clouds. These networks have demonstrated powerful representation learning capabilities. However, most of these methods rely on costly manually annotated supervision for model training. Moreover, they fail to reason about structural relations and lose global information. To address these issues, we present an unsupervised method for achieving comprehensive rotation-invariant representations without human annotation. Specifically, we propose a novel encoder–decoder architecture named URINet, which learns a point cloud representation by combining local semantic and global structural information, and then reconstructs the input without rotation perturbation. In detail, the encoder is a two-branch network in which a graph-convolution-based structural branch models the relationships among local regions to learn global structural knowledge, while a semantic branch learns rotation-invariant local semantic features. The two branches derive complementary information and explore the point clouds comprehensively. Furthermore, to avoid the self-reconstruction ambiguity brought by uncertain poses, a bidirectional alignment is proposed to measure the quality of reconstruction results without orientation knowledge. Extensive experiments on downstream tasks show that the proposed method significantly surpasses existing state-of-the-art methods on both synthetic and real-world datasets.
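The abstract does not spell out how the bidirectional alignment scores a reconstruction without orientation knowledge; a common symmetric, correspondence-free choice for such a measure is the bidirectional Chamfer distance, sketched below as an assumed stand-in rather than the paper's exact criterion.

```python
import torch

def bidirectional_chamfer(recon, target):
    """Symmetric point-set distance: each reconstructed point is matched to its nearest
    target point and vice versa, so no fixed point ordering or canonical orientation
    is required."""
    d2 = torch.cdist(recon, target, p=2).pow(2)   # pairwise squared distances (N, M)
    forward = d2.min(dim=1).values.mean()         # recon -> target
    backward = d2.min(dim=0).values.mean()        # target -> recon
    return forward + backward

recon = torch.randn(1024, 3)
target = torch.randn(1024, 3)
print(bidirectional_chamfer(recon, target).item())
```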

Citations: 0
DSU-GAN: A robust frontal face recognition approach based on generative adversarial network
IF 4.5 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-26 | DOI: 10.1016/j.cviu.2024.104128
Deyu Lin, Huanxin Wang, Xin Lei, Weidong Min, Chenguang Yao, Yuan Zhong, Yong Liang Guan
Face recognition technology is widely used in different areas, such as entrance guard, payment, healthcare, and education. However, little attention has been given to non-frontal face recognition, especially model training and the quality of the generated images. To this end, a novel robust frontal face recognition approach based on a generative adversarial network (DSU-GAN) is proposed in this paper. To enhance the robustness of the generator in learning pose-variant face images, deformable convolution is introduced in the generator–encoder. A consistency loss mechanism is presented for the deformable convolution to avoid additional computational overhead and the problem of overfitting. In addition, a self-attention mechanism is presented in the generator–encoder to avoid information overload; it constructs the long-term dependencies between any two positions of the feature map at the pixel level. To balance the capability of the generator and the discriminator, a novel discriminator architecture based on U-Net is proposed. Finally, the single-way discriminator is improved through a new up-sampling module. Experimental results demonstrate that our proposal achieves an average Rank-1 recognition rate of 95.14% on the Multi-PIE face dataset under multi-pose conditions. In addition, it achieves outstanding performance in recent benchmarks conducted on both IJB-A and IJB-C.
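The pixel-level long-range dependency described for the generator–encoder matches the standard SAGAN-style self-attention block, sketched below; the exact placement, reduction ratio, and any DSU-GAN-specific changes are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention: relates every position of a feature map to every
    other position, i.e. the pixel-level long-term dependencies the abstract mentions."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)    # (B, HW, C')
        k = self.key(x).flatten(2)                      # (B, C', HW)
        attn = F.softmax(torch.bmm(q, k), dim=-1)       # (B, HW, HW)
        v = self.value(x).flatten(2)                    # (B, C, HW)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                     # residual connection

x = torch.randn(1, 64, 32, 32)
print(SelfAttention2d(64)(x).shape)   # torch.Size([1, 64, 32, 32])
```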
Citations: 0
Improved high dynamic range imaging using multi-scale feature flows balanced between task-orientedness and accuracy
IF 4.3 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-24 | DOI: 10.1016/j.cviu.2024.104126

Deep learning has made it possible to accurately generate high dynamic range (HDR) images from multiple images taken at different exposure settings, largely owing to advancements in neural network design. However, generating images without artifacts remains difficult, especially in scenes with moving objects. In such cases, issues like color distortion, geometric misalignment, or ghosting can appear. Current state-of-the-art network designs address this by estimating the optical flow between input images to align them better. The parameters for the flow estimation are learned through the primary goal of producing high-quality HDR images. However, we find that this "task-oriented flow" approach has its drawbacks, especially in minimizing artifacts. To address this, we introduce a new network design and training method that improve the accuracy of flow estimation, aiming to strike a balance between task-oriented flow and accurate flow. Additionally, the network utilizes multi-scale features extracted from the input images for both flow estimation and HDR image reconstruction. Our experiments demonstrate that these two innovations result in HDR images with fewer artifacts and enhanced quality.
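The align-then-merge pipeline the abstract builds on can be illustrated with a minimal warp-and-blend step: a dense flow (assumed to come from some estimator) warps a non-reference exposure onto the reference, and the aligned stack is merged with fixed weights. Both the flow and the weights here are placeholders, not the learned network.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Warp an image (B, C, H, W) with a dense flow field (B, 2, H, W) given in pixels."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow   # absolute coords
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0    # normalize to [-1, 1] for grid_sample
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                      # (B, H, W, 2)
    return F.grid_sample(image, grid, mode="bilinear", align_corners=True)

def merge_exposures(reference, warped_others, weights):
    """Weighted average of the reference and the flow-aligned exposures."""
    stack = torch.stack([reference] + warped_others, dim=0)
    weights = weights.view(-1, 1, 1, 1, 1)
    return (weights * stack).sum(dim=0) / weights.sum()

ref = torch.rand(1, 3, 64, 64)
other = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)     # identity flow, just for the example
hdr = merge_exposures(ref, [warp(other, flow)], torch.tensor([0.6, 0.4]))
print(hdr.shape)   # torch.Size([1, 3, 64, 64])
```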

Citations: 0
The shading isophotes: Model and methods for Lambertian planes and a point light
IF 4.3 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-24 | DOI: 10.1016/j.cviu.2024.104135

Structure-from-Motion (SfM) and Shape-from-Shading (SfS) are complementary classical approaches to 3D vision. Broadly speaking, SfM exploits geometric primitives from textured surfaces, whereas SfS exploits pixel intensity from the shading image. We propose an approach that exploits virtual geometric primitives extracted from the shading image, namely its level sets, which we name shading isophotes. Our approach thus combines the strength of geometric reasoning with the rich shading information. We focus on the case of untextured Lambertian planes of unknown albedo lit by an unknown Point Light Source (PLS) of unknown intensity. We derive a comprehensive geometric model showing that the unknown scene parameters are, in general, all recoverable from a single image of at least two planes. We propose computational methods to detect the isophotes, to reconstruct the scene parameters in closed form, and to refine the results densely using pixel intensity. Our methods thus estimate light source, plane pose, and camera pose parameters for untextured planes, which cannot be achieved by existing approaches. We evaluate our model and methods on synthetic and real images.
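The image formation behind the shading isophotes is the standard Lambertian point-light model, I(X) = albedo * s * max(0, n . l(X)) / r(X)^2, with l(X) the unit direction from the surface point X to the PLS and r(X) its distance; the isophotes are simply the level sets of I. The sketch below renders a fronto-parallel plane under this model and extracts a few level sets; the scene values are arbitrary and the paper's actual detection method is not reproduced here.

```python
import numpy as np

# Assumed scene: a Lambertian plane z = 0 with unit normal (0, 0, 1), constant albedo,
# and a point light source of intensity s at position L.
albedo, s = 0.8, 2.0
L = np.array([0.2, 0.1, 1.0])

# Sample the plane on a grid of 3D points X = (x, y, 0).
xs, ys = np.meshgrid(np.linspace(-1, 1, 512), np.linspace(-1, 1, 512))
X = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)

to_light = L - X                                        # vectors from the surface to the PLS
r = np.linalg.norm(to_light, axis=-1)
cos_theta = np.clip(to_light[..., 2] / r, 0.0, None)    # n . l with n = (0, 0, 1)
shading = albedo * s * cos_theta / r**2                 # Lambertian point-light shading

# Isophotes are level sets of the shading image: pixels near chosen intensity levels.
levels = np.linspace(shading.min(), shading.max(), 6)[1:-1]
isophotes = [np.abs(shading - lv) < 2e-3 for lv in levels]
print(shading.shape, [int(m.sum()) for m in isophotes])
```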

Citations: 0
DDGPnP: Differential degree graph based PnP solution to handle outliers
IF 4.3 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-23 | DOI: 10.1016/j.cviu.2024.104130

Existing external relationships used for outlier removal in the perspective-n-point (PnP) problem generally rely on spatial coherence among neighboring correspondences. Under high noise or spatially incoherent distributions, pose estimation is relatively inaccurate because only a small number of inliers are detected. To address these problems, this paper explores globally coherent external relationships for outlier removal and pose estimation. To this end, the differential degree graph (DDG) is proposed, which employs the intersection angles between rays of correspondences to handle outliers. Firstly, a pair of degree graphs is constructed to establish the external relationships between 3D-2D correspondences in the world and camera coordinates. Secondly, the DDG is estimated by subtracting the two degree graphs and applying a binary operation with a degree threshold. This paper also mathematically proves that the maximum clique of the DDG represents the inliers. Thirdly, a novel vertex-degree-based method is put forward to extract the maximum clique from the DDG for outlier removal. Furthermore, this paper proposes a novel pipeline for a DDG-based PnP solution, i.e., DDGPnP, to achieve accurate pose estimation. Experiments demonstrate the superiority and effectiveness of the proposed method in outlier removal and pose estimation compared with the state of the art. Especially in high-noise situations, the DDGPnP method achieves not only an accurate pose but also a large number of correct correspondences.
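The final step the abstract describes, taking the maximum clique of the thresholded differential degree graph as the inlier set, can be illustrated with a generic consistency-graph sketch. How the pairwise angle differences are actually computed is left out here; the binary matrix is a stand-in, and networkx's generic clique enumeration replaces the paper's vertex-degree-based extraction.

```python
import numpy as np
import networkx as nx

def inliers_from_consistency(adjacency):
    """Given a binary pairwise-consistency matrix (the thresholded differential degree
    graph), return the indices of a maximum clique as the inlier set."""
    graph = nx.from_numpy_array(adjacency)
    best = max(nx.find_cliques(graph), key=len)   # exponential worst case; fine for small n
    return sorted(best)

# Stand-in consistency matrix: 8 correspondences, of which 0-5 are mutually consistent.
n = 8
adj = np.zeros((n, n), dtype=int)
adj[:6, :6] = 1
np.fill_diagonal(adj, 0)
print(inliers_from_consistency(adj))   # [0, 1, 2, 3, 4, 5]
```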

Citations: 0
Dual cross perception network with texture and boundary guidance for camouflaged object detection
IF 4.3 | CAS Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-22 | DOI: 10.1016/j.cviu.2024.104131

Camouflaged object detection (COD) is a task that requires effectively segmenting objects that subtly blend into their surroundings. Edge and texture information of the objects can be utilized to reveal the edges of camouflaged objects and to detect texture differences between camouflaged objects and the surrounding environment. However, existing methods often fail to fully exploit the advantages of these two types of information. Considering this, our paper proposes an innovative Dual Cross Perception Network (DCPNet) with texture and boundary guidance for camouflaged object detection. DCPNet consists of two essential modules, namely the Dual Cross Fusion Module (DCFM) and the Subgroup Aggregation Module (SAM). DCFM utilizes attention techniques to emphasize the information that exists in edges and textures by cross-fusing the features of the edge, texture, and basic RGB image, which strengthens the ability to capture edge information and texture details in image analysis. SAM assigns varied weights to low-level and high-level features in order to enhance the comprehension of objects and scenes of various sizes. Several experiments have demonstrated that DCPNet outperforms 13 state-of-the-art methods on four widely used assessment metrics.
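The cross-fusion of edge, texture, and RGB features with attention can be pictured with a simplified gate design: edge and texture features each produce a channel-attention gate that re-weights the RGB features before fusion. This is an assumed simplification, not the DCFM as published.

```python
import torch
import torch.nn as nn

class CrossFusion(nn.Module):
    """Simplified cross-fusion: edge and texture features yield channel-attention gates
    that re-weight the RGB features, and the three streams are fused by a convolution."""
    def __init__(self, channels):
        super().__init__()
        self.edge_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                       nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.texture_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                          nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(channels * 3, channels, 3, padding=1)

    def forward(self, rgb_feat, edge_feat, texture_feat):
        edge_attended = rgb_feat * self.edge_gate(edge_feat)           # emphasize edges
        texture_attended = rgb_feat * self.texture_gate(texture_feat)  # emphasize textures
        return self.fuse(torch.cat([rgb_feat, edge_attended, texture_attended], dim=1))

f = torch.randn(1, 64, 44, 44)
print(CrossFusion(64)(f, f.clone(), f.clone()).shape)   # torch.Size([1, 64, 44, 44])
```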

Citations: 0