
2021 IEEE International Conference on Image Processing (ICIP): Latest Publications

Decoder Derived Cross-Component Linear Model Intra-Prediction for Video Coding
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506173
Z. Deng, Kai Zhang, Li Zhang
This paper presents a decoder derived cross-component linear model (DD-CCLM) intra-prediction method, in which one or more linear models can be used to exploit the similarities between luma and chroma sample values, and the number of linear models used for a specific coding unit is adaptively determined at both encoder and decoder sides in a consistent way, without signalling a syntax element. The neighbouring samples are classified into two or three groups based on a K-means algorithm. Moreover, DD-CCLM can be combined with normal intra-prediction modes such as DM mode. The proposed method can be well incorporated into the state-of-the-art CCLM intra-prediction in the Versatile Video Coding standard. Experimental results show that the proposed method provides an overall average bitrate saving of 0.52% for All Intra configurations under the JVET common test conditions, with negligible runtime change. On sequences with rich chroma information, the coding gain is up to 2.07%.
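To make the derivation step concrete, below is a minimal numpy sketch of the general idea the abstract describes: neighbouring reconstructed luma samples are split into groups by a small 1-D K-means, and each group receives its own least-squares luma-to-chroma linear model. The function names and the simplified sample handling are illustrative assumptions, not the authors' VVC implementation, which derives the model count identically at encoder and decoder.

```python
import numpy as np

def fit_cclm_models(neigh_luma, neigh_chroma, num_models=2, iters=10):
    """Split neighbouring samples into groups with a 1-D K-means on luma and
    fit one least-squares linear model chroma ~ alpha * luma + beta per group."""
    neigh_luma = np.asarray(neigh_luma, dtype=float)
    neigh_chroma = np.asarray(neigh_chroma, dtype=float)
    centers = np.linspace(neigh_luma.min(), neigh_luma.max(), num_models)
    labels = np.zeros(len(neigh_luma), dtype=int)
    for _ in range(iters):
        labels = np.argmin(np.abs(neigh_luma[:, None] - centers[None, :]), axis=1)
        for k in range(num_models):
            if np.any(labels == k):
                centers[k] = neigh_luma[labels == k].mean()
    models = []
    for k in range(num_models):
        luma_k, chroma_k = neigh_luma[labels == k], neigh_chroma[labels == k]
        if luma_k.size < 2:          # degenerate group: fall back to identity model
            models.append((1.0, 0.0, centers[k]))
            continue
        A = np.stack([luma_k, np.ones_like(luma_k)], axis=1)
        alpha, beta = np.linalg.lstsq(A, chroma_k, rcond=None)[0]
        models.append((alpha, beta, centers[k]))
    return models

def predict_chroma(luma_block, models):
    """Predict chroma samples from reconstructed luma with the nearest-centre model."""
    luma_block = np.asarray(luma_block, dtype=float)
    centers = np.array([m[2] for m in models])
    labels = np.argmin(np.abs(luma_block[..., None] - centers), axis=-1)
    out = np.zeros_like(luma_block)
    for k, (alpha, beta, _) in enumerate(models):
        out[labels == k] = alpha * luma_block[labels == k] + beta
    return out

# Usage sketch on synthetic neighbouring samples with two underlying linear regimes.
rng = np.random.default_rng(0)
luma = rng.uniform(0, 255, 64)
chroma = np.where(luma < 128, 0.5 * luma + 10, 0.2 * luma + 60)
models = fit_cclm_models(luma, chroma, num_models=2)
print(np.abs(predict_chroma(luma, models) - chroma).mean())   # small fit error
```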
Citations: 1
WarpingFusion: Accurate Multi-View TSDF Fusion with Local Perspective Warp
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506166
Jiwoo Kang, Seongmin Lee, Mingyu Jang, H. Yoon, Sanghoon Lee
In this paper, we propose a novel 3D reconstruction framework in which the surface of a target object is reconstructed accurately and robustly from multi-view depth maps. A depth map of a moving object tends to exhibit spatially varying perspective warps due to motion blur and rolling-shutter artifacts. Incorporating those misaligned points from the views into the world coordinate frame leads to significant artifacts in the reconstructed shape. We address the mismatches with a patch-based depth-to-surface alignment using an implicit surface-based distance measurement. The patch-based minimization finds spatial warps on the depth map quickly and accurately while preserving the global transformation. The proposed framework efficiently optimizes the local alignments against depth occlusions and local variations thanks to a point-to-surface distance based on an implicit representation. The proposed method shows significant improvements over other reconstruction methods, demonstrating the efficiency and benefits of our method for multi-view reconstruction.
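For context, the sketch below shows the standard weighted TSDF-fusion update that such multi-view pipelines build on: every voxel centre is projected into a depth map and its truncated signed distance is averaged into a running estimate. It deliberately omits the paper's patch-based perspective-warp alignment; the function name and the simple weight-1 update rule are assumptions for illustration.

```python
import numpy as np

def fuse_depth_into_tsdf(tsdf, weights, voxel_coords, depth, K, T_cam_world, trunc=0.05):
    """One standard weighted TSDF-fusion update for a single depth map.

    voxel_coords: (N, 3) world-space voxel centres; tsdf/weights: flat (N,) float
    running estimates updated in place; K: 3x3 intrinsics; T_cam_world: 4x4
    world-to-camera extrinsic matrix.
    """
    # Transform voxel centres into the camera frame and project into the image.
    homog = np.concatenate([voxel_coords, np.ones((len(voxel_coords), 1))], axis=1)
    cam = (T_cam_world @ homog.T).T[:, :3]
    z = cam[:, 2]
    z_safe = np.where(np.abs(z) > 1e-6, z, 1e-6)
    u = np.round(K[0, 0] * cam[:, 0] / z_safe + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[:, 1] / z_safe + K[1, 2]).astype(int)
    h, w = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d_obs = np.zeros_like(z)
    d_obs[valid] = depth[v[valid], u[valid]]
    valid &= d_obs > 0                       # ignore pixels with no depth reading
    # Truncated signed distance between observed depth and voxel depth, averaged in.
    sdf = np.clip(d_obs[valid] - z[valid], -trunc, trunc)
    w_old = weights[valid]
    tsdf[valid] = (tsdf[valid] * w_old + sdf) / (w_old + 1.0)
    weights[valid] = w_old + 1.0
    return tsdf, weights

# Usage sketch: fuse one synthetic depth map (a flat wall at 1 m) into a small grid.
grid = np.stack(np.meshgrid(*[np.linspace(-0.5, 0.5, 16)] * 3, indexing="ij"), -1).reshape(-1, 3)
tsdf, wts = np.ones(len(grid)), np.zeros(len(grid))
K = np.array([[200.0, 0, 64], [0, 200.0, 64], [0, 0, 1]])
T = np.eye(4); T[2, 3] = 1.0                 # camera 1 m behind the grid origin
fuse_depth_into_tsdf(tsdf, wts, grid, np.full((128, 128), 1.0), K, T)
```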
Citations: 3
Plug-And-Play Image Reconstruction Meets Stochastic Variance-Reduced Gradient Methods
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506021
Vincent Monardo, A. Iyer, S. Donegan, M. Graef, Yuejie Chi
Plug-and-play (PnP) methods have recently emerged as a powerful framework for image reconstruction that can flexibly combine different physics-based observation models with data-driven image priors in the form of denoisers, and achieve state-of-the-art image reconstruction quality in many applications. In this paper, we aim to further improve the computational efficacy of PnP methods by designing a new algorithm that makes use of stochastic variance-reduced gradients (SVRG), a nascent idea to accelerate runtime in stochastic optimization. Compared with existing PnP methods using batch gradients or stochastic gradients, the new algorithm, called PnP-SVRG, achieves comparable or better accuracy of image reconstruction at a much faster computational speed. Extensive numerical experiments are provided to demonstrate the benefits of the proposed algorithm through the application of compressive imaging using partial Fourier measurements in conjunction with a wide variety of popular image denoisers.
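A minimal sketch of the PnP-SVRG iteration structure described above, assuming the data term decomposes into a sum of per-measurement terms with callable gradients and that the prior is applied through a black-box denoiser: once per epoch a full gradient is computed at a snapshot point, and each inner step combines a variance-reduced stochastic gradient with a denoising step. The names, step-size handling, and the soft-threshold stand-in denoiser are illustrative, not the authors' implementation.

```python
import numpy as np

def pnp_svrg(x0, grads, denoiser, step=1e-3, epochs=10, inner=50, rng=None):
    """PnP-SVRG sketch for an objective (1/n) * sum_i f_i(x) plus a denoiser prior.

    grads: list of callables, each returning the gradient of one data term f_i at x.
    denoiser: callable x -> denoised x, playing the role of the image prior.
    """
    rng = rng or np.random.default_rng(0)
    n, x = len(grads), x0.copy()
    for _ in range(epochs):
        # Snapshot point and its full (batch) gradient, recomputed once per epoch.
        x_tilde = x.copy()
        full_grad = sum(g(x_tilde) for g in grads) / n
        for _ in range(inner):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient estimate.
            v = grads[i](x) - grads[i](x_tilde) + full_grad
            # Gradient step on the data term, then the plug-and-play denoiser.
            x = denoiser(x - step * v)
    return x

# Usage sketch: least-squares data terms A_i x ≈ y_i with a soft-threshold "denoiser".
rng = np.random.default_rng(0)
A = [rng.standard_normal((20, 50)) for _ in range(8)]
x_true = np.zeros(50); x_true[:5] = 1.0
y = [a @ x_true for a in A]
grads = [lambda x, a=a, b=b: a.T @ (a @ x - b) for a, b in zip(A, y)]
denoise = lambda x: np.sign(x) * np.maximum(np.abs(x) - 1e-3, 0.0)
x_hat = pnp_svrg(np.zeros(50), grads, denoise, step=5e-3, epochs=20, inner=40)
print(np.linalg.norm(x_hat - x_true))   # reconstruction error after 800 inner steps
```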
Citations: 1
Solving Fourier Phase Retrieval with a Reference Image as a Sequence of Linear Inverse Problems
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506095
M. Salman Asif
The Fourier phase retrieval problem is equivalent to the recovery of a two-dimensional image from its autocorrelation measurements. This problem is generally nonlinear and nonconvex. Good initialization and prior information about the support or sparsity of the target image are often critical for a robust recovery. In this paper, we show that the presence of a known reference image can help us solve the nonlinear phase retrieval problem as a sequence of small linear inverse problems. Instead of recovering the entire image at once, our sequential method recovers a small number of rows or columns by solving a linear deconvolution problem at every step. Existing methods for the reference-based (holographic) phase retrieval either assume that the reference and target images are sufficiently separated so that the recovery problem is linear or recover the image via nonlinear optimization. In contrast, our proposed method does not require the separation condition. We performed an extensive set of simulations to demonstrate that our proposed method can successfully recover images from autocorrelation data under different settings of reference placement and noise.
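To illustrate the kind of linear inverse problem solved at each step, the toy sketch below recovers one unknown row from its convolution with a known reference row via linear least squares. It is only a stand-in for the per-step linear deconvolution the abstract refers to; forming those systems from actual autocorrelation measurements, as the paper does, involves additional bookkeeping not shown here, and all names are hypothetical.

```python
import numpy as np

def conv_matrix(r, n):
    """Full linear-convolution matrix C so that C @ x == np.convolve(r, x) for len(x) == n."""
    C = np.zeros((len(r) + n - 1, n))
    for j in range(n):
        C[j:j + len(r), j] = r
    return C

def deconvolve_row(b, r, n):
    """Recover one unknown row x (length n) from b = conv(r, x) by linear least squares."""
    x, *_ = np.linalg.lstsq(conv_matrix(r, n), b, rcond=None)
    return x

# Toy usage: a known reference row and a hidden row, mixed by convolution.
rng = np.random.default_rng(0)
r = rng.standard_normal(32)           # known reference row
x_true = rng.standard_normal(32)      # unknown target row
b = np.convolve(r, x_true)            # the linear measurement for this step
x_hat = deconvolve_row(b, r, 32)
print(np.allclose(x_hat, x_true, atol=1e-6))   # True: the step is a linear solve
```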
Citations: 1
A Temporal Statistics Model For UGC Video Quality Prediction
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506669
Zhengzhong Tu, Chia-Ju Chen, Yilin Wang, N. Birkbeck, Balu Adsumilli, A. Bovik
Blind video quality assessment of user-generated content (UGC) has become a trending and challenging problem. Previous studies have shown the efficacy of natural scene statistics for capturing spatial distortions. The exploration of temporal video statistics on UGC, however, is relatively limited. Here we propose the first general, effective and efficient temporal statistics model accounting for temporal- or motion-related distortions for UGC video quality assessment, by analyzing regularities in the temporal bandpass domain. The proposed temporal model can serve as a plug-in module to boost existing no-reference video quality predictors that lack motion-relevant features. Our experimental results on recent large-scale UGC video databases show that the proposed model can significantly improve the performances of existing methods, at a very reasonable computational expense.
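As a rough illustration of temporal bandpass statistics, the sketch below uses simple frame differencing as the bandpass filter and pools a few per-frame statistics into a feature vector that could feed any quality regressor. The choice of filter and statistics here is an assumption for illustration and is not the feature set proposed in the paper.

```python
import numpy as np

def temporal_bandpass_features(frames):
    """Toy temporal-statistics features from a (T, H, W) grayscale clip.

    Frame differencing acts as a crude temporal bandpass filter; pooled statistics
    of the filtered frames serve as motion-related quality features.
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = frames[1:] - frames[:-1]            # simple temporal bandpass
    feats = []
    for d in diffs:
        d = d - d.mean()
        sigma = d.std() + 1e-12
        kurt = np.mean((d / sigma) ** 4)        # sample kurtosis of the bandpass frame
        feats.append([np.mean(np.abs(d)), sigma, kurt])
    feats = np.asarray(feats)
    # Pool over time: mean and standard deviation of each per-frame statistic.
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# Usage sketch: features for a random clip, to be fed to any quality regressor.
clip = np.random.rand(30, 64, 64)
print(temporal_bandpass_features(clip).shape)   # (6,)
```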
Citations: 1
Violence Detection from Video under 2D Spatio-Temporal Representations
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506142
Mohamed Chelali, Camille Kurtz, N. Vincent
Action recognition in videos, especially for violence detection, is now a hot topic in computer vision. Interest in this task stems from the proliferation of videos from surveillance cameras and live television content, which produce complex 2D+t data. State-of-the-art methods rely on end-to-end learning with 3D neural network approaches, which must be trained with large amounts of data to obtain discriminating features. To address these limitations, we present in this article a method to classify videos for violence recognition using a classical 2D convolutional neural network (CNN). The strategy is two-fold: (1) we start by building several 2D spatio-temporal representations from an input video; (2) these new representations are then fed to the CNN for training and testing. The classification decision for the video is obtained by aggregating the individual decisions from its different 2D spatio-temporal representations. An experimental study on public datasets containing violent videos highlights the interest of the presented method.
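The sketch below illustrates the overall strategy with a deliberately simple construction: two 2D spatio-temporal images obtained by collapsing one spatial axis of the clip, each scored by a 2D classifier, with the scores averaged into a single decision. The projection used here and the stand-in classifier are assumptions; the paper's actual 2D representations are built differently.

```python
import numpy as np

def spatio_temporal_images(frames):
    """Build two toy 2D spatio-temporal representations from a (T, H, W) clip.

    Each image collapses one spatial axis, so rows index time and columns index
    the remaining spatial axis (crude x-t and y-t slice-style representations).
    """
    frames = np.asarray(frames, dtype=np.float32)
    xt = frames.mean(axis=1)   # (T, W): average over rows -> time x width
    yt = frames.mean(axis=2)   # (T, H): average over columns -> time x height
    return [xt, yt]

def classify_video(frames, cnn_predict):
    """Aggregate per-representation scores from a 2D classifier into one decision."""
    scores = [cnn_predict(img) for img in spatio_temporal_images(frames)]
    return float(np.mean(scores))  # e.g. probability of the "violence" class

# Usage sketch with a stand-in classifier in place of a trained 2D CNN.
dummy_cnn = lambda img: float(img.std() > 0.2)
print(classify_video(np.random.rand(16, 112, 112), dummy_cnn))
```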
Citations: 1
GIID-NET: Generalizable Image Inpainting Detection Network
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506778
Haiwei Wu, Jiantao Zhou
Deep learning (DL) has demonstrated powerful capabilities in the field of image inpainting, producing visually plausible results. Meanwhile, the malicious use of advanced image inpainting tools (e.g. removing key objects to report fake news) poses an increasing threat to the reliability of image data. To fight against inpainting forgeries, in this work we propose a novel end-to-end Generalizable Image Inpainting Detection Network (GIID-Net) that detects inpainted regions at pixel accuracy. Extensive experimental results are presented to validate the superiority of the proposed GIID-Net compared with state-of-the-art competitors. Our results suggest that common artifacts are shared across diverse image inpainting methods.
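Pixel-accurate inpainting detection is, in training terms, a binary segmentation problem: a fully convolutional network predicts a per-pixel tampering probability supervised by the ground-truth inpainting mask. The toy PyTorch snippet below shows only that generic setup; the network, loss, and data here are placeholders and bear no relation to the actual GIID-Net architecture.

```python
import torch
import torch.nn as nn

# Minimal stand-in for a pixel-level inpainting localizer: any fully convolutional
# network that maps an RGB image to a per-pixel tampering logit.
class TinyLocalizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),
        )

    def forward(self, x):
        return self.net(x)  # (B, 1, H, W) logits

model = TinyLocalizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()  # per-pixel supervision with the inpainting mask

# One toy training step: images plus their ground-truth inpainted-region masks.
images = torch.rand(2, 3, 64, 64)
masks = (torch.rand(2, 1, 64, 64) > 0.8).float()
opt.zero_grad()
loss = loss_fn(model(images), masks)
loss.backward()
opt.step()
print(float(loss))
```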
Citations: 3
Lightweight Connectivity In Graph Convolutional Networks For Skeleton-Based Recognition
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506774
H. Sahbi
Graph convolutional networks (GCNs) aim at extending deep learning to arbitrary irregular domains, namely graphs. Their success is highly dependent on how the topology of input graphs is defined, and most existing GCN architectures rely on predefined or handcrafted graph structures. In this paper, we introduce a novel method that learns the topology (or connectivity) of input graphs as part of the GCN design. The main contribution of our method resides in building an orthogonal connectivity basis that optimally aggregates nodes through their neighborhood prior to convolution. Our method also considers a stochasticity criterion which acts as a regularizer, making the learned basis and the underlying GCNs lightweight while remaining highly effective. Experiments conducted on the challenging task of skeleton-based hand-gesture recognition show the high effectiveness of the learned GCNs with respect to the related work.
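A minimal PyTorch sketch of the general idea of learning the aggregation topology rather than fixing it: a small basis of node-mixing matrices is trained jointly with the feature weights, and an orthonormality penalty on the basis serves as a crude stand-in for the paper's orthogonal-basis and stochasticity constraints. The layer, shapes, and penalty are illustrative assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class LearnedConnectivityGCNLayer(nn.Module):
    """Toy GCN layer whose aggregation matrices are learned, not handcrafted.

    A small basis of K mixing matrices over the N graph nodes is learned jointly
    with the feature weights; a penalty keeps the basis close to orthonormal.
    """
    def __init__(self, num_nodes, in_dim, out_dim, num_basis=4):
        super().__init__()
        self.basis = nn.Parameter(torch.randn(num_basis, num_nodes, num_nodes) * 0.01)
        self.weight = nn.Linear(in_dim * num_basis, out_dim)

    def forward(self, x):
        # x: (B, N, in_dim). Aggregate node features with each learned mixing matrix.
        agg = torch.einsum("knm,bmd->bknd", self.basis, x)      # (B, K, N, in_dim)
        agg = agg.permute(0, 2, 1, 3).flatten(2)                # (B, N, K*in_dim)
        return torch.relu(self.weight(agg))

    def ortho_penalty(self):
        # Push the Gram matrix of the flattened basis matrices towards identity.
        flat = self.basis.flatten(1)                            # (K, N*N)
        gram = flat @ flat.t()
        eye = torch.eye(gram.shape[0], device=gram.device)
        return ((gram - eye) ** 2).sum()

# Usage sketch on a toy skeleton graph with 22 joints and 3-D joint coordinates.
layer = LearnedConnectivityGCNLayer(num_nodes=22, in_dim=3, out_dim=16)
feats = torch.rand(8, 22, 3)
out = layer(feats)
loss = out.mean() + 1e-3 * layer.ortho_penalty()   # task loss + basis regularizer
print(out.shape)   # torch.Size([8, 22, 16])
```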
Citations: 12
A Fast Smart-Cropping Method and Dataset for Video Retargeting
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506390
Konstantinos Apostolidis, V. Mezaris
In this paper, a method that re-targets a video to a different aspect ratio using cropping is presented. We argue that cropping methods are more suitable for video aspect-ratio transformation when the minimization of semantic distortions is a prerequisite. For our method, we utilize visual saliency to find the image regions of attention, and we employ a filtering-through-clustering technique to select the main region of focus. We additionally introduce the first publicly available benchmark dataset for video cropping, annotated by 6 human subjects. Experimental evaluation on the introduced dataset shows the competitiveness of our method.
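The sketch below walks through the pipeline in simplified form: threshold a saliency map, cluster the salient pixel coordinates with a tiny K-means (a crude stand-in for the paper's filtering-through-clustering step), keep the cluster with the most saliency, and centre the largest crop of the target aspect ratio on it. The threshold, cluster count, and function names are assumptions for illustration.

```python
import numpy as np

def crop_window_from_saliency(saliency, target_ar, num_clusters=3, thresh=0.6):
    """Pick a crop window with aspect ratio target_ar (= width / height), centred
    on the dominant salient cluster of a (H, W) saliency map in [0, 1]."""
    h, w = saliency.shape
    ys, xs = np.nonzero(saliency >= thresh * saliency.max())
    pts = np.stack([ys, xs], axis=1).astype(float)
    # Tiny K-means on salient pixel coordinates ("filtering through clustering").
    centers = pts[np.linspace(0, len(pts) - 1, num_clusters).astype(int)]
    for _ in range(10):
        d = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(num_clusters):
            if np.any(labels == k):
                centers[k] = pts[labels == k].mean(0)
    # Keep the cluster with the highest total saliency as the main region of focus.
    totals = [saliency[ys[labels == k], xs[labels == k]].sum() for k in range(num_clusters)]
    cy, cx = centers[int(np.argmax(totals))]
    # Largest window with the target aspect ratio that fits inside the frame.
    crop_h = min(h, int(round(w / target_ar)))
    crop_w = int(round(crop_h * target_ar))
    top = int(np.clip(cy - crop_h / 2, 0, h - crop_h))
    left = int(np.clip(cx - crop_w / 2, 0, w - crop_w))
    return top, left, crop_h, crop_w

# Usage sketch: re-target a 16:9 frame to a 1:1 crop around a synthetic salient blob.
sal = np.zeros((360, 640)); sal[100:200, 400:500] = 1.0
print(crop_window_from_saliency(sal, target_ar=1.0))
```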
Citations: 1
Sphererpn: Learning Spheres For High-Quality Region Proposals On 3d Point Clouds Object Detection
Pub Date : 2021-09-19 DOI: 10.1109/ICIP42928.2021.9506249
Thang Vu, Kookhoi Kim, Haeyong Kang, Xuan Thanh Nguyen, T. Luu, C. Yoo
A bounding box commonly serves as the proxy for 2D object detection. However, extending this practice to 3D detection raises sensitivity to localization error. This problem is acute on flat objects, since a small localization error may lead to low overlaps between the prediction and the ground truth. To address this problem, this paper proposes the Sphere Region Proposal Network (SphereRPN), which detects objects by learning spheres as opposed to bounding boxes. We demonstrate that spherical proposals are more robust to localization error than bounding boxes. The proposed SphereRPN is not only accurate but also fast. Experimental results on the standard ScanNet dataset show that the proposed SphereRPN outperforms previous state-of-the-art methods by a large margin while being 2× to 7× faster. The code will be made publicly available.
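To make the overlap argument concrete, the sketch below computes the volumetric IoU of two spheres using the standard sphere-intersection (lens) volume formula; unlike an axis-aligned box, a sphere's overlap degrades isotropically and smoothly as the predicted centre drifts. This is only an illustration of the overlap measure, not the SphereRPN training or proposal-generation code.

```python
import numpy as np

def sphere_iou(c1, r1, c2, r2):
    """Volumetric IoU of two 3D spheres given centres c1, c2 and radii r1, r2."""
    d = float(np.linalg.norm(np.asarray(c1, float) - np.asarray(c2, float)))
    v1 = 4.0 / 3.0 * np.pi * r1 ** 3
    v2 = 4.0 / 3.0 * np.pi * r2 ** 3
    if d >= r1 + r2:                      # disjoint spheres
        inter = 0.0
    elif d <= abs(r1 - r2):               # one sphere contained in the other
        inter = min(v1, v2)
    else:                                 # lens-shaped intersection volume
        inter = (np.pi * (r1 + r2 - d) ** 2 *
                 (d ** 2 + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2) / (12 * d))
    return inter / (v1 + v2 - inter)

# Usage sketch: the overlap decays smoothly as the predicted centre drifts.
for shift in (0.0, 0.2, 0.5, 1.0):
    print(shift, round(sphere_iou([0, 0, 0], 1.0, [shift, 0, 0], 1.0), 3))
```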
Citations: 1