
2021 International Conference on Visual Communications and Image Processing (VCIP): Latest Publications

MPEG Immersive Video tools for Light Field Head Mounted Displays
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675317
Daniele Bonatto, Grégoire Hirt, Alexander Kvasov, Sarah Fachada, G. Lafruit
Light field displays project hundreds of micro-parallax views so that users perceive 3D without wearing glasses. Transmitting all of these views would require gigantic bandwidth, even with conventional per-view video compression. MPEG Immersive Video (MIV) follows a smarter strategy, transmitting only key images and some metadata from which all missing views are synthesized. We developed (and will demonstrate) real-time Depth Image Based Rendering software that follows this approach to synthesize all light field micro-parallax views from a couple of RGBD input views.
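The heart of such a depth-image-based renderer is warping each RGBD pixel into the virtual camera. The sketch below illustrates that forward-warping step under a plain pinhole-camera model; it is a toy illustration, not the authors' software, and the function name, z-buffer handling, and absence of inpainting are simplifying assumptions.
```python
import numpy as np

def dibr_forward_warp(rgb, depth, K, R, t):
    """Warp one RGBD view to a virtual camera (R, t): unproject each
    pixel with its depth, reproject with the new pose, and resolve
    overlaps with a z-buffer. A toy forward warp; real renderers also
    blend several input views and inpaint disocclusions."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous pixels
    rays = np.linalg.inv(K) @ pix                             # back-project rays
    pts = rays * depth.ravel()                                # 3D points in source camera
    proj = K @ (R @ pts + t[:, None])                         # project into virtual camera
    z = proj[2]
    u = np.round(proj[0] / z).astype(int)
    v = np.round(proj[1] / z).astype(int)
    out = np.zeros_like(rgb)
    zbuf = np.full((h, w), np.inf)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (z > 0)
    src = rgb.reshape(-1, 3)
    for i in np.flatnonzero(ok):                              # nearest point wins
        if z[i] < zbuf[v[i], u[i]]:
            zbuf[v[i], u[i]] = z[i]
            out[v[i], u[i]] = src[i]
    return out

# toy usage: shift the camera 0.1 units to the right
h, w = 4, 4
rgb = np.random.rand(h, w, 3)
depth = np.full((h, w), 2.0)
K = np.array([[4.0, 0, 2], [0, 4.0, 2], [0, 0, 1]])
view = dibr_forward_warp(rgb, depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```
Repeating this warp for hundreds of slightly shifted virtual cameras is what lets a light field display be fed from only a couple of transmitted RGBD views.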
Citations: 2
Learning in Compressed Domain for Faster Machine Vision Tasks
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675369
Jinming Liu, Heming Sun, J. Katto
Learned image compression (LIC) has demonstrated good performance on reconstruction-quality-driven tasks (e.g. PSNR, MS-SSIM) and on machine vision tasks such as image understanding. However, most LIC frameworks operate in the pixel domain, which requires a decoding process. In this paper, we develop a learned compressed-domain framework for machine vision tasks. 1) By sending the compressed latent representation directly to the task network, the decoding computation can be eliminated to reduce complexity. 2) By sorting the latent channels by entropy, only selected channels are transmitted to the task network, which reduces the bitrate. As a result, compared with traditional pixel-domain methods, we can reduce multiply-add operations (MACs) by about 1/3 and inference time by about 1/5 while keeping the same accuracy. Moreover, the proposed channel selection contributes up to 6.8% bitrate saving.
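The channel-selection step can be pictured in a few lines of NumPy: rank the quantized latent channels by empirical entropy and transmit only the top fraction. This is a hedged sketch of the idea; the bin count, keep ratio, and function names are assumptions, not the paper's exact procedure.
```python
import numpy as np

def select_channels_by_entropy(latent, keep_ratio=0.5, bins=64):
    """Rank latent channels by empirical entropy and keep only the most
    informative ones for transmission to the task network.
    latent: (C, H, W) quantized representation."""
    C = latent.shape[0]
    entropies = []
    for c in range(C):
        hist, _ = np.histogram(latent[c], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        entropies.append(-(p * np.log2(p)).sum())
    order = np.argsort(entropies)[::-1]              # highest entropy first
    kept = np.sort(order[: max(1, int(keep_ratio * C))])
    return latent[kept], kept                        # channels + indices for the decoder

# toy usage: a random "latent" where half the channels carry nothing
z = np.random.randn(8, 16, 16)
z[::2] = 0.0                                         # near-zero-entropy channels
sub, idx = select_channels_by_entropy(z, keep_ratio=0.5)
print(idx)  # the informative odd-numbered channels survive
```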
Citations: 7
Evaluation Of Bitrate Ladders For Versatile Video Coder
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675425
Reda Kaafarani, Médéric Blestel, Thomas Maugey, M. Ropert, A. Roumy
Many video service providers take advantage of bitrate ladders in adaptive HTTP video streaming to account for different network states and user display specifications, providing the bitrate/resolution pairs that best fit a client's network conditions and display capabilities. These ladders differ across codecs, however, and so do the bitrate/resolution pairs they contain. In addition, existing bitrate ladders are based on previously available codecs (H.264/MPEG4-AVC, HEVC, etc.), i.e. codecs already in service, so the introduction of a new codec such as Versatile Video Coding (VVC) requires re-analyzing these ladders. To that end, we analyze how the bitrate ladder evolves when VVC is used. We show how VVC impacts the ladder compared to HEVC and H.264/AVC and, in particular, that there is no need to switch to lower resolutions at the lower bitrates defined in the Call for Evidence on Transcoding for Network Distributed Video Coding (CfE).
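Conceptually, a bitrate ladder is just an ordered list of bitrate/resolution rungs from which the streaming client picks. The sketch below shows that selection logic; the ladder values are hypothetical, and the paper's point is that the rungs, in particular where the resolution switches happen, must be re-derived for each codec.
```python
# Hypothetical ladder; real ladders come from per-codec rate-distortion
# (convex hull) analysis, which is why VVC's differs from HEVC's.
LADDER = [(235, 320), (375, 384), (750, 512), (1750, 720),
          (3000, 1080), (6000, 1080)]               # (kbps, vertical resolution)

def pick_rung(ladder, throughput_kbps):
    """Choose the highest-bitrate rung the measured throughput can
    sustain, as an adaptive HTTP streaming client would."""
    feasible = [r for r in ladder if r[0] <= throughput_kbps]
    return feasible[-1] if feasible else ladder[0]  # fall back to the lowest rung

print(pick_rung(LADDER, 2000))  # -> (1750, 720)
```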
Citations: 4
Multi-camera system for placing the viewer between the players of a live sports match: Blind Review
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675336
We demonstrate a new capture system that generates virtual views corresponding to a virtual camera placed between the players on a sports field. Our depth estimation and segmentation pipeline reduces 2K-resolution views from 16 cameras to patches in a single 4K-resolution texture atlas. We have created a real-time, WebGL 2 based playback application that renders an arbitrary view from the 4K atlas. The application allows a user to change viewpoint in real time. Additionally, to interpret the scene, a user can also remove objects such as a player or the ball. At the conference we will demonstrate both the automatic multi-camera conversion pipeline and the real-time rendering/object removal on a smartphone.
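The abstract does not spell out how patches are laid out in the 4K atlas, so the sketch below shows one generic possibility: naive shelf packing of rectangular patches, tallest first. The function name and packing strategy are assumptions for illustration only.
```python
def shelf_pack(patches, atlas_w=3840, atlas_h=2160):
    """Place rectangular (w, h) patches into one atlas with naive shelf
    packing: fill a row left to right, open a new row when it is full.
    Returns (patch_index, x, y) placements or raises on overflow."""
    placements, x, y, row_h = [], 0, 0, 0
    # sort tallest-first so each shelf stays tight
    for i, (w, h) in sorted(enumerate(patches), key=lambda p: -p[1][1]):
        if x + w > atlas_w:                # start a new shelf
            x, y, row_h = 0, y + row_h, 0
        if y + h > atlas_h:
            raise ValueError("atlas overflow; patches need more culling")
        placements.append((i, x, y))
        x += w
        row_h = max(row_h, h)
    return placements

print(shelf_pack([(1024, 512), (640, 480), (1920, 1080)]))
```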
Citations: 0
Kalman filter-based prediction refinement and quality enhancement for geometry-based point cloud compression
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675412
Lu Wang, Jianfeng Sun, Hui Yuan, R. Hamzaoui, Xiaohui Wang
A point cloud is a set of points representing a three-dimensional (3D) object or scene. To compress a point cloud, the Moving Picture Experts Group (MPEG) geometry-based point cloud compression (G-PCC) scheme may use one of three attribute coding methods: region adaptive hierarchical transform (RAHT), predicting transform (PT), and lifting transform (LT). To improve the coding efficiency of PT, we propose using a Kalman filter to refine the predicted attribute values. We also apply a Kalman filter to improve the quality of the reconstructed attribute values at the decoder side. Experimental results show that the combination of the two proposed methods achieves average Bjøntegaard delta bitrates of −0.48%, −5.18%, and −6.27% for the Luma, Chroma Cb, and Chroma Cr components, respectively, compared with a recent G-PCC reference software.
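A scalar Kalman filter is enough to convey the refinement idea: treat the codec's predicted attribute value as the prior and the decoded value as a noisy measurement, then blend the two by the Kalman gain. The noise parameters and the way the prediction enters the filter below are illustrative assumptions, not the paper's exact formulation.
```python
import numpy as np

def kalman_refine(predictions, measurements, q=1e-3, r=1e-1):
    """Refine a stream of predicted attribute values with a scalar
    Kalman filter. q: process-noise variance, r: measurement-noise
    variance (tuning assumptions)."""
    p = 1.0                        # error variance of the running estimate
    refined = []
    for pred, meas in zip(predictions, measurements):
        x, p = pred, p + q         # predict: start from the codec's predictor
        k = p / (p + r)            # Kalman gain
        x = x + k * (meas - x)     # update: blend in the decoded value
        p = (1 - k) * p
        refined.append(x)
    return np.array(refined)

true = np.linspace(100, 120, 50)              # toy luma ramp
noisy = true + np.random.randn(50) * 2        # decoded, noisy attribute values
print(kalman_refine(true + 1.5, noisy)[:5])   # biased predictions pulled toward truth
```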
Citations: 1
Attention-guided Convolutional Neural Network for Lightweight JPEG Compression Artifacts Removal
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675320
Gang Zhang, Haoquan Wang, Yedong Wang, Haijie Shen
JPEG compression artifacts seriously affect the viewing experience. Previous studies mainly focused on deep convolutional networks for compression artifact removal, but their model size and inference speed limit their application prospects. To solve these problems, this paper proposes two methods that improve the training performance of a compact convolutional network without slowing down its inference. First, a fully explainable attention loss, computed from local entropy to accurately locate compression artifacts, is designed to guide the network during training. Second, a Fully Expanded Block (FEB) is proposed to replace the convolutional layer in the compact network; it can be contracted back to a normal convolutional layer once training is completed. Extensive experiments demonstrate that the proposed method outperforms existing lightweight methods in terms of performance and inference speed.
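The attention loss hinges on a local-entropy map that flags where compression artifacts live. Below is a minimal sketch of such a map, computed block-wise over a grayscale image; the window size, bin count, and how the map enters the loss are assumptions rather than the paper's exact formulation.
```python
import numpy as np

def local_entropy_map(gray, win=8, bins=16):
    """Per-block Shannon entropy of a grayscale image. The resulting
    map can weight an attention loss during training so the network
    focuses on artifact-prone regions."""
    h, w = gray.shape
    ent = np.zeros((h // win, w // win))
    for by in range(h // win):
        for bx in range(w // win):
            block = gray[by*win:(by+1)*win, bx*win:(bx+1)*win]
            hist, _ = np.histogram(block, bins=bins, range=(0, 255))
            p = hist / hist.sum()
            p = p[p > 0]
            ent[by, bx] = -(p * np.log2(p)).sum()
    return ent

img = (np.random.rand(64, 64) * 255).astype(np.uint8)
print(local_entropy_map(img).shape)   # (8, 8) grid of attention weights
```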
Citations: 0
CRC-Based Multi-Error Correction of H.265 Encoded Videos in Wireless Communications
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675400
Vivien Boussard, S. Coulombe, F. Coudoux, P. Corlay, Anthony Trioux
This paper analyzes the benefits of extending CRC-based error correction (CRC-EC) to handle more errors in the context of error-prone wireless networks. In the literature, CRC-EC has been used to correct up to 3 binary errors per packet. We first present a theoretical analysis of the CRC-EC candidate list while increasing the number of errors considered. We then analyze the candidate list reduction resulting from subsequent checksum validation and video decoding steps. Simulations conducted on two wireless networks show that the network considered has a huge impact on CRC-EC performance. Over a Bluetooth low energy (BLE) channel with Eb/No=8 dB, an average PSNR improvement of 4.4 dB on videos is achieved when CRC-EC corrects up to 5, rather than 3 errors per packet.
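The candidate-list idea can be reproduced with a toy CRC-8 and brute-force bit flipping: enumerate every payload within a few bit flips of the received packet whose CRC matches the received CRC. The polynomial and packet here are illustrative assumptions; real links use longer CRCs, and the paper prunes the list further with checksum validation and video decodability checks.
```python
from itertools import combinations

def crc8(data: bytes, poly=0x07, init=0x00) -> int:
    """Bitwise CRC-8 (polynomial x^8 + x^2 + x + 1), a stand-in for
    the real link-layer CRC that the method exploits the same way."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def candidate_corrections(packet: bytes, rx_crc: int, max_errors=2):
    """List every payload within max_errors bit flips whose CRC matches
    the received CRC. Exhaustive search is only viable for few errors,
    which is why the candidate list grows quickly beyond 3-5 flips."""
    n = len(packet) * 8
    cands = []
    for k in range(1, max_errors + 1):
        for pos in combinations(range(n), k):
            trial = bytearray(packet)
            for p in pos:
                trial[p // 8] ^= 0x80 >> (p % 8)
            if crc8(bytes(trial)) == rx_crc:
                cands.append(bytes(trial))
    return cands

sent = b"video nal unit"
corrupt = bytearray(sent); corrupt[3] ^= 0x10           # one channel bit flip
print(sent in candidate_corrections(bytes(corrupt), crc8(sent)))  # True
```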
Citations: 2
Cross-Block Difference Guided Fast CU Partition for VVC Intra Coding
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675409
Hewei Liu, Shuyuan Zhu, Ruiqin Xiong, Guanghui Liu, B. Zeng
In this paper, we propose a new fast CU partition method for VVC intra coding based on the cross-block difference. This difference is measured from the gradients and content of the sub-blocks that a partition would produce, and it guides the skipping of unnecessary horizontal and vertical partition modes. With this guidance, block partitions are determined quickly. Compared with VVC, the proposed method saves 41.64% of encoding time on average with only a 0.97% average increase in BD-rate.
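A minimal version of the criterion compares gradient and mean-intensity statistics of the two sub-blocks each binary split would create, and skips split directions whose cross-block difference is small. The statistics and threshold below are assumptions, meant only to make the mechanism concrete.
```python
import numpy as np

def grad_energy(x):
    """Mean absolute gradient of a block, a cheap texture measure."""
    x = x.astype(np.float64)
    return np.abs(np.diff(x, axis=0)).mean() + np.abs(np.diff(x, axis=1)).mean()

def cross_block_difference(block, direction):
    """Difference in gradient energy and mean intensity between the two
    sub-blocks a binary split would create."""
    axis = 0 if direction == "hor" else 1
    a, b = np.array_split(block, 2, axis=axis)
    return abs(grad_energy(a) - grad_energy(b)) + abs(a.mean() - b.mean())

def prune_split_modes(block, thresh=2.0):
    """Keep only the split directions whose cross-block difference clears
    the (assumed) threshold; the encoder then RD-tests the survivors."""
    return [d for d in ("hor", "ver") if cross_block_difference(block, d) > thresh]

cu = np.tile(np.arange(32, dtype=np.float64), (32, 1))  # horizontal luma ramp
print(prune_split_modes(cu))  # ['ver']: only a vertical split separates content
```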
Citations: 4
Action Recognition Improved by Correlations and Attention of Subjects and Scene
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675340
Manh-Hung Ha, O. Chen
Comprehensive activity understanding of multiple subjects in a video requires subject detection, action identification, and behavior interpretation, as well as the interactions among subjects and background. This work develops action recognition of subject(s) based on the correlations and interactions between the whole scene and the subject(s), using a Deep Neural Network (DNN). The proposed DNN consists of a 3D Convolutional Neural Network (CNN), a Spatial Attention (SA) generation layer, a mapping convolutional fused-depth layer, a Transformer Encoder (TE), and two fully connected layers with late fusion for final classification. In particular, the attention mechanisms in SA and TE locate meaningful action information in the spatial and temporal domains, respectively, to enhance recognition performance. The experimental results reveal that the proposed DNN achieves superior accuracies of 97.8%, 98.4%, and 85.6% on the traffic police, UCF101-24, and JHMDB-21 datasets, respectively. Therefore, our DNN is an outstanding classifier for various action recognition tasks involving one or multiple subjects.
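A PyTorch skeleton of the described pipeline is sketched below: a 3D CNN backbone, a 1x1 convolution producing a spatial attention map, a Transformer encoder over per-frame tokens, and a fully connected classifier. Every layer size is an illustrative assumption, and the mapping convolutional fused-depth layer and two-branch late fusion are simplified away.
```python
import torch
import torch.nn as nn

class ActionNet(nn.Module):
    """Sketch of a 3D-CNN + spatial attention + Transformer encoder
    classifier; dimensions are assumptions, not the paper's model."""
    def __init__(self, n_classes=24, d=64):
        super().__init__()
        self.cnn3d = nn.Sequential(
            nn.Conv3d(3, d, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
            nn.Conv3d(d, d, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
        )
        self.sa = nn.Conv3d(d, 1, kernel_size=1)      # spatial attention logits
        enc = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.te = nn.TransformerEncoder(enc, num_layers=2)
        self.fc = nn.Linear(d, n_classes)

    def forward(self, x):                 # x: (B, 3, T, H, W)
        f = self.cnn3d(x)                 # (B, d, T, H', W')
        attn = torch.sigmoid(self.sa(f))  # where to look within each frame
        f = (f * attn).mean(dim=(3, 4))   # attention-pooled frame features (B, d, T)
        f = self.te(f.transpose(1, 2))    # temporal attention over T tokens
        return self.fc(f.mean(dim=1))     # pooled logits

logits = ActionNet()(torch.randn(2, 3, 8, 64, 64))
print(logits.shape)                       # torch.Size([2, 24])
```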
Citations: 5
Nearly Reversible Image-to-Image Translation Using Joint Inter-Frame Coding and Embedding
Pub Date: 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675370
Xinzhu Cao, Yuanzhi Yao, Nenghai Yu
Image-to-image translation tasks, which have been widely investigated with generative adversarial networks (GAN), aim to map an image from a source domain to a target domain. The translated image can be inversely mapped to a reconstructed source image. However, existing GAN-based schemes lack the ability to accomplish reversible translation. To remedy this drawback, this paper proposes a nearly reversible image-to-image translation scheme in which the reconstructed source image is approximately distortion-free compared with the corresponding source image. The proposed scheme jointly considers inter-frame coding and embedding. First, we organize the GAN-generated reconstructed source image and the source image into a pseudo video. Then, the bitstream obtained by inter-frame coding is reversibly embedded in the translated image for nearly lossless source image reconstruction. Extensive experimental results and analysis demonstrate that the proposed scheme achieves a high level of performance in image quality and security.
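The recoverability argument is easy to demonstrate: the residual between the source and its GAN reconstruction, coded losslessly, is exactly what the decoder needs to undo the translation. The sketch below uses zlib as a stand-in for the paper's inter-frame coder and omits the reversible-embedding step that hides the bitstream inside the translated image; all function names are assumptions.
```python
import zlib
import numpy as np

def encode_residual(source, reconstructed):
    """Treat (reconstructed, source) as a 2-frame pseudo video: the
    inter-frame residual is the payload, packed losslessly here with
    zlib as a stand-in for a real inter-frame video coder."""
    residual = source.astype(np.int16) - reconstructed.astype(np.int16)
    return zlib.compress(residual.tobytes())

def restore_source(reconstructed, bitstream):
    """Decoder side: decompress the residual and add it back, yielding a
    distortion-free source. In the paper this bitstream travels hidden
    in the translated image via reversible embedding."""
    res = np.frombuffer(zlib.decompress(bitstream), dtype=np.int16)
    res = res.reshape(reconstructed.shape)
    return (reconstructed.astype(np.int16) + res).astype(np.uint8)

src = (np.random.rand(32, 32) * 255).astype(np.uint8)
rec = np.clip(src.astype(np.int16) + np.random.randint(-3, 4, src.shape),
              0, 255).astype(np.uint8)           # imperfect GAN reconstruction
bits = encode_residual(src, rec)
assert np.array_equal(restore_source(rec, bits), src)
print(len(bits), "bytes of residual to embed reversibly")
```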
Citations: 0