
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP): Latest Papers

FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301850
Ekrem Çetinkaya, Hadi Amirpour, C. Timmerer, M. Ghanbari
HTTP Adaptive Streaming (HAS) is the most common approach for delivering video content over the Internet. The requirement to encode the same content at different quality levels (i.e., representations) in HAS is a challenging problem for content providers. Fast multirate encoding approaches try to accelerate this process by reusing information from previously encoded representations. In this paper, we propose to use convolutional neural networks (CNNs) to speed up the encoding of multiple representations, with a specific focus on parallel encoding. In parallel encoding, the overall time-complexity is bounded by the maximum time-complexity among the representations that are encoded in parallel. Therefore, instead of reducing the time-complexity of all representations, the highest time-complexities are reduced. Experimental results show that FaME-ML achieves significant time-complexity savings in parallel encoding scenarios (41% on average) with a slight increase in bitrate and quality degradation compared to the HEVC reference software.
Citations: 5
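The parallel-encoding argument in the FaME-ML abstract can be illustrated with a small sketch. The representation names and encoding times below are hypothetical values chosen for illustration, not figures from the paper; the only point is that wall-clock time in parallel encoding is the maximum over representations, so only speeding up the slowest one helps.

```python
def parallel_encoding_time(times):
    """Wall-clock time when all representations are encoded in parallel."""
    return max(times)


# Hypothetical per-representation encoding times in seconds (not from the paper).
times = {"360p": 40.0, "720p": 90.0, "1080p": 160.0, "2160p": 400.0}

baseline = parallel_encoding_time(times.values())

# Halving the time of a fast representation leaves the wall-clock time unchanged.
faster_low = {**times, "360p": 20.0}

# Reducing the time of the slowest representation shortens the wall-clock time.
faster_high = {**times, "2160p": 240.0}

print(baseline)
print(parallel_encoding_time(faster_low.values()))
print(parallel_encoding_time(faster_high.values()))
```

In this toy setting, the second scenario is the only one that changes the overall time, which is the motivation the abstract gives for targeting the highest time-complexities.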
A New Bounding Box based Pseudo Annotation Generation Method for Semantic Segmentation
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301833
Xiaolong Xu, Fanman Meng, Hongliang Li, Q. Wu, King Ngi Ngan, Shuai Chen
This paper proposes a fusion-based method to generate pseudo-annotations from bounding boxes for semantic segmentation. The idea is to first generate diverse foreground masks with multiple bounding box segmentation methods, and then combine these masks to generate pseudo-annotations. Existing methods generate foreground masks from bounding boxes with classical segmentation methods driven by low-level features and their own local information, which makes it hard to produce accurate and diverse results for the fusion. Unlike these traditional methods, multiple class-agnostic models are trained to learn objectness cues from existing labeled pixel-level annotations, and their outputs are then fused. First, the classical Fully Convolutional Network (FCN), which densely predicts the pixels' labels, is used. Then, two new sparse-prediction-based class-agnostic models are proposed, which simplify the segmentation task to sparsely predicting the boundary points by predicting the distance from the bounding box border to the object boundary in the Cartesian coordinate system and the polar coordinate system, respectively. Finally, a voting-based strategy is proposed to combine these segmentation results into better pseudo-annotations. We conduct experiments on the PASCAL VOC 2012 dataset. The mIoU of the proposed method is 68.7%, which outperforms the state-of-the-art method by 1.9%.
Citations: 4
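The voting-based fusion step described in the abstract above could look roughly like the following. This is a minimal sketch assuming a plain per-pixel majority vote over binary masks; the abstract does not specify the voting rule, so the function name `vote_fuse`, the `min_votes` parameter, and the toy masks are all hypothetical.

```python
def vote_fuse(masks, min_votes=None):
    """Fuse binary masks (lists of rows of 0/1) by per-pixel voting.

    A pixel is foreground when at least `min_votes` masks mark it as
    foreground. By default a strict majority is required (an assumption,
    not necessarily the rule used in the paper).
    """
    if not masks:
        raise ValueError("at least one mask is required")
    if min_votes is None:
        min_votes = len(masks) // 2 + 1
    height, width = len(masks[0]), len(masks[0][0])
    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            votes = sum(mask[y][x] for mask in masks)
            row.append(1 if votes >= min_votes else 0)
        fused.append(row)
    return fused


# Toy 2x3 masks standing in for the dense, Cartesian, and polar model outputs.
dense_mask = [[1, 1, 0], [0, 1, 0]]
cartesian_mask = [[1, 0, 0], [0, 1, 1]]
polar_mask = [[1, 1, 0], [1, 1, 0]]

fused = vote_fuse([dense_mask, cartesian_mask, polar_mask])
print(fused)
```

A pixel marked by only one of the three masks is dropped, which is how voting can suppress errors that are specific to a single model.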
Learning From Paired and Unpaired Data: Alternately Trained CycleGAN for Near Infrared Image Colorization
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301791
Zaifeng Yang, Zhenghua Chen
This paper presents a novel near infrared (NIR) image colorization approach for the Grand Challenge held by the 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP). A Cycle-Consistent Generative Adversarial Network (CycleGAN) with cross-scale dense connections is developed to learn the color translation from the NIR domain to the RGB domain based on both paired and unpaired data. Due to the limited number of paired NIR-RGB images, data augmentation via cropping, scaling, contrast and mirroring operations has been adopted to increase the variation of the NIR domain. An alternating training strategy has been designed such that CycleGAN can efficiently and alternately learn the explicit pixel-level mappings from the paired NIR-RGB data, as well as the implicit domain mappings from the unpaired data. Based on the validation data, we have evaluated our method and compared it with the conventional CycleGAN method in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and angular error (AE). The experimental results validate the proposed colorization framework.
Citations: 10
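Two of the metrics named in the colorization abstract above, PSNR and angular error, have standard textbook definitions that can be sketched as follows. This is an illustrative implementation on flat lists of pixel values, assuming 8-bit data (peak value 255); how the challenge aggregates these metrics over images is not stated in the abstract, and the function names are hypothetical.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length value lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


def angular_error_degrees(rgb_a, rgb_b):
    """Angle in degrees between two RGB vectors."""
    dot = sum(a * b for a, b in zip(rgb_a, rgb_b))
    norm_a = math.sqrt(sum(a * a for a in rgb_a))
    norm_b = math.sqrt(sum(b * b for b in rgb_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for black pixels (an assumption)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


print(psnr([255, 255, 255, 255], [0, 0, 0, 0]))
print(angular_error_degrees((1, 0, 0), (0, 1, 0)))
```

Angular error ignores brightness and compares only color direction, so a pixel and a uniformly brighter copy of it have an error close to zero.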
On Segmentation of Maxillary Sinus Membrane using Automatic Vertex Screening
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301845
K. Li, Tai-Chiu Hsung, A. Yeung, M. Bornstein
The purpose of this study is to develop an automatic technique to segment the membrane of the maxillary sinus with morphological changes (e.g. thickened membrane and cysts) for the detection of abnormalities. The first step is to segment the sinus bone cavity in the CBCT image using the fuzzy C-means algorithm. Then, the vertices of the inner bone walls of the sinus in the mesh model are screened with vertex normal direction and angle-based mean-distance filtering. The resulting vertices are then used to generate the bony sinus cavity mesh model by Poisson surface reconstruction. Finally, the sinus membrane morphological changes are segmented by subtracting the air sinus segmentation from the reconstructed bony sinus cavity. The proposed method has been applied to 5 maxillary sinuses with mucosal thickening and has demonstrated that it can segment thin membrane thickening (< 2 mm) successfully, within 4.1% and 3.5% error in volume and surface area respectively. Existing methods suffer from leakage at openings and thin bones, and from inaccuracy with the irregular contours commonly seen in the maxillary sinus. The current method overcomes these shortcomings.
Citations: 1
Chain Code-Based Occupancy Map Coding for Video-Based Point Cloud Compression
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301867
Runyu Yang, Ning Yan, Li Li, Dong Liu, Feng Wu
In video-based point cloud compression (V-PCC), occupancy map video is utilized to indicate whether a 2-D pixel corresponds to a valid 3-D point or not. In the current design of V-PCC, the occupancy map video is directly compressed losslessly with High Efficiency Video Coding (HEVC). However, the coding tools in HEVC are specifically designed for natural images and are thus unsuitable for the occupancy map. In this paper, we present a novel quadtree-based scheme for lossless occupancy map coding. In this scheme, the occupancy map is first divided into several coding tree units (CTUs). Then, each CTU is divided into coding units (CUs) recursively using a quadtree. The quadtree partition is terminated when one of three conditions is satisfied. First, all the pixels have the same value. Second, the pixels in the CU have only two kinds of values and can be separated by a continuous edge whose endpoints lie on the sides of the CU; the continuous edge is then coded using chain code. Third, the CU reaches the minimum size. This scheme simplifies the design of block partitioning in HEVC and designs simpler yet more effective coding tools. Experimental results show a significant reduction of bit-rate and complexity compared with the occupancy map coding scheme in V-PCC. In addition, this scheme also compresses semantic maps very efficiently.
Citations: 1
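The recursive quadtree partition described in the occupancy map abstract above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it handles only the first and third termination conditions (uniform block, minimum size) and deliberately omits the second one (two values separable by a chain-coded edge). The function name, the leaf labels, and the toy map are hypothetical, and the block size is assumed to be a power of two.

```python
def partition(occupancy, x, y, size, min_size, leaves):
    """Recursively split a square block of a binary occupancy map.

    Appends (x, y, size, reason) tuples to `leaves` and returns the list.
    Only the "uniform" and "min_size" stopping rules are modeled here;
    the chain-code edge condition from the abstract is left out.
    """
    values = {
        occupancy[row][col]
        for row in range(y, y + size)
        for col in range(x, x + size)
    }
    if len(values) == 1:
        leaves.append((x, y, size, "uniform"))
    elif size <= min_size:
        leaves.append((x, y, size, "min_size"))
    else:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                partition(occupancy, x + dx, y + dy, half, min_size, leaves)
    return leaves


# Toy 4x4 occupancy map standing in for one coding tree unit.
occupancy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]

for leaf in partition(occupancy, 0, 0, 4, 1, []):
    print(leaf)
```

With a minimum size of 1 the mixed bottom-right quadrant is split down to single pixels; raising `min_size` to 2 stops the recursion there and leaves that quadrant as a single mixed block.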
Fast Geometry Estimation for Phase-coding Structured Light Field
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301777
Li Liu, S. Xiang, Huiping Deng, Jin Wu
Estimating scene geometry is an important and fundamental task in light field processing. In conventional light fields, homogeneous texture surfaces bring ambiguity and a heavy computational load to depth estimation. In this paper, we propose the phase-coding structured light field (PSLF), which projects sinusoidal waveform patterns and assigns the phase to every pixel as its code. Using the epipolar plane image (EPI) of the PSLF, we propose a depth estimation method. Specifically, the cost is convex with respect to the inclination angle of the candidate line in the EPI, and we propose to iteratively rotate the candidate line until it converges to the optimal one. In addition, to cope with the problem that the candidate samples cover multiple depth layers, we propose a method to reject the outlier samples. Experimental results demonstrate that, compared with the conventional light field, the proposed PSLF improves the depth quality, with a mean absolute error of 0.007 pixels. In addition, the proposed optimization-based depth estimation method clearly improves efficiency, with a processing speed about 2.71 times that of the traditional method.
Citations: 1
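The abstract above relies on the cost being convex in the inclination angle, which is what makes an iterative search converge to a single optimum. The sketch below uses ternary search as a stand-in for the paper's rotation update, which the abstract does not spell out; the cost function and its optimum at 0.3 radians are purely hypothetical.

```python
def minimize_convex(cost, low, high, tolerance=1e-6):
    """Locate the minimizer of a convex 1-D cost on [low, high].

    Ternary search: each iteration discards one third of the interval
    that cannot contain the minimum.
    """
    while high - low > tolerance:
        third = (high - low) / 3.0
        left, right = low + third, high - third
        if cost(left) < cost(right):
            high = right
        else:
            low = left
    return (low + high) / 2.0


def toy_cost(angle):
    """Hypothetical convex matching cost with its optimum at 0.3 rad."""
    return (angle - 0.3) ** 2


best_angle = minimize_convex(toy_cost, -1.5, 1.5)
print(round(best_angle, 4))
```

Because each step only compares two cost evaluations, far fewer candidate angles are tested than in an exhaustive sweep, which is one plausible reading of where a speed-up over a brute-force method could come from.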
Multi-Scale Video Inverse Tone Mapping with Deformable Alignment
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301780
Jiaqi Zou, Ke Mei, Songlin Sun
Inverse tone mapping (iTM) is an operation that transforms low-dynamic-range (LDR) content into high-dynamic-range (HDR) content and is an effective technique for improving the visual experience. iTM has developed rapidly with deep learning algorithms in recent years. However, the great majority of deep-learning-based iTM methods are aimed at images and ignore the temporal correlations of consecutive frames in videos. In this paper, we propose a multi-scale video iTM network with deformable alignment, which increases temporal consistency in videos. We first align the input consecutive LDR frames at the feature level by deformable convolutions and then simultaneously use multi-frame information to generate the HDR frame. Additionally, we adopt a multi-scale iTM architecture with a pyramid pooling module, which enables our network to reconstruct details as well as global features. The proposed network achieves better performance than other iTM methods on quantitative metrics and gains a significant visual improvement.
Citations: 3
Disparity compensation of light fields for improved efficiency in 4D transform-based encoders
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301829
João M. Santos, Lucas A. Thomaz, P. Assunção, L. Cruz, Luis M. N. Tavora, S. Faria
Efficient light field encoders take advantage of the inherent 4D data structures to achieve high compression performance. This is accomplished by exploiting the redundancy of co-located pixels in different sub-aperture images (SAIs) through prediction and/or transform schemes to find a more compact representation of the signal. However, in image regions with higher disparity between SAIs, the performance of such schemes tends to decrease, thus reducing the compression efficiency. This paper introduces a reversible pre-processing algorithm for disparity compensation that operates on the SAI domain of light field data. The proposed method helps improve the transform efficiency of the encoder, since the disparity-compensated data presents higher correlation between co-located image blocks. The experimental results show significant improvements in the compression performance of 4D light fields, achieving Bjontegaard delta rate gains of about 44% on average for the MuLE codec using the 4D discrete cosine transform, when encoding High Density Camera Arrays (HDCA) light field images.
Citations: 1
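The idea of a reversible disparity compensation step, as described in the abstract above, can be illustrated with a toy example. This sketch assumes a single integer disparity per view and uses circular horizontal shifts so that the operation is exactly invertible; the actual algorithm in the paper is not described in the abstract, and the function names and data are hypothetical.

```python
def shift_row(row, shift):
    """Circularly shift a row to the right by `shift` pixels (left if negative)."""
    shift %= len(row)
    if shift == 0:
        return list(row)
    return row[-shift:] + row[:-shift]


def compensate(sai, view_offset, disparity):
    """Align a sub-aperture image with the central view.

    A scene point is assumed to appear `view_offset * disparity` pixels to
    the right in this view, so every row is shifted left by that amount.
    """
    return [shift_row(row, -view_offset * disparity) for row in sai]


def restore(sai, view_offset, disparity):
    """Undo `compensate` exactly by applying the opposite circular shift."""
    return [shift_row(row, view_offset * disparity) for row in sai]


center_view = [[0, 0, 9, 0, 0, 0]]
neighbor_view = [[0, 0, 0, 0, 9, 0]]  # same point, seen 2 pixels to the right

aligned = compensate(neighbor_view, view_offset=1, disparity=2)
print(aligned)
print(restore(aligned, view_offset=1, disparity=2))
```

After compensation the bright pixel sits at the same column in both views, so co-located blocks are more correlated, and the circular shift guarantees that the decoder can recover the original view without loss.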
Learning Graph Topology Representation with Attention Networks
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301864
Yuanyuan Qi, Jiayue Zhang, Weiran Xu, Jun Guo, Honggang Zhang
Contextualized neural language models have gained much attention in Information Retrieval (IR) for their ability to achieve better word understanding by capturing contextual structure at the sentence level. However, to understand a document better, it is necessary to involve contextual structure at the document level. Moreover, some words contribute more information to delivering the meaning of a document. Motivated by this, in this paper we take advantage of Graph Convolutional Networks (GCN) and Graph Attention Networks (GAN) to model the global word-relation structure of a document with an attention mechanism to improve context-aware document ranking. We propose to build a graph for a document to model the global contextual structure. The nodes and edges of the graph are constructed from contextual embeddings. We first apply graph convolution on the graph and then use attention networks to explore the influence of more informative words to obtain a new representation. This representation covers both local contextual and global structure information. The experimental results show that our method outperforms the state-of-the-art contextual language models, which demonstrates that incorporating contextual structure is useful for improving document ranking.
Citations: 0
Versatile Video Coding (VVC) Arrives
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301847
G. Sullivan
Seven years after the development of the first version of the High Efficiency Video Coding (HEVC) standard, the major international organizations in the world of video coding have completed the next major generation, called Versatile Video Coding (VVC). The VVC standard, formally designated as ITU-T H.266 and ISO/IEC 23090-3, promises a major improvement in video compression relative to its predecessors. It can offer roughly double the coding efficiency – i.e., it can be used to encode video content to the same level of visual quality while using about 50% fewer bits than HEVC and thus using about 75% fewer bits than H.264/AVC, today’s most widely used format. Thus it can ease the burden on worldwide networks, where video now comprises about 80% of all internet traffic. Moreover, VVC has enhanced features in its syntax for supporting an unprecedented breadth of applications, giving meaning to the word "versatility" used in its title. Completed in July 2020, VVC has begun to emerge in practical implementations and is undergoing testing to characterize its subjective performance.
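The step from "about 50% fewer bits than HEVC" to "about 75% fewer bits than H.264/AVC" can be spelled out as a short calculation. It relies on an assumption the abstract implies but does not state: that HEVC itself needs roughly half the bits of H.264/AVC at the same visual quality. The symbols $R_{\text{VVC}}$, $R_{\text{HEVC}}$ and $R_{\text{AVC}}$ (bitrates at equal quality) are introduced here only for illustration.

```latex
\[
\frac{R_{\text{VVC}}}{R_{\text{AVC}}}
  = \frac{R_{\text{VVC}}}{R_{\text{HEVC}}} \cdot \frac{R_{\text{HEVC}}}{R_{\text{AVC}}}
  \approx 0.5 \times 0.5 = 0.25
\]
```

A ratio of about 0.25 corresponds to the roughly 75% bit saving relative to H.264/AVC quoted above.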
Citations: 12
Journal
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)