
Latest publications from the 2021 International Conference on Visual Communications and Image Processing (VCIP)

Data Transformer for Anomalous Trajectory Detection
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675322
Hsuan-Jen Psan, Wen-Jiin Tsai
Anomaly detection is an important task in many traffic applications. Methods based on deep learning networks reach high accuracy; however, they typically rely on supervised training with large amounts of annotated data. Considering that anomalous data are not easy to obtain, we present data transformation methods which convert the data obtained from one intersection to other intersections to mitigate the effort of collecting training data. The proposed methods are demonstrated on the task of anomalous trajectory detection. A General model and a Universal model are proposed: the former focuses on saving data collection effort; the latter further reduces the network training effort. We evaluated the methods on a dataset with trajectories from four intersections in the GTA V virtual world. The experimental results show that, with a significant reduction in data collection and network training effort, the proposed anomalous trajectory detection still achieves state-of-the-art accuracy.
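The abstract does not spell out how the cross-intersection conversion is done; the sketch below is one plausible, minimal reading, assuming corresponding landmark points (for example lane entry and exit locations) are available for the two intersections so that trajectories can be mapped with a least-squares affine transform. All coordinates and names here are hypothetical.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2D affine transform mapping src_pts -> dst_pts.

    src_pts, dst_pts: (N, 2) arrays of corresponding landmarks
    (e.g. lane entry/exit points) in the two intersections.
    """
    n = src_pts.shape[0]
    A = np.hstack([src_pts, np.ones((n, 1))])        # (N, 3)
    sol, *_ = np.linalg.lstsq(A, dst_pts, rcond=None)  # solve A @ sol ~= dst_pts
    return sol.T                                      # 2x3 affine matrix

def transform_trajectory(traj, M):
    """Map a (T, 2) trajectory from the source to the target intersection."""
    T = traj.shape[0]
    homog = np.hstack([traj, np.ones((T, 1))])        # (T, 3)
    return homog @ M.T

# Hypothetical landmarks of two intersections and one observed trajectory.
src = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=float)
dst = np.array([[2, 1], [14, 1], [14, 13], [2, 13]], dtype=float)
traj = np.array([[1, 1], [3, 2], [5, 5], [8, 9]], dtype=float)

M = fit_affine(src, dst)
print(transform_trajectory(traj, M))
```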
Citations: 0
Learning to Fly with a Video Generator
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675414
Chia-Chun Chung, Wen-Hsiao Peng, Teng-Hu Cheng, Chin-Feng Yu
This paper demonstrates a model-based reinforcement learning framework for training a self-flying drone. We implement the Dreamer proposed in a prior work as an environment model that responds to the action taken by the drone by predicting the next video frame as a new state signal. The Dreamer is a conditional video sequence generator. This model-based environment avoids the time-consuming interactions between the agent and the real environment, largely speeding up the training process. This demonstration showcases for the first time the application of the Dreamer to train an agent that can finish the racing task in the AirSim simulator.
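The demo abstract leaves the training loop implicit; the sketch below only illustrates the general model-based idea, with the Dreamer-style environment model reduced to a small latent dynamics network so the example stays self-contained. The network sizes, imagination horizon and reward head are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

STATE, ACTION, HORIZON = 32, 4, 15

class WorldModel(nn.Module):
    """Stand-in for the learned environment model: given the current latent
    state and an action, predict the next latent state and a reward.
    (In the paper this role is played by a conditional video generator.)"""
    def __init__(self):
        super().__init__()
        self.dyn = nn.Sequential(nn.Linear(STATE + ACTION, 64), nn.ELU(), nn.Linear(64, STATE))
        self.rew = nn.Sequential(nn.Linear(STATE, 64), nn.ELU(), nn.Linear(64, 1))
    def forward(self, s, a):
        s_next = self.dyn(torch.cat([s, a], dim=-1))
        return s_next, self.rew(s_next).squeeze(-1)

class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE, 64), nn.ELU(), nn.Linear(64, ACTION), nn.Tanh())
    def forward(self, s):
        return self.net(s)

world, policy = WorldModel(), Policy()       # world would be pre-trained on real flights
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Train the policy purely on imagined rollouts: once the environment model is
# learned, no calls to the real simulator are needed during policy updates.
for step in range(100):
    s = torch.randn(16, STATE)               # batch of imagined start states
    total_reward = 0.0
    for _ in range(HORIZON):
        a = policy(s)
        s, r = world(s, a)
        total_reward = total_reward + r.mean()
    loss = -total_reward                      # maximize the predicted return
    opt.zero_grad()
    loss.backward()
    opt.step()
```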
Citations: 0
Gradient Compression with a Variational Coding Scheme for Federated Learning
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675436
B. Kathariya, Zhu Li, Jianle Chen, G. V. D. Auwera
Federated Learning (FL), a distributed machine learning architecture, emerged to enable intelligent analysis of the massive data generated at network edge devices. With this paradigm, a model is jointly learned in parallel at edge devices without needing to send voluminous data to a central FL server. This not only allows a model to learn within a feasible duration by reducing network latency, but also preserves data privacy. Nonetheless, when thousands of edge devices are attached to an FL framework, limited network resources inevitably impose intolerable training latency. In this work, we propose model-update compression to solve this issue in a novel way. The proposed method learns multiple Gaussian distributions that best describe the high-dimensional gradient parameters. In the FL server, high-dimensional gradients are repopulated from Gaussian distributions using the likelihood-function parameters communicated to the server. Since the distribution parameters constitute a very small number of values compared to the high-dimensional gradients themselves, the proposed method saves significant uplink bandwidth while preserving model accuracy. Experimental results validated our claim.
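The abstract describes the coding scheme only at a high level; the following is a minimal sketch of the general idea, assuming the mixture is fit per gradient tensor with scikit-learn's GaussianMixture and the server simply resamples from the communicated parameters. The component count and tensor shape are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def compress_update(grad, n_components=4):
    """Client side: fit a small Gaussian mixture to the flattened gradient
    and ship only the mixture parameters (a handful of floats) uplink."""
    g = grad.reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(g)
    stds = np.sqrt(gmm.covariances_).ravel()
    return gmm.weights_, gmm.means_.ravel(), stds, grad.shape

def decompress_update(weights, means, stds, shape, seed=0):
    """Server side: repopulate a gradient tensor of the original shape by
    sampling from the communicated mixture."""
    rng = np.random.default_rng(seed)
    n = int(np.prod(shape))
    comps = rng.choice(len(weights), size=n, p=weights)
    samples = rng.normal(means[comps], stds[comps])
    return samples.reshape(shape)

grad = np.random.randn(256, 128) * 0.01      # a fake layer gradient
weights, means, stds, shape = compress_update(grad)
restored = decompress_update(weights, means, stds, shape)
print(grad.size, "gradient values replaced by", weights.size + means.size + stds.size,
      "mixture parameters")
```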
Citations: 0
Learn A Compression for Objection Detection - VAE with a Bridge
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675387
Yixin Mei, Fan Li, Li Li, Zhu Li
Recent advances in sensor technology and the wide deployment of visual sensors lead to a new class of applications in which images are compressed not primarily for pixel recovery and human viewing, but for communication to cloud-side machine vision tasks such as classification, identification, detection and tracking. This opens up new research dimensions for learning-based compression that directly optimizes the loss function of the vision task, and therefore achieves better compression performance than recovering pixels first and then performing the vision task. In this work, we developed a learning-based compression scheme that learns a compact feature representation and an appropriate bitstream for the task of visual object detection. A Variational Auto-Encoder (VAE) framework is adopted for learning the compact representation, while a bridge network is trained to drive the detection loss function. Simulation results demonstrate that this approach achieves a new state of the art in task-driven compression efficiency compared with pixel-recovery approaches, including both learning-based and handcrafted solutions.
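The abstract does not detail the architecture; the sketch below shows a minimal VAE encoder feeding a bridge network into a task head, with the detection head reduced to a classifier for brevity and the rate term approximated by the KL divergence. Layer sizes and the 0.01 weighting are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """VAE-style encoder: image -> compact latent (mean, log-variance)."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.mu, self.logvar = nn.Linear(64, z_dim), nn.Linear(64, z_dim)
    def forward(self, x):
        h = self.conv(x)
        return self.mu(h), self.logvar(h)

class Bridge(nn.Module):
    """Maps the compact latent to features consumed by the task head."""
    def __init__(self, z_dim=64, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))
    def forward(self, z):
        return self.net(z)

enc, bridge = Encoder(), Bridge()
task_head = nn.Linear(256, 20)              # placeholder for a detection head

x = torch.randn(8, 3, 128, 128)             # dummy image batch
y = torch.randint(0, 20, (8,))              # dummy task targets
mu, logvar = enc(x)
z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)          # reparameterization
logits = task_head(bridge(z))

rate = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL term as a rate proxy
task = F.cross_entropy(logits, y)                                # stands in for the detection loss
loss = task + 0.01 * rate
loss.backward()
```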
Citations: 2
Linear Regression Mode of Intra Prediction for Screen Content Coding
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675382
Wei Peng, Hongkui Wang, Li Yu
High Efficiency Video Coding - Screen Content Coding (HEVC-SCC) follows the traditional angular intra prediction technique of HEVC. However, the Planar mode and the DC mode are somewhat redundant for screen content video, which has features such as the absence of sensor noise. Hence, this paper proposes a new intra prediction mode called the linear regression (LR) mode, which combines the Planar mode and the DC mode into one mode. The LR mode improves the accuracy of intra prediction for fading regions in screen content video. Besides, by optimizing the most probable mode (MPM) construction, the hit rate of the best mode in the MPM list is improved. The experimental results show that the proposed method achieves a 0.57% BD-BR reduction compared with HM-16.20+SCM-8.8, while the coding time remains largely the same.
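The abstract does not give the exact regression formulation; the sketch below assumes the straightforward reading that a plane value = a*x + b*y + c is fit by least squares to the reconstructed reference samples above and to the left of the current block and then evaluated inside the block, so the constant term behaves like DC and the slopes like Planar's gradient. The reference values and block size are hypothetical.

```python
import numpy as np

def lr_intra_predict(top_ref, left_ref, block_size):
    """Fit value = a*x + b*y + c to the reference samples above (y = -1) and
    to the left of (x = -1) the block, then evaluate the plane inside it."""
    n = block_size
    xs = np.concatenate([np.arange(n), -np.ones(n)])     # top row: x = 0..n-1, left col: x = -1
    ys = np.concatenate([-np.ones(n), np.arange(n)])      # top row: y = -1,    left col: y = 0..n-1
    vals = np.concatenate([top_ref, left_ref]).astype(float)
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1)
    coeff, *_ = np.linalg.lstsq(A, vals, rcond=None)       # [a, b, c]
    gx, gy = np.meshgrid(np.arange(n), np.arange(n))        # gx: column index, gy: row index
    pred = coeff[0] * gx + coeff[1] * gy + coeff[2]
    return np.clip(np.rint(pred), 0, 255).astype(np.uint8)

top = np.array([60, 62, 63, 66, 68, 70, 71, 74])     # hypothetical reconstructed reference row
left = np.array([58, 57, 55, 54, 52, 51, 49, 48])    # hypothetical reconstructed reference column
print(lr_intra_predict(top, left, 8))
```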
Citations: 0
Message from the General and Technical Program Chairs
Pub Date : 2021-12-05 DOI: 10.1109/vcip53242.2021.9675415
{"title":"Message from the General and Technical Program Chairs","authors":"","doi":"10.1109/vcip53242.2021.9675415","DOIUrl":"https://doi.org/10.1109/vcip53242.2021.9675415","url":null,"abstract":"","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125450390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learn to Look Around: Deep Reinforcement Learning Agent for Video Saliency Prediction
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675397
Yiran Tao, Yaosi Hu, Zhenzhong Chen
In the video saliency prediction task, one of the key issues is the utilization of the temporal contextual information of keyframes. In this paper, a deep reinforcement learning agent for video saliency prediction is proposed, designed to look around adjacent frames and adaptively generate a salient contextual window that contains the information most correlated with the keyframe for saliency prediction. More specifically, an action set decides step by step whether to expand the window, while a state set and a reward function evaluate the effectiveness of the current window. The deep Q-learning algorithm is used to train the agent to learn a policy that achieves this goal. The proposed agent can be regarded as plug-and-play and is compatible with generic video saliency prediction models. Experimental results on various datasets demonstrate that our method achieves advanced performance.
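The abstract describes the agent abstractly; the toy sketch below shows only the look-around mechanic with tabular Q-learning, where the state is the current window size, the actions are expand or stop, and a synthetic reward stands in for the saliency-prediction quality of the chosen window. Every numeric choice here is an assumption for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
MAX_WIN, ACTIONS = 8, 2                 # state: current window size; actions: 0 = stop, 1 = expand
Q = np.zeros((MAX_WIN + 1, ACTIONS))

def reward(window):
    """Toy stand-in for the paper's reward: prediction quality improves with
    temporal context up to a point, then degrades as irrelevant frames enter."""
    return -(window - 5) ** 2 / 10.0

for episode in range(3000):
    w, done = 1, False                              # start from the keyframe alone
    while not done:
        a = rng.integers(ACTIONS) if rng.random() < 0.2 else int(np.argmax(Q[w]))
        if a == 1 and w < MAX_WIN:                  # expand the temporal window by one frame
            r, w_next, done = 0.0, w + 1, False
        else:                                       # stop: collect the quality of this window
            r, w_next, done = reward(w), w, True
        target = r if done else r + 0.9 * np.max(Q[w_next])
        Q[w, a] += 0.1 * (target - Q[w, a])         # Q-learning update
        w = w_next

greedy_stop = next(w for w in range(1, MAX_WIN + 1) if np.argmax(Q[w]) == 0)
print("greedy policy stops expanding at window size", greedy_stop)
```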
Citations: 0
DWS-BEAM: Decoder-Wise Subpicture Bitstream Extracting and Merging for MPEG Immersive Video
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675419
Jong-Beom Jeong, Soonbin Lee, Eun‐Seok Ryu
With the new immersive video coding standard MPEG immersive video (MIV) and versatile video coding (VVC), six degrees of freedom (6DoF) virtual reality (VR) streaming technology is emerging for both computer-generated and natural content videos. This paper addresses the decoder-wise subpicture bitstream extracting and merging (DWS-BEAM) method for MIV and proposes two main ideas: (i) a selective streaming-aware subpicture allocation method using a motion-constrained tile set (MCTS), and (ii) a decoder-wise subpicture extracting and merging method for single-pass decoding. In experiments using the VVC test model (VTM), the proposed method shows a 1.23% BD-rate saving in immersive video PSNR (IV-PSNR) and a 15.78% decoding runtime saving compared to the VTM anchor. Moreover, while the MIV test model requires four decoders, the proposed method requires only one.
Citations: 7
SMRD: A Local Feature Descriptor for Multi-modal Image Registration
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675401
Jiayu Xie, Xin Jin, Hongkun Cao
Multi-modal image registration has received increasing attention in computer vision and computational photography. However, non-linear intensity variations prevent accurate feature-point matching between image pairs of different modalities. Thus, a robust image descriptor for multi-modal image registration is proposed, named the shearlet-based modality robust descriptor (SMRD). Anisotropic edge and texture information at multiple scales is encoded based on the discrete shearlet transform to describe the region around a point of interest. We conducted experiments comparing the proposed SMRD with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results show that SMRD achieves superior performance over the other methods in terms of precision, recall and F1-score.
Citations: 2
Human Action Recognition on Raw Depth Maps
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675349
Jacek Trelinski, B. Kwolek
We propose an effective framework for human action recognition on raw depth maps. We leverage a convolutional autoencoder to extract, from sequences of depth maps, frame features that are then fed to a 1D-CNN responsible for embedding action features. A Siamese neural network trained on a representative single depth map for each sequence extracts features, which are then processed by a shapelets algorithm to extract action features. These features are then concatenated with features extracted by a BiLSTM with a TimeDistributed wrapper. Given the individual models learned on these features, we select a subset of models. We demonstrate experimentally that, on the SYSU 3DHOI dataset, the proposed algorithm considerably outperforms all recent algorithms, including skeleton-based ones.
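The full pipeline also involves a Siamese network, shapelets, a BiLSTM and model selection; the sketch below covers only the first two stages as described in the abstract (per-frame encoding by the autoencoder's encoder, then a 1D-CNN over time), with all layer sizes, sequence length and class count chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Encoder half of the convolutional autoencoder: one depth map -> feature
    vector. (The decoder used for reconstruction pre-training is omitted.)"""
    def __init__(self, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat))
    def forward(self, x):
        return self.net(x)

class ActionEmbedder(nn.Module):
    """1D-CNN over the per-frame features of one sequence -> action scores."""
    def __init__(self, feat=64, n_classes=12):
        super().__init__()
        self.temporal = nn.Sequential(
            nn.Conv1d(feat, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.cls = nn.Linear(128, n_classes)
    def forward(self, f):                            # f: (batch, time, feat)
        return self.cls(self.temporal(f.transpose(1, 2)))

frames = torch.rand(4, 30, 1, 64, 64)                # 4 sequences of 30 depth maps, 64x64
enc, head = FrameEncoder(), ActionEmbedder()
b, t = frames.shape[:2]
feats = enc(frames.flatten(0, 1)).view(b, t, -1)     # encode every frame independently
print(head(feats).shape)                             # (4, 12) action class scores
```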
Citations: 1