
2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW): Latest Publications

Scene-Text-Detection Method Robust Against Orientation and Discontiguous Components of Characters
Rei Endo, Yoshihiko Kawai, H. Sumiyoshi, Masanori Sano
Scene-text detection in natural-scene images is an important technique because scene texts contain location information such as names of places and buildings, but many difficulties still remain regarding practical use. In this paper, we tackle two problems of scene-text detection. The first is the discontiguous component problem in specific languages that contain characters consisting of discontiguous components. The second is the multi-orientation problem in all languages. To solve these two problems, we propose a connected-component-based scene-text-detection method. Our proposed method involves our novel neighbor-character search method using a synthesizable descriptor for the discontiguous-component problems and our novel region descriptor called the rotated bounding box descriptors (RBBs) for rotated characters. We also evaluated our proposed scene-text-detection method by using the well-known MSRA-TD500 dataset that includes rotated characters with discontiguous components.
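To make the rotated-bounding-box idea above concrete, here is a minimal, hypothetical sketch of computing a rotated bounding box for a single connected component with OpenCV; the descriptor fields (center, aspect ratio, angle) and the helper name are illustrative assumptions, not the authors' RBB formulation or their synthesizable neighbor-search descriptor.

```python
import cv2
import numpy as np

def rotated_bbox_descriptor(component_mask):
    """Minimum-area rotated rectangle of a binary connected component,
    summarized as (center_x, center_y, aspect_ratio, angle)."""
    ys, xs = np.nonzero(component_mask)
    points = np.stack([xs, ys], axis=1).astype(np.float32)
    (cx, cy), (w, h), angle = cv2.minAreaRect(points)
    aspect = max(w, h) / (min(w, h) + 1e-6)  # scale-free shape cue
    return np.array([cx, cy, aspect, angle], dtype=np.float32)

# Toy usage: a diagonal stroke-like component.
mask = np.zeros((64, 64), dtype=np.uint8)
cv2.line(mask, (10, 50), (50, 10), color=1, thickness=3)
print(rotated_bbox_descriptor(mask))
```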
Citations: 4
Image Denoising via CNNs: An Adversarial Approach
Nithish Divakar, R. Venkatesh Babu
Is it possible to recover an image from its noisy version using convolutional neural networks? This is an interesting problem as convolutional layers are generally used as feature detectors for tasks like classification, segmentation and object detection. We present a new CNN architecture for blind image denoising which synergically combines three architecture components, a multi-scale feature extraction layer which helps in reducing the effect of noise on feature maps, an ℓp regularizer which helps in selecting only the appropriate feature maps for the task of reconstruction, and finally a three step training approach which leverages adversarial training to give the final performance boost to the model. The proposed model shows competitive denoising performance when compared to the state-of-the-art approaches.
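As a rough illustration of two of the named components, the PyTorch sketch below combines a multi-scale feature-extraction layer with an lp penalty on its feature maps; the channel counts, kernel sizes, value of p, and loss weighting are assumptions, and the adversarial training stage described in the abstract is omitted.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, in_ch=1, ch=16):
        super().__init__()
        # Parallel convolutions at different receptive fields.
        self.b3 = nn.Conv2d(in_ch, ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, ch, kernel_size=5, padding=2)
        self.b7 = nn.Conv2d(in_ch, ch, kernel_size=7, padding=3)
        self.fuse = nn.Conv2d(3 * ch, in_ch, kernel_size=3, padding=1)

    def forward(self, x):
        feats = torch.relu(torch.cat([self.b3(x), self.b5(x), self.b7(x)], dim=1))
        return self.fuse(feats), feats

model = MultiScaleBlock()
noisy = torch.randn(4, 1, 32, 32)
clean = torch.randn(4, 1, 32, 32)
denoised, feats = model(noisy)
p = 1.0  # assumed value of p for the lp feature-map penalty
loss = nn.functional.mse_loss(denoised, clean) + 1e-4 * feats.abs().pow(p).mean()
loss.backward()
```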
Citations: 68
ASePPI: Robust Privacy Protection Against De-Anonymization Attacks
Natacha Ruchaud, J. Dugelay
The evolution of the video surveillance systems generates questions concerning protection of individual privacy. In this paper, we design ASePPI, an Adaptive Scrambling enabling Privacy Protection and Intelligibility method operating in the H.264/AVC stream with the aim to be robust against de-anonymization attacks targeting the restoration of the original image and the re-identification of people. The proposed approach automatically adapts the level of protection according to the resolution of the region of interest. Compared to existing methods, our framework provides a better trade-off between the privacy protection and the visibility of the scene with robustness against de-anonymization attacks. Moreover, the impact on the source coding stream is negligible.
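The sketch below only illustrates the idea of adapting the protection strength to the resolution of the region of interest; the pixel thresholds and the block-averaging "protection" are stand-ins chosen for the example and do not reproduce the paper's H.264/AVC-domain scrambling.

```python
import numpy as np

def protect_roi(roi, low_res_px=32 * 32, high_res_px=128 * 128):
    """Pixelate a region of interest with a block size chosen from its resolution."""
    h, w = roi.shape[:2]
    pixels = h * w
    if pixels < low_res_px:
        block = 2      # small, already hard to identify: light protection
    elif pixels < high_res_px:
        block = 8
    else:
        block = 16     # large, detailed region: strong protection
    out = roi.copy()
    for y in range(0, h - h % block, block):
        for x in range(0, w - w % block, block):
            out[y:y + block, x:x + block] = roi[y:y + block, x:x + block].mean()
    return out

roi = np.random.rand(96, 64)  # toy grayscale region of interest
print(protect_roi(roi).shape)
```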
Citations: 8
Self-Supervised Neural Aggregation Networks for Human Parsing
Jian Zhao, Jianshu Li, Xuecheng Nie, F. Zhao, Yunpeng Chen, Zhecan Wang, Jiashi Feng, Shuicheng Yan
In this paper, we present a Self-Supervised Neural Aggregation Network (SS-NAN) for human parsing. SS-NAN adaptively learns to aggregate the multi-scale features at each pixel "address". In order to further improve the feature discriminative capacity, a self-supervised joint loss is adopted as an auxiliary learning strategy, which imposes human joint structures into parsing results without resorting to extra supervision. The proposed SS-NAN is end-to-end trainable. SS-NAN can be integrated into any advanced neural networks to help aggregate features regarding the importance at different positions and scales and incorporate rich high-level knowledge regarding human joint structures from a global perspective, which in turn improve the parsing results. Comprehensive evaluations on the recent Look into Person (LIP) and the PASCAL-Person-Part benchmark datasets demonstrate the significant superiority of our method over other state-of-the-arts.
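The following PyTorch sketch shows one way to aggregate multi-scale feature maps with per-pixel softmax weights, in the spirit of the adaptive aggregation described above; the channel count, the 1x1 weight predictor, and the omission of the self-supervised joint loss are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class PixelwiseScaleAggregation(nn.Module):
    def __init__(self, channels=64, num_scales=3):
        super().__init__()
        # Predict one weight per scale at every pixel "address".
        self.weight_head = nn.Conv2d(channels * num_scales, num_scales, kernel_size=1)

    def forward(self, scale_feats):
        # scale_feats: list of (B, C, H, W) tensors, already resized to a common H x W.
        stacked = torch.cat(scale_feats, dim=1)
        weights = torch.softmax(self.weight_head(stacked), dim=1)  # (B, S, H, W)
        fused = sum(w.unsqueeze(1) * f
                    for w, f in zip(weights.unbind(dim=1), scale_feats))
        return fused

agg = PixelwiseScaleAggregation()
feats = [torch.randn(2, 64, 32, 32) for _ in range(3)]
print(agg(feats).shape)  # torch.Size([2, 64, 32, 32])
```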
Citations: 62
Looking Under the Hood: Deep Neural Network Visualization to Interpret Whole-Slide Image Analysis Outcomes for Colorectal Polyps
Bruno Korbar, Andrea M. Olofson, Allen P. Miraflor, Catherine M. Nicka, M. Suriawinata, L. Torresani, A. Suriawinata, S. Hassanpour
Histopathological characterization of colorectal polyps is an important principle for determining the risk of colorectal cancer and future rates of surveillance for patients. The process of characterization is time-intensive and requires years of specialized medical training. In this work, we propose a deep-learning-based image analysis approach that not only can accurately classify different types of polyps in whole-slide images, but also generates major regions and features on the slide through a model visualization approach. We argue that this visualization approach will make sense of the underlying reasons for the classification outcomes, significantly reduce the cognitive burden on clinicians, and improve the diagnostic accuracy for whole-slide image characterization tasks. Our results show the efficacy of this network visualization approach in recovering decisive regions and features for different types of polyps on whole-slide images according to the domain expert pathologists.
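Since the abstract does not spell out the visualization technique, the sketch below uses a generic class-activation-map style heat map (classifier weights applied to the final convolutional feature maps) as a hedged stand-in for highlighting decisive regions on a slide tile; the network, number of classes, and patch size are invented for illustration.

```python
import torch
import torch.nn as nn

conv = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
classifier = nn.Linear(64, 5)  # assumed: 5 polyp categories

patch = torch.randn(1, 3, 64, 64)            # a tile cropped from a whole-slide image
feats = conv(patch)                          # (1, 64, 64, 64) feature maps
logits = classifier(feats.mean(dim=(2, 3)))  # global average pooling + linear head
cls = logits.argmax(dim=1).item()
# Weight each feature map by the predicted class's classifier weight.
cam = torch.relu((classifier.weight[cls].view(1, -1, 1, 1) * feats).sum(dim=1))
print(cam.shape, "predicted class:", cls)    # heat map over the tile
```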
Citations: 45
Protecting Visual Secrets Using Adversarial Nets
Nisarg Raval, Ashwin Machanavajjhala, Landon P. Cox
Protecting visual secrets is an important problem due to the prevalence of cameras that continuously monitor our surroundings. Any viable solution to this problem should also minimize the impact on the utility of applications that use images. In this work, we build on the existing work of adversarial learning to design a perturbation mechanism that jointly optimizes privacy and utility objectives. We provide a feasibility study of the proposed mechanism and present ideas on developing a privacy framework based on the adversarial perturbation mechanism.
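A minimal sketch of the joint objective the abstract alludes to: a learned perturbation keeps a utility classifier accurate while degrading an adversary that tries to read the secret. The network shapes, labels, weighting lambda, and the alternating-training details are all assumptions for illustration.

```python
import torch
import torch.nn as nn

perturber = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
utility_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # e.g. a scene/activity task
adversary = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))      # tries to recover the secret

images = torch.randn(8, 3, 32, 32)
utility_labels = torch.randint(0, 10, (8,))
secret_labels = torch.randint(0, 2, (8,))

perturbed = images + 0.1 * perturber(images)
ce = nn.CrossEntropyLoss()
lam = 1.0  # assumed privacy weight
# Keep the image useful for the utility task while making the adversary fail.
loss = ce(utility_net(perturbed), utility_labels) - lam * ce(adversary(perturbed), secret_labels)
loss.backward()
```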
Citations: 61
Object State Recognition for Automatic AR-Based Maintenance Guidance
P. Dvorák, Radovan Josth, Elisabetta Delponte
This paper describes a component of an Augmented Reality (AR) based system focused on supporting workers in manufacturing and maintenance industry. Particularly, it describes a component responsible for verification of performed steps. Correct handling is crucial in both manufacturing and maintenance industries and deviations may cause problems in later stages of the production and assembly. The primary aim of such support systems is making the training of new employees faster and more efficient and reducing the error rate. We present a method for automatically recognizing an object's state with the objective of verifying a set of tasks performed by a user. The novelty of our approach is that the system can automatically recognize the state of the object and provide immediate feedback to the operator using an AR visualization enabling fully automatic step-by-step instructions.
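The toy sketch below illustrates step verification driven by an object-state classifier; the classifier is a stub and the state names are invented, standing in for the trained recognition model and the AR feedback loop the paper describes.

```python
def verify_steps(frames, expected_states, classify_state):
    """Walk through the expected sequence of object states; return the
    index of the next step the operator still has to complete."""
    step = 0
    for frame in frames:
        if step < len(expected_states) and classify_state(frame) == expected_states[step]:
            step += 1  # the observed state matches the next expected step
    return step

# Toy usage with a stand-in classifier that just echoes the frame label.
frames = ["cover_removed", "cover_removed", "screws_in", "cover_attached"]
expected = ["cover_removed", "screws_in", "cover_attached"]
print(verify_steps(frames, expected, classify_state=lambda f: f))  # 3 -> all steps verified
```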
Citations: 3
Joint Mobile-Cloud Video Stabilization
G. S. Adesoye, Oliver Wang
In this work we analyze the complex trade-off between data transfer, computation time, and power consumption when a multi-stage data-intensive algorithm (in this case video stabilization) is split between a low power mobile device and high power cloud server. We evaluate design choices in terms of which intermediate representations should be transferred to the server and back to the mobile device, and present a graph-based solution that can update the optimal joint mobile-cloud computation separation as the hardware configuration or user's requirements change. The practices we employ in this work can be extended to other mobile computer vision applications.
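The small sketch below illustrates the kind of trade-off being optimized: pick the stage at which the pipeline hands off from the phone to the cloud by minimizing a weighted sum of compute time, transfer time, and mobile energy. The stage names, costs, bandwidth, and cost model are invented for the example and do not reproduce the paper's graph-based formulation.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    mobile_time: float   # seconds to run this stage on the phone
    cloud_time: float    # seconds to run this stage on the server
    output_mb: float     # size of this stage's intermediate result

def best_split(stages, input_mb, uplink_mbps=5.0, power_per_mobile_sec=1.0, power_weight=0.5):
    """Pick the cut: stages[:cut] run on the phone, the intermediate result is
    uploaded, stages[cut:] run in the cloud. Cost = latency + weighted mobile energy."""
    best_cut, best_cost = 0, float("inf")
    for cut in range(len(stages) + 1):
        mobile = sum(s.mobile_time for s in stages[:cut])
        cloud = sum(s.cloud_time for s in stages[cut:])
        transfer_mb = input_mb if cut == 0 else stages[cut - 1].output_mb
        transfer = transfer_mb * 8.0 / uplink_mbps
        cost = mobile + transfer + cloud + power_weight * power_per_mobile_sec * mobile
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    return best_cut, best_cost

pipeline = [Stage("feature tracking", 2.0, 0.3, 1.5),
            Stage("path estimation", 1.0, 0.2, 0.1),
            Stage("warp and render", 3.0, 0.5, 40.0)]
print(best_split(pipeline, input_mb=60.0))
```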
Citations: 0
Temporal Domain Neural Encoder for Video Representation Learning
Hao Hu, Zhaowen Wang, Joon-Young Lee, Zhe L. Lin, Guo-Jun Qi
We address the challenge of learning good video representations by explicitly modeling the relationship between visual concepts in time space. We propose a novel Temporal Preserving Recurrent Neural Network (TPRNN) that extracts and encodes visual dynamics with frame-level features as input. The proposed network architecture captures temporal dynamics by keeping track of the ordinal relationship of co-occurring visual concepts, and constructs video representations with their temporal order patterns. The resultant video representations effectively encode temporal information of dynamic patterns, which makes them more discriminative to human actions performed with different sequences of action patterns. We evaluate the proposed model on several real video datasets, and the results show that it successfully outperforms the baseline models. In particular, we observe significant improvement on action classes that can only be distinguished by capturing the temporal orders of action patterns.
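To show why an order-sensitive encoder differs from simple pooling, the sketch below runs per-frame concept scores through a plain GRU, which stands in for the TPRNN architecture (not reproduced here); the concept and hidden dimensions are assumptions.

```python
import torch
import torch.nn as nn

class OrderSensitiveVideoEncoder(nn.Module):
    def __init__(self, num_concepts=300, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(num_concepts, hidden, batch_first=True)

    def forward(self, frame_scores):
        # frame_scores: (batch, time, num_concepts) per-frame concept detections.
        _, last_hidden = self.rnn(frame_scores)
        return last_hidden.squeeze(0)  # (batch, hidden) video representation

enc = OrderSensitiveVideoEncoder()
clip = torch.randn(2, 16, 300)
print(enc(clip).shape)  # torch.Size([2, 128])
# Reversing the frame order changes the representation, unlike average pooling.
print(torch.allclose(enc(clip), enc(clip.flip(dims=[1]))))  # typically False
```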
Citations: 4
DyadGAN: Generating Facial Expressions in Dyadic Interactions
Yuchi Huang, Saad M. Khan
Generative Adversarial Networks (GANs) have been shown to produce synthetic face images of compelling realism. In this work, we present a conditional GAN approach to generate contextually valid facial expressions in dyadic human interactions. In contrast to previous work employing conditions related to facial attributes of generated identities, we focused on dyads in an attempt to model the relationship and influence of one person’s facial expressions in the reaction of the other. To this end, we introduced a two-level optimization of GANs in interviewer-interviewee dyadic interactions. In the first stage we generate face sketches of the interviewer conditioned on facial expressions of the interviewee. The second stage synthesizes complete face images conditioned on the face sketches generated in the first stage. We demonstrated that our model is effective at generating visually compelling face images in dyadic interactions. Moreover we quantitatively showed that the facial expressions depicted in the generated interviewer face images reflect valid emotional reactions to the interviewee behavior.
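The sketch below shows a generic conditional generator that produces an image conditioned on the other person's expression label, echoing the conditioning idea in the abstract; the single-stage setup, label embedding, and dimensions are illustrative assumptions rather than the DyadGAN architecture.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=64, num_expressions=7, out_pixels=32 * 32):
        super().__init__()
        self.embed = nn.Embedding(num_expressions, 16)
        self.net = nn.Sequential(
            nn.Linear(z_dim + 16, 256), nn.ReLU(),
            nn.Linear(256, out_pixels), nn.Tanh())

    def forward(self, z, expression_label):
        cond = self.embed(expression_label)           # condition on the partner's expression
        return self.net(torch.cat([z, cond], dim=1)).view(-1, 1, 32, 32)

gen = ConditionalGenerator()
z = torch.randn(4, 64)
labels = torch.randint(0, 7, (4,))
print(gen(z, labels).shape)  # torch.Size([4, 1, 32, 32])
```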
Citations: 54
Journal: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)