
Latest Publications: Proceedings of the 30th ACM International Conference on Multimedia

Uncertainty-Aware Semi-Supervised Learning of 3D Face Rigging from Single Image
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3548285
Yong Zhao, Haifeng Chen, H. Sahli, Ke Lu, D. Jiang
We present a method to rig 3D faces via Action Units (AUs), viewpoint and light direction from a single input image. Existing 3D methods for face synthesis and animation rely heavily on the 3D morphable model (3DMM), which is built on 3D data and cannot provide intuitive expression parameters, while AU-driven 2D methods cannot handle head pose and lighting effects. We bridge the gap by integrating a recent 3D reconstruction method with a 2D AU-driven method in a semi-supervised fashion. Built upon an auto-encoding 3D face reconstruction model that decouples depth, albedo, viewpoint and light without any supervision, we further decouple expression from identity for depth and albedo with a novel conditional feature translation module and pretrained critics for AU intensity estimation and image classification. Novel objective functions are designed using unlabeled in-the-wild images and indoor images with AU labels. We also leverage uncertainty losses to model the potentially changing AU regions of images as input noise for synthesis, and to model the noisy AU intensity labels for intensity estimation by the AU critic. Experiments on face editing and animation on four datasets show that, compared with six state-of-the-art methods, our proposed method is superior and effective in terms of expression consistency, identity similarity and pose similarity.
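As a concrete illustration of the uncertainty-loss idea mentioned in the abstract, the sketch below shows a generic heteroscedastic-style loss in which a predicted log-variance down-weights residuals on uncertain AU-intensity labels. This is only a minimal PyTorch sketch of the general technique, not the authors' formulation; all function and tensor names and shapes are assumptions.

```python
import torch

def uncertainty_weighted_l1(pred, target, log_sigma):
    # Residuals are down-weighted where predicted uncertainty (sigma) is large;
    # the log-sigma term penalizes inflating uncertainty everywhere.
    return (torch.abs(pred - target) * torch.exp(-log_sigma) + log_sigma).mean()

# Toy usage: 8 faces, 12 action units (random stand-ins for real predictions).
pred = torch.rand(8, 12)
target = torch.rand(8, 12)
log_sigma = torch.zeros(8, 12, requires_grad=True)
loss = uncertainty_weighted_l1(pred, target, log_sigma)
loss.backward()
```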
Citations: 1
Compute to Tell the Tale: Goal-Driven Narrative Generation
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3549202
Yongkang Wong, Shaojing Fan, Yangyang Guo, Ziwei Xu, Karen Stephen, Rishabh Sheoran, Anusha Bhamidipati, Vivek Barsopia, Jianquan Liu, Mohan S. Kankanhalli
Man is by nature a social animal. One important facet of human evolution is narrative imagination: conceiving a story, be it fictional or factual, and telling the tale to other individuals. Factual narratives, such as news, journalism and field reports, are based on real-world events and often require extensive human effort to create. In the era of big data, where video capture devices are available everywhere, a massive amount of raw video (including life-logging, dashcam or surveillance footage) is generated daily. As a result, it is practically impossible for humans to digest and analyze these video data. This paper reviews the problem of computational narrative generation, where a goal-driven narrative (in the form of text, with or without video) is generated from a single long video or multiple long videos. Importantly, the narrative generation problem distinguishes itself from the existing literature by its focus on a comprehensive understanding of user goals, narrative structure and open-domain input. We tentatively outline a general narrative generation framework and discuss the potential research problems and challenges in this direction. Informed by the real-world impact of narrative generation, we then illustrate several practical use cases in a Video Logging as a Service platform, which enables users to get more out of their data through a goal-driven intelligent storytelling AI agent.
Citations: 4
Self-Supervised Human Pose based Multi-Camera Video Synchronization
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3547766
Liqiang Yin, Ruize Han, Wei Feng, Song Wang
Multi-view video collaborative analysis is an important task with many applications in the multimedia community. However, it typically requires the given multiple videos to be temporally synchronized. Existing methods commonly synchronize the videos through wired communication, which may hinder practical application in the real world, especially for moving cameras. In this paper, we focus on human-centric video analysis and propose a self-supervised framework for automatic multi-camera video synchronization. Specifically, we develop SeSyn-Net, which takes 2D human pose as input for feature embedding, and design a series of self-supervised losses to effectively extract a view-invariant but time-discriminative representation for video synchronization. We also build two new datasets for performance evaluation. Extensive experimental results verify the effectiveness of our method, which achieves superior performance compared to both classical and state-of-the-art methods.
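SeSyn-Net itself learns the view-invariant, time-discriminative per-frame embeddings; the sketch below only illustrates the downstream alignment step, under the assumption that two such embedding sequences are already available. The brute-force shift scan is a generic baseline for exposition, not the paper's procedure, and the variable names are made up.

```python
import numpy as np

def estimate_frame_offset(emb_a, emb_b, max_shift=100, min_overlap=10):
    """Return the shift s maximizing similarity between emb_a[t + s] and emb_b[t];
    a negative s means view B lags view A. Inputs are (T, D) embedding arrays."""
    a = emb_a / (np.linalg.norm(emb_a, axis=1, keepdims=True) + 1e-8)
    b = emb_b / (np.linalg.norm(emb_b, axis=1, keepdims=True) + 1e-8)
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        ov_a, ov_b = (a[s:], b) if s >= 0 else (a, b[-s:])
        n = min(len(ov_a), len(ov_b))
        if n < min_overlap:
            continue
        score = float(np.mean(np.sum(ov_a[:n] * ov_b[:n], axis=1)))  # mean cosine similarity
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift

# Toy check: view B repeats view A delayed by 7 frames, plus noise.
rng = np.random.default_rng(0)
a = rng.normal(size=(300, 32))
b = np.roll(a, 7, axis=0) + 0.05 * rng.normal(size=a.shape)
print(estimate_frame_offset(a, b))  # prints a shift close to -7 (B lags A)
```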
Citations: 1
An Efficient Multi-View Multimodal Data Processing Framework for Social Media Popularity Prediction
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3551607
Yunpeng Tan, Fang Liu, Bowei Li, Zheng Zhang, Bo Zhang
The popularity of social media content is an important indicator of its communication power, and predicting social media popularity has tremendous business and social value. In this paper, we propose an efficient multimodal data processing framework that can comprehensively extract multi-view features from multimodal social media data and achieve accurate popularity prediction. We utilize a Transformer and sliding-window averaging to extract time-series features of posts, utilize CatBoost to calculate the importance of different features, and integrate the important features extracted from multiple views for accurate prediction of social media popularity. We evaluate the proposed approach on the Social Media Prediction Dataset. Experimental results show that our approach achieves excellent performance on the social media popularity prediction task.
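CatBoost's feature-importance API and a rolling average are both standard, so a toy version of that part of the pipeline is easy to sketch; the column names and values below are invented placeholders, and the real system integrates far richer multi-view (text, image, user, temporal) features.

```python
import pandas as pd
from catboost import CatBoostRegressor

# Toy frame: one row per post, ordered by posting time within each user.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "followers": [120, 125, 130, 40, 45],
    "past_popularity": [3.1, 2.8, 3.4, 1.2, 1.5],
    "popularity": [3.0, 3.2, 3.5, 1.1, 1.6],
})

# Sliding-window average of each user's recent popularity as a temporal feature.
df["pop_ma3"] = (
    df.groupby("user_id")["past_popularity"]
      .transform(lambda s: s.rolling(window=3, min_periods=1).mean())
)

X = df[["followers", "past_popularity", "pop_ma3"]]
y = df["popularity"]

model = CatBoostRegressor(iterations=200, depth=4, verbose=0)
model.fit(X, y)

# Rank features by importance, mirroring the feature-selection role CatBoost plays here.
for name, imp in sorted(zip(X.columns, model.get_feature_importance()), key=lambda t: -t[1]):
    print(f"{name}: {imp:.1f}")
```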
Citations: 8
Rethinking Optical Flow Methods for Micro-Expression Spotting
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3551602
Yuan Zhao, Xin Tong, Zichong Zhu, Jianda Sheng, Lei Dai, Lingling Xu, Xuehai Xia, Y. Jiang, Jiao Li
Micro-expression (ME) spotting is widely used in fields such as criminal investigation and business communication. However, it is still a challenging task to spot the onset and offset of MEs accurately in long videos. This paper refines every step of the workflow before feature extraction, which reduces error propagation. The workflow takes advantage of a high-quality alignment method, a more accurate landmark detector, and more robust optical flow estimation. In addition, a Bayesian optimization scheme hybridized with Nash equilibrium is constructed to search for the optimal parameters. It uses two players to optimize two types of parameters: one player controls ME peak spotting, and the other controls optical flow field extraction. The algorithm reduces the search space for each player and generalizes better. Finally, our spotting method is evaluated on the MEGC2022 spotting task, achieving an F1-score of 0.3564 on CAS(ME)3-UNSEEN and 0.3265 on SAMM-UNSEEN.
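The full pipeline adds careful face alignment, landmark detection, robust optical flow and the two-player Bayesian/Nash parameter search described above; the sketch below shows only the bare backbone that such spotting systems share, i.e. a dense-optical-flow motion curve plus generic peak picking, with all parameter values chosen arbitrarily rather than taken from the paper.

```python
import cv2
import numpy as np
from scipy.signal import find_peaks

def flow_magnitude_curve(gray_frames):
    """Mean dense optical-flow magnitude between consecutive grayscale frames."""
    mags = []
    for prev, curr in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mags.append(float(np.linalg.norm(flow, axis=2).mean()))
    return np.asarray(mags)

def spot_candidate_apexes(mag_curve, height=None, distance=5):
    """Candidate micro-expression apex frames as local maxima of the motion curve."""
    peaks, _ = find_peaks(mag_curve, height=height, distance=distance)
    return peaks
```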
Citations: 7
CRNet: Unsupervised Color Retention Network for Blind Motion Deblurring
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3547962
Suiyi Zhao, Zhao Zhang, Richang Hong, Mingliang Xu, Haijun Zhang, Meng Wang, Shuicheng Yan
Blind image deblurring is still a challenging problem due to its inherent ill-posedness. To improve deblurring performance, many supervised methods have been proposed. However, obtaining labeled samples from a specific distribution (or domain) is usually expensive, and a data-driven, training-based model also cannot generalize to blurry images in all domains. These challenges have given birth to several unsupervised deblurring methods. However, there is significant chromatic aberration between the recovered latent images and the original images, which directly degrades performance. In this paper, we therefore propose a novel unsupervised color retention network, termed CRNet, to perform blind motion deblurring. In addition, the new concepts of blur offset estimation and adaptive blur correction are proposed to retain color information during deblurring. As a result, unlike previous studies, CRNet does not learn a mapping directly from the blurry image to the restored latent image, but from the blurry image to a motion offset. An adaptive blur correction operation is then performed on the blurry image to restore the latent image, thereby retaining the color information of the original image to the greatest extent. To further retain color information and extract blur information effectively, we also propose a new module called pyramid global blur feature perception (PGBFP). To quantitatively prove the effectiveness of our network in color retention, we propose a novel chromatic aberration quantization metric in line with human perception. Extensive quantitative and visualization experiments show that CRNet obtains state-of-the-art performance on unsupervised deblurring tasks.
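Since the paper's own chromatic-aberration metric is not reproduced here, the sketch below shows a generic perception-aligned proxy one might start from: the mean CIEDE2000 color difference between a deblurred image and a reference, computed in CIELAB with scikit-image. It illustrates the kind of measurement being argued for, not the authors' metric.

```python
from skimage.color import rgb2lab, deltaE_ciede2000

def mean_color_shift(restored_rgb, reference_rgb):
    """Average perceptual color difference (CIEDE2000) between two float RGB
    images in [0, 1] of shape (H, W, 3); lower means better color retention."""
    return float(deltaE_ciede2000(rgb2lab(restored_rgb), rgb2lab(reference_rgb)).mean())
```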
Citations: 12
Machine Unlearning for Image Retrieval: A Generative Scrubbing Approach
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3548378
P. Zhang, Guangdong Bai, Zi Huang, Xin-Shun Xu
Data owners have the right to request that their data be deleted from a machine learning (ML) model. In response, a naïve approach is to retrain the model on the original dataset excluding the data to forget, which is, however, unrealistic, as the required dataset may no longer be available and the retraining process is usually computationally expensive. To cope with this reality, machine unlearning has recently attracted much attention; it aims to enable data removal from a trained ML model in response to deletion requests, without retraining the model from scratch or requiring full access to the original training dataset. Existing unlearning methods mainly focus on handling conventional ML methods, while unlearning models based on deep neural networks (DNNs) remains underexplored, especially for those trained on large-scale datasets. In this paper, we make the first attempt to realize data forgetting in deep models for image retrieval. Image retrieval aims to search for data relevant to a query according to similarity measures. Intuitively, unlearning a deep image retrieval model can be achieved by breaking down its ability to model similarity on the data to forget. To this end, we propose a generative scrubbing (GS) method that learns a generator to craft noisy data that manipulate the model weights. A novel framework is designed consisting of the generator and the target retrieval model, in which a pair of coupled static and dynamic learning procedures are performed simultaneously. This novel learning strategy effectively enables the generated noisy data to fade away the model's memory of the data to forget whilst retaining the information of the remaining data. Extensive experiments on three widely used datasets verify the effectiveness of the proposed method.
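The actual GS framework learns the generator jointly with coupled static and dynamic procedures; as a very rough, assumed illustration of what one scrubbing-style update could look like, the PyTorch sketch below perturbs the model's embeddings on generator-crafted stand-ins for the forget set while anchoring retained data to its pre-unlearning embeddings. Every name here (generator, retain_batch, the 64-d latent, the loss shape) is hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def scrubbing_step(model, generator, forget_labels, retain_batch, opt):
    """One heavily simplified, illustrative scrubbing-style update (not the paper's GS)."""
    noise = torch.randn(forget_labels.size(0), 64)       # assumed 64-d latent
    fake = generator(noise, forget_labels)               # synthetic stand-ins for the forget set
    # Push forget-set embeddings toward uninformative (zero-mean) codes.
    loss_forget = model(fake).pow(2).mean()
    # Keep retained-data embeddings close to their pre-unlearning values.
    loss_retain = F.mse_loss(model(retain_batch["images"]), retain_batch["old_embeddings"])
    loss = loss_forget + loss_retain
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)
```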
Citations: 5
Talk2Face: A Unified Sequence-based Framework for Diverse Face Generation and Analysis Tasks
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3548205
Yudong Li, Xianxu Hou, Zhe Zhao, Linlin Shen, Xuefeng Yang, Kimmo Yan
Facial analysis is an important domain in computer vision and has received extensive research attention. For the numerous downstream tasks with different input/output formats and modalities, existing methods usually design task-specific architectures and train them using face datasets collected in the particular task domain. In this work, we propose a single model, Talk2Face, to simultaneously tackle a large number of face generation and analysis tasks, e.g., text-guided face synthesis, face captioning and age estimation. Specifically, we cast different tasks into a sequence-to-sequence format with the same architecture, parameters and objectives. While text and facial images are tokenized into sequences, the annotation labels of faces for different tasks are also converted into natural language for unified representation. We collect a set of 2.3M face-text pairs from available datasets across different tasks to train the proposed model. Uniform templates are then designed to enable the model to perform different downstream tasks according to the task context and target. Experiments on different tasks show that our model achieves better face generation and captioning performance than SOTA approaches. On age estimation and multi-attribute classification, our model reaches performance competitive with models specially designed and trained for these particular tasks. In practice, our model is much easier to deploy for different facial-analysis-related tasks. Code and dataset will be available at https://github.com/ydli-ai/Talk2Face.
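The label-to-text step is simple enough to sketch: each task's annotations are rendered as a natural-language target so that every task shares one text-to-text interface. The templates below are invented for illustration and are not the paper's actual prompts.

```python
def labels_to_text(task, labels):
    """Render task labels as natural-language targets (hypothetical templates)."""
    if task == "age_estimation":
        return f"the person is {labels['age']} years old"
    if task == "attribute_classification":
        present = sorted(k for k, v in labels["attributes"].items() if v)
        return "attributes: " + ", ".join(present)
    if task == "captioning":
        return labels["caption"]
    raise ValueError(f"unknown task: {task}")

print(labels_to_text("age_estimation", {"age": 27}))
print(labels_to_text("attribute_classification",
                     {"attributes": {"smiling": True, "eyeglasses": False}}))
```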
Citations: 3
PRO-Face: A Generic Framework for Privacy-preserving Recognizable Obfuscation of Face Images
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3548202
Lin Yuan, Linguo Liu, Xiao Pu, Zhao Li, Hongbo Li, Xinbo Gao
A number of applications (e.g., video surveillance and authentication) rely on automated face recognition to guarantee the functioning of secure services and, meanwhile, have to take into account the privacy of individuals exposed to camera systems. This is the so-called Privacy-Utility trade-off. However, most existing approaches to facial privacy protection focus on removing identifiable visual information from images, leaving the protected face unrecognizable to machines, which sacrifices utility for privacy. To tackle the privacy-utility challenge, we propose a novel, generic, effective, yet lightweight framework for Privacy-preserving Recognizable Obfuscation of Face images (named PRO-Face). The framework allows one to first process a face image using any preferred obfuscation, such as blurring, pixelation or face morphing. It then leverages a Siamese network to fuse the original image with its obfuscated form, generating a final protected image that is visually similar to the obfuscated one to human perception (for privacy) but still recognized as the original identity by machines (for utility). The framework supports various obfuscations for facial anonymization. Face recognition can be performed accurately not only across anonymized images but also between plain and anonymized ones, based only on pre-trained recognizers. These properties constitute the "generic" merit of the proposed framework. In-depth objective and subjective evaluations demonstrate the effectiveness of the proposed framework in both privacy protection and utility preservation under distinct scenarios. Our source code, models and supplementary materials are made publicly available.
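The abstract implies a two-part objective: the protected output should look like the obfuscated image to humans while its face embedding stays close to the original identity for machines. A minimal PyTorch sketch of that loss shape is given below; the recognizer module, the L1/cosine choices and the weighting are assumptions rather than the paper's exact design.

```python
import torch.nn.functional as F

def privacy_utility_losses(protected, obfuscated, original, recognizer, w_id=1.0):
    # Privacy: the protected image should resemble the obfuscated one pixel-wise.
    visual_loss = F.l1_loss(protected, obfuscated)
    # Utility: its embedding should still match the original identity for a recognizer.
    id_loss = 1.0 - F.cosine_similarity(recognizer(protected), recognizer(original)).mean()
    return visual_loss + w_id * id_loss
```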
Citations: 6
A Probabilistic Model for Controlling Diversity and Accuracy of Ambiguous Medical Image Segmentation
Pub Date: 2022-10-10 DOI: 10.1145/3503161.3548115
Wei Zhang, Xiaohong Zhang, Sheng Huang, Yuting Lu, Kun Wang
Medical image segmentation tasks often have more than one plausible annotation for a given input image due to inherent ambiguity. Generating multiple plausible predictions for a single image is therefore of interest for critical medical applications. Many methods estimate the distribution of the annotation space by developing probabilistic models that generate multiple hypotheses. However, these methods tend to improve the diversity of predictions at the expense of the more important accuracy. In this paper, we propose a novel probabilistic segmentation model, called Joint Probabilistic U-net, which achieves flexible control over the two abstract notions of diversity and accuracy. Specifically, we (i) model the joint distribution of images and annotations to learn a latent space, which is used to decouple diversity and accuracy, and (ii) transform the Gaussian distribution in the latent space into a more complex distribution to improve the model's expressiveness. In addition, we explore two strategies for preventing latent space collapse, which are effective in improving the model's performance on datasets with limited annotation. We demonstrate the effectiveness of the proposed model on two medical image datasets, LIDC-IDRI and ISBI 2016, achieving state-of-the-art results on several metrics.
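At inference time, a latent-variable segmentation model of this family produces diverse hypotheses simply by sampling the latent code several times and decoding each sample. The sketch below shows that generic sampling loop with a Gaussian prior; the prior_net and decoder modules are placeholders, and the Joint Probabilistic U-net additionally transforms the Gaussian into a more complex distribution, which is omitted here.

```python
import torch

@torch.no_grad()
def sample_hypotheses(prior_net, decoder, image, n_samples=8):
    """Draw several plausible segmentations for one image by re-sampling the latent code."""
    mu, log_var = prior_net(image)                       # image-conditioned Gaussian prior
    masks = []
    for _ in range(n_samples):
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
        masks.append(torch.sigmoid(decoder(image, z)))   # one segmentation hypothesis
    return torch.stack(masks)                            # (n_samples, B, 1, H, W)
```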
Citations: 5