CyclicShift: A Data Augmentation Method For Enriching Data Patterns
Hui Lu, Xuan Cheng, Wentao Xia, Pan Deng, Minghui Liu, Tianshu Xie, Xiaomin Wang, Meilin Liu
In this paper, we propose a simple yet effective data augmentation strategy, dubbed CyclicShift, to enrich data patterns. The idea is to shift the image in a certain direction and then circularly refill the resulting out-of-frame part to the other side. Compared with previous related methods, Translation and Shuffle, our method avoids losing pixels of the original image and preserves its semantic information as much as possible. Visually and empirically, we show that our method indeed brings new data patterns and thereby improves the generalization ability as well as the performance of models. Extensive experiments demonstrate our method's effectiveness in image classification and fine-grained recognition over multiple datasets and various network architectures. Furthermore, our method can be superimposed on other data augmentation methods in a very simple way. CyclicMix, the simultaneous use of CyclicShift and CutMix, hits a new high in most cases. Our code is open-source and available at https://github.com/dejavunHui/CyclicShift.
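The core operation can be illustrated in a few lines of NumPy. The following is a minimal sketch of the circular-shift idea as stated in the abstract; the function name and shift amounts are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cyclic_shift(image: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Shift an HxWxC image by (dy, dx) pixels, wrapping the out-of-frame
    part around to the opposite side so no pixels are lost."""
    shifted = np.roll(image, shift=dy, axis=0)    # vertical shift with wrap-around
    shifted = np.roll(shifted, shift=dx, axis=1)  # horizontal shift with wrap-around
    return shifted

# Example: shift a random 224x224 RGB image by a quarter of its size.
img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
aug = cyclic_shift(img, dx=56, dy=56)
assert aug.shape == img.shape  # pixels are relocated, never discarded
```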
{"title":"CyclicShift: A Data Augmentation Method For Enriching Data Patterns","authors":"Hui Lu, Xuan Cheng, Wentao Xia, Pan Deng, Minghui Liu, Tianshu Xie, Xiaomin Wang, Meilin Liu","doi":"10.1145/3503161.3548188","DOIUrl":"https://doi.org/10.1145/3503161.3548188","url":null,"abstract":"In this paper, we propose a simple yet effective data augmentation strategy, dubbed CyclicShift, to enrich data patterns. The idea is to shift the image in a certain direction and then circularly refill the resultant out-of-frame part to the other side. Compared with previous related methods, Translation, and Shuffle, our proposed method is able to avoid losing pixels of the original image and preserve its semantic information as much as possible. Visually and emprically, we show that our method indeed brings new data patterns and thereby improves the generalization ability as well as the performance of models. Extensive experiments demonstrate our method's effectiveness in image classification and fine-grained recognition over multiple datasets and various network architectures. Furthermore, our method can also be superimposed on other data augmentation methods in a very simple way. CyclicMix, the simultaneous use of CyclicShift and CutMix, hits a new high in most cases. Our code is open-source and available at https://github.com/dejavunHui/CyclicShift.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114632640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tackling Instance-Dependent Label Noise with Dynamic Distribution Calibration
Manyi Zhang, Yuxin Ren, Zihao Wang, C. Yuan
Instance-dependent label noise is realistic but rather challenging, because the label-corruption process depends directly on instances. It causes a severe distribution shift between the distributions of training and test data, which impairs the generalization of trained models. Prior works have put great effort into tackling this issue, but they typically rely on strong assumptions or remain heuristic without theoretical guarantees. In this paper, to address the distribution shift in learning with instance-dependent label noise, we adopt a dynamic distribution-calibration strategy. Specifically, we hypothesize that, before the training data are corrupted by label noise, each class conforms to a multivariate Gaussian distribution at the feature level. Label noise produces outliers that shift the Gaussian distribution. During training, to calibrate the shifted distribution, we propose two methods based on the mean and the covariance of the multivariate Gaussian distribution, respectively. The mean-based method works in a recursive dimension-reduction manner for robust mean estimation and is theoretically guaranteed to train a high-quality model against label noise. The covariance-based method works in a distribution-disturbance manner and is experimentally verified to improve model robustness. We demonstrate the utility and effectiveness of our methods on datasets with synthetic label noise and real-world unknown noise.
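The class-wise Gaussian premise can be made concrete. Below is a minimal, hedged illustration that fits a per-class mean and covariance to deep features and resamples features from the fitted Gaussian; the function names and the resampling step are assumptions for illustration, not the authors' calibration procedure.

```python
import numpy as np

def fit_class_gaussian(features: np.ndarray):
    """features: (N, D) deep features of one class (possibly noise-corrupted)."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])  # regularized
    return mu, cov

def sample_calibrated_features(mu, cov, n_samples: int, seed: int = 0):
    """Draw feature samples from the fitted class-conditional Gaussian."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mu, cov, size=n_samples)

# Example with synthetic 128-dimensional features of one class.
feats = np.random.randn(500, 128)
mu, cov = fit_class_gaussian(feats)
calibrated = sample_calibrated_features(mu, cov, n_samples=64)
```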
{"title":"Tackling Instance-Dependent Label Noise with Dynamic Distribution Calibration","authors":"Manyi Zhang, Yuxin Ren, Zihao Wang, C. Yuan","doi":"10.1145/3503161.3547984","DOIUrl":"https://doi.org/10.1145/3503161.3547984","url":null,"abstract":"Instance-dependent label noise is realistic but rather challenging, where the label-corruption process depends on instances directly. It causes a severe distribution shift between the distributions of training and test data, which impairs the generalization of trained models. Prior works put great effort into tackling the issue. Unfortunately, these works always highly rely on strong assumptions or remain heuristic without theoretical guarantees. In this paper, to address the distribution shift in learning with instance-dependent label noise, a dynamic distribution-calibration strategy is adopted. Specifically, we hypothesize that, before training data are corrupted by label noise, each class conforms to a multivariate Gaussian distribution at the feature level. Label noise produces outliers to shift the Gaussian distribution. During training, to calibrate the shifted distribution, we propose two methods based on the mean and covariance of multivariate Gaussian distribution respectively. The mean-based method works in a recursive dimension-reduction manner for robust mean estimation, which is theoretically guaranteed to train a high-quality model against label noise. The covariance-based method works in a distribution disturbance manner, which is experimentally verified to improve the model robustness. We demonstrate the utility and effectiveness of our methods on datasets with synthetic label noise and real-world unknown noise.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117001230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-shot Generalization of Multimodal Dialogue Agents
Diogo Tavares
Multimodal conversational agents are an ever-expanding field that benefits from the introduction of large language models. Production-ready, robust conversational assistants trade breadth of scope for higher accuracy and general dialogue quality. These conversational assistants must be able to keep the conversation focused, respond appropriately to user requests, maintain a certain level of natural response generation, be robust to out-of-scope and chitchat attempts, and, of course, be accurate in assisting the user in reaching their domain-specific goals. This work discusses data-centric observations, presents research hypotheses for future work, and summarizes some of my existing work, to be expanded throughout my PhD.
{"title":"Zero-shot Generalization of Multimodal Dialogue Agents","authors":"Diogo Tavares","doi":"10.1145/3503161.3548759","DOIUrl":"https://doi.org/10.1145/3503161.3548759","url":null,"abstract":"Multimodal conversational agents are an ever expanding field which benefits from the introduction of large language models. Production-ready robust conversational assistants trade breadth of scope for higher accuracy and general dialogue quality. These conversational assistants must be able to maintain the conversation focused, respond appropriately to user requests, maintain a certain level of natural response generation, be robust to out-of-scope and chitchat attempts, and, of course, be accurate in assisting the user in reaching their domain-specific goals. This work discusses data-centric observations, alongside providing research hypothesis for future, and some of my already developed work, to be expanded throughout my PhD.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116006558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DrawMon: A Distributed System for Detection of Atypical Sketch Content in Concurrent Pictionary Games
Nikhil Bansal, Kartiki Gupta, Kiruthika Kannan, Sivani Pentapati, R. Sarvadevabhatla
Pictionary, the popular sketch-based guessing game, provides an opportunity to analyze shared-goal cooperative gameplay in restricted communication settings. However, some players occasionally draw atypical sketch content. While such content is occasionally relevant in the game context, it sometimes represents a rule violation and impairs the game experience. To address such situations in a timely and scalable manner, we introduce DrawMon, a novel distributed framework for automatic detection of atypical sketch content in concurrently occurring Pictionary game sessions. We build specialized online interfaces to collect game session data and annotate atypical sketch content, resulting in AtyPict, the first-ever atypical sketch content dataset. We use AtyPict to train CanvasNet, a deep neural atypical content detection network, and employ CanvasNet as a core component of DrawMon. Our analysis of post-deployment game session data indicates DrawMon's effectiveness for scalable monitoring and atypical sketch content detection. Beyond Pictionary, our contributions also serve as a design guide for customized atypical content response systems involving shared and interactive whiteboards. Code and datasets are available at https://drawm0n.github.io.
{"title":"DrawMon: A Distributed System for Detection of Atypical Sketch Content in Concurrent Pictionary Games","authors":"Nikhil Bansal, Kartiki Gupta, Kiruthika Kannan, Sivani Pentapati, R. Sarvadevabhatla","doi":"10.1145/3503161.3547747","DOIUrl":"https://doi.org/10.1145/3503161.3547747","url":null,"abstract":"Pictionary, the popular sketch-based guessing game, provides an opportunity to analyze shared goal cooperative game play in restricted communication settings. However, some players occasionally draw atypical sketch content. While such content is occasionally relevant in the game context, it sometimes represents a rule violation and impairs the game experience. To address such situations in a timely and scalable manner, we introduce DrawMon, a novel distributed framework for automatic detection of atypical sketch content in concurrently occurring Pictionary game sessions. We build specialized online interfaces to collect game session data and annotate atypical sketch content, resulting in AtyPict, the first ever atypical sketch content dataset. We use AtyPict to train CanvasNet, a deep neural atypical content detection network. We utilize CanvasNet as a core component of DrawMon. Our analysis of post deployment game session data indicates DrawMon's effectiveness for scalable monitoring and atypical sketch content detection. Beyond Pictionary, our contributions also serve as a design guide for customized atypical content response systems involving shared and interactive whiteboards. Code and datasets are available at https://drawm0n.github.io.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116143681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Webly Supervised Image Hashing with Lightweight Semantic Transfer Network
Hui Cui, Lei Zhu, Jingjing Li, Zheng Zhang, Weili Guan
Recent studies have verified the success of deep hashing for efficient image retrieval. However, most existing methods require abundant human-labeled data to optimize the large number of involved network parameters, which consequently restricts the scalability of deep image hashing. Alternatively, learning from freely available web images that inherently include rich semantics is a promising strategy. Nevertheless, the domain distribution gap prevents transferring the semantics involved in the source web images to the target images. Besides, most existing deep image hashing methods suffer from excessive training time to achieve satisfactory performance without explicit supervision, so how to efficiently train the deep image hashing network is another important problem that needs to be seriously considered. In this paper, we propose Webly Supervised Image Hashing (WSIH) with a well-designed lightweight network. Our model enhances the semantics of unsupervised image hashing with weak supervision from freely available web images, and simultaneously avoids involving over-abundant parameters in the deep network architecture. In particular, we train a concept prototype learning network on the web images, obtaining well-trained network parameters and prototype codes that hold the discriminative semantics of the potential visual concepts in target images. Further, we design a lightweight siamese network architecture and a dual-level transfer mechanism to efficiently translate the semantics learned from source web images to the target images. Experiments on two widely tested image datasets show the superiority of the proposed method in both retrieval accuracy and training efficiency compared to state-of-the-art image hashing methods. The source code of our method is available at: https://github.com/christinecui/WSIH.
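To make the hashing side concrete, here is a minimal, hedged PyTorch sketch of a lightweight hashing head of the kind the abstract alludes to: backbone features are mapped to relaxed codes, pulled toward concept prototype codes, and binarized for retrieval. The layer sizes, the tanh relaxation, and the prototype-alignment loss are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """Maps backbone features to relaxed hash codes in (-1, 1)."""
    def __init__(self, feat_dim: int = 512, code_len: int = 64):
        super().__init__()
        self.fc = nn.Linear(feat_dim, code_len)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.fc(feats))

def prototype_alignment_loss(codes, prototype_codes, concept_ids):
    # Pull each image code toward the prototype code of its (pseudo-)concept.
    logits = codes @ prototype_codes.t() / codes.shape[1]  # (B, K) similarities
    return F.cross_entropy(logits, concept_ids)

head = HashHead()
codes = head(torch.randn(8, 512))                 # relaxed codes for 8 feature vectors
prototypes = torch.sign(torch.randn(10, 64))      # 10 hypothetical concept prototype codes
loss = prototype_alignment_loss(codes, prototypes, torch.randint(0, 10, (8,)))
binary_codes = torch.sign(codes)                  # final hash codes at retrieval time
```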
{"title":"Webly Supervised Image Hashing with Lightweight Semantic Transfer Network","authors":"Hui Cui, Lei Zhu, Jingjing Li, Zheng Zhang, Weili Guan","doi":"10.1145/3503161.3548342","DOIUrl":"https://doi.org/10.1145/3503161.3548342","url":null,"abstract":"Recent studies have verified the success of deep hashing for efficient image retrieval. However, most existing methods require abundant human labeling data to optimize the large number of involved network parameters, which consequently restricts the scalability of deep image hashing. Alternatively, learning from freely available web images that inherently include rich semantics is a promising strategy. Nevertheless, the domain distribution gap will prevent transferring the semantics involved in the source web images to the target images. Besides, most existing deep image hashing methods suffer from excessive training time to achieve satisfactory performance without explicit supervision. How to efficiently train the deep image hashing network is another important problem that needs to be seriously considered. In this paper, we propose a Webly Supervised Image Hashing (WSIH) with a well-designed lightweight network. Our model enhances the semantics of unsupervised image hashing with the weak supervision from freely available web images, and simultaneously avoids involving over-abundant parameters in the deep network architecture. Particularly, we train a concept prototype learning network on the web images, learning well-trained network parameters and the prototype codes that hold the discriminative semantics of the potential visual concepts in target images. Further, we meticulously design a lightweight siamese network architecture and a dual-level transfer mechanism to efficiently translate the semantics learned from source web images to the target images. Experiments on two widely-tested image datasets show the superiority of the proposed method in both retrieval accuracy and training efficiency compared to state-of-the-art image hashing methods.The source codes of our method are available at: https://github.com/christinecui/WSIH.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123551233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MONOPOLY
Puneet Mathur, A. Neerkaje, Malika Chhibber, Ramit Sawhney, Fuming Guo, Franck Dernoncourt, Sanghamitra Dutta, Dinesh Manocha
Big business in manufacturing (in the past) and, more recently, giants in the Internet and distribution industry have attracted the attention of both policymakers and economists because of, among other factors, their power to distort competition and to impose conditions on consumers. Market power is therefore a key concept in economics and a central, perhaps unwelcome, presence in the economy. On this basis it would be logical to expect full attention to have been paid to the history of this concept and of its causes. In fact, a gap in the literature seems to exist, and this book aims to fill it. The author's choice is to do so by looking at the work of four major Italian economists: Vilfredo Pareto, Maffeo Pantaleoni, Antonio De Viti de Marco, and Enrico Barone. In so doing, the book takes on multiple tasks: not only to look for the roots of ideas on market power and competition, but also to define these relevant figures and to explore and highlight their more general contribution to the development of economic ideas and to policymaking. The book is organized in five main chapters, completed by an introduction and a final section with some general conclusions. The introduction properly paves the way for the following analysis, putting the reader in the best position to navigate the volume and appreciate the logic behind the connections among the various parts of the book. In the first chapter the author sets the scene for an in-depth analysis of the work of the four Italian economists who are the protagonists of the book by reviewing, with a historical approach, the literature(s) engaging with the concept of monopoly power. Given the complex nature of this idea, the chapter moves along four different paths: the history of formal models of profit maximization under imperfect competition; the history of competition policy; the theory of competition; and the definition of the concept of entry barriers. The following chapter (chapter 2) deals more directly with the contribution by the Italian marginalists, focusing on what looks like one side of the debate on market power: the issue of competition and the conditions that might make it less than perfect. Here, a fundamental distinction is made between "static" lack of competition (resulting from structural barriers to the free movement of actors in the market) and "dynamic" (temporary) situations of limited competition caused by innovations and other factors, a theme central to the so-called classical school of economic thought, too.
{"title":"MONOPOLY","authors":"Puneet Mathur, A. Neerkaje, Malika Chhibber, Ramit Sawhney, Fuming Guo, Franck Dernoncourt, Sanghamitra Dutta, Dinesh Manocha","doi":"10.1145/3503161.3548380","DOIUrl":"https://doi.org/10.1145/3503161.3548380","url":null,"abstract":"Big business in manufacturing (in the past) and, more recently, giants in the Internet and distribution industry have attracted the attention of both policymakers and economists because of, among other factors, their power to distort competition and to impose conditions on consumers. Market power is therefore a key concept in economics and a central, maybe unwelcome, presence in the economy. On these bases it would be logical to expect full attention to have been paid to the history of this concept and of its causes. But in fact, a gap in the literature seems to exist, and this book aims to fill it. The author’s choice is to do so by looking at the work of four major Italian economists: Vilfredo Pareto, Maffeo Pantaleoni, Antonio De Viti de Marco, and Enrico Barone. In so doing, the book takes on multiple tasks: not only to look for the roots of ideas on market power and competition but also to define these relevant figures and to explore and highlight their more general contribution to the development of economic ideas and to policymaking. The book is organized in five main chapters, completed by an introduction and a final section with some general conclusions. The introduction properly paves the way for the following analysis, putting the reader in the best position to navigate the volume and appreciate the logic behind the connections among the various parts of the book. In the first chapter the author sets the scene for an in-depth analysis of the work of the four Italian economists who are the protagonists of the book by reviewing, with a historical approach, the literature(s) engaging with the concept of monopoly power. Given the complex nature of this idea, the chapter moves along four different paths: the history of formal models of profit maximization under imperfect competition; the history of competition policy; the theory of competition; and the definition of the concept of entry barriers. The following chapter (chapter 2) deals more directly with the contribution by the Italian marginalists, focusing on what looks like one side of the debate on market power: the issue of competition and the conditions that might make it less than perfect. In this, a fundamental distinction is made between “static” lack of competition—resulting from structural barriers to the free movement of actors in the market —and “dynamic” (temporary) situations of limited competition caused by innovations and other factors, a theme central to the so-called classical school of economic thought, too. The view, stressed by Italian Book Reviews / 879","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"30 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116834094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TransCNN-HAE: Transformer-CNN Hybrid AutoEncoder for Blind Image Inpainting
Haoru Zhao, Zhaorui Gu, Bing Zheng, Haiyong Zheng
Blind image inpainting is extremely challenging due to the unknown and multi-property complexity of contamination in different contaminated images. Current mainstream work decomposes blind image inpainting into two stages: estimating a mask from the contaminated image, and inpainting the image based on the estimated mask; this two-stage solution involves two CNN-based encoder-decoder architectures for estimation and inpainting separately. In this work, we propose a novel one-stage Transformer-CNN Hybrid AutoEncoder (TransCNN-HAE) for blind image inpainting, which follows an inpainting-then-reconstructing pipeline by leveraging the global long-range contextual modeling of the Transformer to repair contaminated regions and the local short-range contextual modeling of the CNN to reconstruct the repaired image. Moreover, a Cross-layer Dissimilarity Prompt (CDP) is devised to accelerate the identification and inpainting of contaminated regions. Ablation studies validate the efficacy of both TransCNN-HAE and CDP, and extensive experiments on various datasets with multi-property contaminations show that our method achieves state-of-the-art performance with much lower computational cost on blind image inpainting. Our code is available at https://github.com/zhenglab/TransCNN-HAE.
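As a rough illustration of the hybrid design described above (not the authors' network), the sketch below pairs a Transformer encoder over image patches, for long-range context, with a small CNN decoder that reconstructs the image; all layer sizes and the patch-embedding scheme are assumptions.

```python
import torch
import torch.nn as nn

class TransCNNSketch(nn.Module):
    """Toy Transformer-encoder / CNN-decoder autoencoder."""
    def __init__(self, patch: int = 8, dim: int = 128):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)   # patchify
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)     # global context
        self.decoder = nn.Sequential(                                     # local reconstruction
            nn.ConvTranspose2d(dim, 64, kernel_size=4, stride=4),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, kernel_size=2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.embed(x)                                  # (B, dim, H/p, W/p)
        b, d, h, w = tokens.shape
        seq = self.encoder(tokens.flatten(2).transpose(1, 2))   # (B, h*w, dim)
        feat = seq.transpose(1, 2).reshape(b, d, h, w)
        return self.decoder(feat)                               # (B, 3, H, W)

out = TransCNNSketch()(torch.randn(2, 3, 64, 64))  # contaminated input -> repaired output
print(out.shape)  # torch.Size([2, 3, 64, 64])
```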
{"title":"TransCNN-HAE: Transformer-CNN Hybrid AutoEncoder for Blind Image Inpainting","authors":"Haoru Zhao, Zhaorui Gu, Bing Zheng, Haiyong Zheng","doi":"10.1145/3503161.3547848","DOIUrl":"https://doi.org/10.1145/3503161.3547848","url":null,"abstract":"Blind image inpainting is extremely challenging due to the unknown and multi-property complexity of contamination in different contaminated images. Current mainstream work decomposes blind image inpainting into two stages: mask estimating from the contaminated image and image inpainting based on the estimated mask, and this two-stage solution involves two CNN-based encoder-decoder architectures for estimating and inpainting separately. In this work, we propose a novel one-stage Transformer-CNN Hybrid AutoEncoder (TransCNN-HAE) for blind image inpainting, which intuitively follows the inpainting-then-reconstructing pipeline by leveraging global long-range contextual modeling of Transformer to repair contaminated regions and local short-range contextual modeling of CNN to reconstruct the repaired image. Moreover, a Cross-layer Dissimilarity Prompt (CDP) is devised to accelerate the identifying and inpainting of contaminated regions. Ablation studies validate the efficacy of both TransCNN-HAE and CDP, and extensive experiments on various datasets with multi-property contaminations show that our method achieves state-of-the-art performance with much lower computational cost on blind image inpainting. Our code is available at https://github.com/zhenglab/TransCNN-HAE.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123892107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Augmented Dual-Contrastive Aggregation Learning for Unsupervised Visible-Infrared Person Re-Identification
Bin Yang, Mang Ye, Jun Chen, Zesen Wu
Visible-infrared person re-identification (VI-ReID) aims to retrieve the corresponding infrared (visible) images from a gallery set captured by cameras of the other spectrum. Recent works mainly focus on supervised VI-ReID methods that require plenty of cross-modality (visible-infrared) identity labels, which are more expensive than the annotations in single-modality person ReID. For unsupervised visible-infrared re-identification (USL-VI-ReID), the large cross-modality discrepancies make it difficult to generate reliable cross-modality labels and to learn modality-invariant features without any annotations. To address this problem, we propose a novel Augmented Dual-Contrastive Aggregation (ADCA) learning framework. Specifically, a dual-path contrastive learning framework with two modality-specific memories is proposed to learn the intra-modality person representation. To associate positive cross-modality identities, we design a cross-modality memory aggregation module with count priority to select highly associated positive samples and aggregate their corresponding memory features at the cluster level, ensuring that the optimization is explicitly concentrated on the modality-irrelevant perspective. Extensive experiments demonstrate that our proposed ADCA significantly outperforms existing unsupervised methods under various settings, and even surpasses some supervised counterparts, facilitating VI-ReID toward real-world deployment. Code is available at https://github.com/yangbincv/ADCA.
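A minimal, hedged sketch of the memory-based contrastive component that such a dual-path framework typically relies on: each modality keeps a bank of cluster features, features are pulled toward their assigned cluster entry, and the bank is updated with momentum. The temperature, the update rule, and all names are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(feats, cluster_ids, memory, tau: float = 0.05):
    """feats: (B, D) L2-normalized features; memory: (K, D) cluster memory bank."""
    logits = feats @ memory.t() / tau          # similarity of each feature to every cluster
    return F.cross_entropy(logits, cluster_ids)

def momentum_update(memory, feats, cluster_ids, m: float = 0.9):
    """Slowly move each assigned cluster entry toward the new feature."""
    for f, c in zip(feats, cluster_ids):
        memory[c] = F.normalize(m * memory[c] + (1 - m) * f, dim=0)
    return memory

# One modality-specific bank with 100 pseudo-label clusters of 256-d features.
memory = F.normalize(torch.randn(100, 256), dim=1)
feats = F.normalize(torch.randn(32, 256), dim=1)
ids = torch.randint(0, 100, (32,))
loss = cluster_contrastive_loss(feats, ids, memory)
memory = momentum_update(memory, feats, ids)
```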
{"title":"Augmented Dual-Contrastive Aggregation Learning for Unsupervised Visible-Infrared Person Re-Identification","authors":"Bin Yang, Mang Ye, Jun Chen, Zesen Wu","doi":"10.1145/3503161.3548198","DOIUrl":"https://doi.org/10.1145/3503161.3548198","url":null,"abstract":"Visible infrared person re-identification (VI-ReID) aims at searching out the corresponding infrared (visible) images from a gallery set captured by other spectrum cameras. Recent works mainly focus on supervised VI-ReID methods that require plenty of cross-modality (visible-infrared) identity labels which are more expensive than the annotations in single-modality person ReID. For the unsupervised learning visible infrared re-identification (USL-VI-ReID), the large cross-modality discrepancies lead to difficulties in generating reliable cross-modality labels and learning modality-invariant features without any annotations. To address this problem, we propose a novel Augmented Dual-Contrastive Aggregation (ADCA) learning framework. Specifically, a dual-path contrastive learning framework with two modality-specific memories is proposed to learn the intra-modality person representation. To associate positive cross-modality identities, we design a cross-modality memory aggregation module with count priority to select highly associated positive samples, and aggregate their corresponding memory features at the cluster level, ensuring that the optimization is explicitly concentrated on the modality-irrelevant perspective. Extensive experiments demonstrate that our proposed ADCA significantly outperforms existing unsupervised methods under various settings, and even surpasses some supervised counterparts, facilitating VI-ReID to real-world deployment. Code is available at https://github.com/yangbincv/ADCA.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124053590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cartoon-Flow: A Flow-Based Generative Adversarial Network for Arbitrary-Style Photo Cartoonization
Jieun Lee, Hyeonwoo Kim, Jong-Chae Shim, Eenjun Hwang
Photo cartoonization aims to convert photos of real-world scenes into cartoon-style images. Recently, generative adversarial network (GAN)-based methods for photo cartoonization have been proposed to generate pleasing cartoonized images. However, because these methods can transfer only learned cartoon styles to photos, they are limited in general-purpose applications where unlearned styles are often required. To address this limitation, an arbitrary style transfer (AST) method that transfers an arbitrary artistic style onto content images can be used. However, conventional AST methods do not perform satisfactorily in cartoonization for two reasons. First, they cannot capture the unique characteristics of cartoons, which differ from common artistic styles. Second, they suffer from content leaks, in which the semantic structure of the content is distorted. In this paper, to solve these problems, we propose a novel arbitrary-style photo cartoonization method, Cartoon-Flow. More specifically, we construct a new hybrid GAN with an invertible neural flow generator to effectively preserve content information. In addition, we introduce two new losses for cartoonization: (1) an edge-promoting smooth loss to learn the unique characteristics of cartoons with smooth surfaces and clear edges, and (2) a line loss to mimic the line drawing of cartoons. Extensive experiments demonstrate that the proposed method outperforms previous methods both quantitatively and qualitatively.
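As an illustration of what a line-style objective can look like (a hedged sketch in the spirit of the line loss named above, not the paper's definition), the snippet below compares Sobel edge maps of the generated image and a cartoon reference with an L1 penalty.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """img: (B, 3, H, W) in [0, 1]; returns per-pixel gradient magnitude."""
    gray = img.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def line_loss(generated: torch.Tensor, cartoon: torch.Tensor) -> torch.Tensor:
    """Encourage the generator to reproduce cartoon-like line drawings."""
    return F.l1_loss(sobel_edges(generated), sobel_edges(cartoon))

loss = line_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```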
{"title":"Cartoon-Flow: A Flow-Based Generative Adversarial Network for Arbitrary-Style Photo Cartoonization","authors":"Jieun Lee, Hyeonwoo Kim, Jong-Chae Shim, Eenjun Hwang","doi":"10.1145/3503161.3548094","DOIUrl":"https://doi.org/10.1145/3503161.3548094","url":null,"abstract":"Photo cartoonization aims to convert photos of real-world scenes into cartoon-style images. Recently, generative adversarial network (GAN)-based methods for photo cartoonization have been proposed to generate pleasable cartoonized images. However, as these methods can transfer only learned cartoon styles to photos, they are limited in general-purpose applications where unlearned styles are often required. To address this limitation, an arbitrary style transfer (AST) method that transfers arbitrary artistic style into content images can be used. However, conventional AST methods do not perform satisfactorily in cartoonization for two reasons. First, they cannot capture the unique characteristics of cartoons that differ from common artistic styles. Second, they suffer from content leaks in which the semantic structure of the content is distorted. In this paper, to solve these problems, we propose a novel arbitrary-style photo cartoonization method, Cartoon-Flow. More specifically, we construct a new hybrid GAN with an invertible neural flow generator to effectively preserve content information. In addition, we introduce two new losses for cartoonization: (1) edge-promoting smooth loss to learn the unique characteristics of cartoons with smooth surfaces and clear edges, and (2) line loss to mimic the line drawing of cartoons. Extensive experiments demonstrate that the proposed method outperforms previous methods both quantitatively and qualitatively.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124620853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}