
Latest publications: 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)

An Empirical Study of the Effects of Sample-Mixing Methods for Efficient Training of Generative Adversarial Networks
M. Takamoto, Yusuke Morishita
It is well known that training generative adversarial networks (GANs) requires a huge number of iterations before the generator provides good-quality samples. Although several studies have tackled this problem, there is still no universal solution. In this paper, we investigated the effect of sample-mixing methods, namely Mixup, CutMix, and the newly proposed Smoothed Regional Mix (SRMix), on alleviating this problem. Sample-mixing methods are known to enhance accuracy and robustness in a wide range of classification problems, and they apply naturally to GANs because the role of the discriminator can be interpreted as classification between real and fake samples. We also proposed a new formalism that applies the sample-mixing methods to GANs with saturated losses, which do not have a clear "label" of real and fake. We performed a vast number of numerical experiments using the LSUN and CelebA datasets. The results showed that Mixup and SRMix improved the quality of the generated images in terms of FID in most cases; in particular, SRMix showed the best improvement in most cases. Our analysis indicates that the mixed samples can provide properties different from those of vanilla fake samples, and that the mixing pattern strongly affects the decision of the discriminators. The images generated with Mixup have good high-level features but less impressive low-level features, whereas CutMix showed the opposite tendency. Our SRMix fell in the middle, showing good high- and low-level features. We believe that our findings provide a new perspective for accelerating GAN convergence and improving the quality of generated samples.
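The Mixup operation the study applies to discriminator inputs can be sketched in a few lines. This is the generic textbook formulation with illustrative names, not the authors' code; the mixing coefficient `lam` plays the role of the soft real/fake target discussed above:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_pair(real, fake, alpha=1.0):
    """Blend one real and one fake sample; return the mix and its soft label."""
    lam = rng.beta(alpha, alpha)           # mixing ratio drawn from Beta(alpha, alpha)
    mixed = lam * real + (1.0 - lam) * fake
    return mixed, lam                      # lam says "how real" the mixed sample is

# Toy 4x4 grayscale "images": real = all ones, fake = all zeros
real = np.ones((4, 4))
fake = np.zeros((4, 4))
mixed, lam = mixup_pair(real, fake)
# Here every pixel of the mix equals lam, since real = 1 and fake = 0 everywhere.
```

CutMix and SRMix differ only in how the two samples are combined spatially (a rectangular patch vs. a smoothed regional blend) rather than in this pixel-wise interpolation.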
Cited by: 1
Automated Video Labelling: Identifying Faces by Corroborative Evidence
Andrew Brown, Ernesto Coto, Andrew Zisserman
We present a method for automatically labelling all faces in video archives, such as TV broadcasts, by combining multiple evidence sources and multiple modalities (visual and audio). We target the problem of ever-growing online video archives, where an effective, scalable indexing solution cannot require a user to provide manual annotation or supervision. To this end, we make three key contributions: (1) we provide a novel, simple method for determining whether a person is famous using image-search engines; in turn, this enables a face-identity model to be built reliably and robustly and used for high-precision automatic labelling; (2) we show that even for less famous people, image-search engines can be used for corroborative evidence to accurately label faces that are named in the scene or the speech; (3) finally, we quantitatively demonstrate the benefits of our approach on different video domains and test settings, such as TV shows and news broadcasts. Our method works across three disparate datasets without any explicit domain adaptation and sets new state-of-the-art results on all the public benchmarks.
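The corroboration idea, accepting a visual match only when an independent evidence source names the same person, can be sketched as follows. The embeddings, threshold, and helper names here are all illustrative assumptions, not the authors' system:

```python
import numpy as np

def label_face(face_emb, exemplars, names_in_scene, threshold=0.8):
    """Label a face only when visual similarity AND a named mention agree."""
    for name, ex in exemplars.items():
        # Cosine similarity between the query face and a name's exemplar embedding
        sim = ex @ face_emb / (np.linalg.norm(ex) * np.linalg.norm(face_emb))
        if sim >= threshold and name in names_in_scene:   # corroborative evidence
            return name
    return None

# Toy 2-d embeddings standing in for face-identity features
exemplars = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
face = np.array([0.95, 0.05])
who = label_face(face, exemplars, names_in_scene={"alice"})  # both checks pass
```

Requiring two independent signals to agree is what keeps the labelling high-precision without any manual supervision.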
Cited by: 7
Culture-inspired Multi-modal Color Palette Generation and Colorization: A Chinese Youth Subculture Case
Yufan Li, Jinggang Zhuo, Ling Fan, Harry J. Wang
Color is an essential component of graphic design, acting not only as a visual factor but also carrying cultural implications. However, existing research on algorithmic color palette generation and colorization largely ignores the cultural aspect. In this paper, we contribute to this line of research by first constructing a unique color dataset inspired by a specific culture, i.e., Chinese Youth Subculture (CYS), a vibrant and trending cultural group, especially among the Gen Z population. We show that the colors used in CYS have special aesthetic and semantic characteristics that differ from generic color theory. We then develop an interactive multi-modal generative framework to create CYS-styled color palettes, which can be used to put a CYS twist on images with our automatic colorization model. Our framework is illustrated via a demo system designed with the human-in-the-loop principle that constantly provides feedback to our algorithms. User studies are also conducted to evaluate our generation results.
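For contrast with the authors' generative framework, the simplest non-learned way to obtain a palette from an image is coarse colour quantisation followed by frequency counting. This sketch is purely illustrative and unrelated to the CYS model:

```python
import numpy as np

def dominant_palette(pixels, n_colors=3, bits=2):
    """Quantize RGB pixels to a coarse grid and return the most frequent colors."""
    q = (pixels >> (8 - bits)) << (8 - bits)   # keep only the top `bits` bits per channel
    colors, counts = np.unique(q.reshape(-1, 3), axis=0, return_counts=True)
    order = np.argsort(-counts)                # most frequent quantized colors first
    return colors[order[:n_colors]]

# Toy 1x8 image: six red pixels, two blue pixels
img = np.array([[[255, 0, 0]] * 6 + [[0, 0, 255]] * 2], dtype=np.uint8)
palette = dominant_palette(img, n_colors=2)    # red bin first, blue bin second
```

A culture-aware generator replaces this frequency heuristic with a model trained on a curated dataset, which is exactly the gap the paper addresses.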
Cited by: 2
MUSE: Textual Attributes Guided Portrait Painting Generation
Xiaodan Hu, Pengfei Yu, Kevin Knight, Heng Ji, Bo Li, Humphrey Shi
We propose a novel approach, MUSE, to automatically generate portrait paintings guided by textual attributes. MUSE takes as input a set of attributes written in text, in addition to facial features extracted from a photo of the subject. We propose 11 attribute types to represent inspirations from a subject's profile, emotion, story, and environment. Then we design a novel stacked neural network architecture by extending an image-to-image generative model to accept textual attributes. Experiments show that our approach significantly outperforms several state-of-the-art methods that do not use textual attributes, with the Inception Score increased by 6% and the Frechet Inception Distance (FID) decreased by 11%. We also propose a new attribute reconstruction metric to evaluate whether the generated portraits preserve the subject's attributes. Experiments show that our approach can accurately illustrate 78% of textual attributes, which also helps MUSE capture the subject in a more creative and expressive way.
Cited by: 2
Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics
A. Singh, Priyanka Singh
Digital technology has made unimaginable applications come true. It seems exciting to have a handful of tools for easy editing and manipulation, but this raises alarming concerns about content that can propagate as speech clones, duplicates, or even deep fakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach to distinguish human speech from AI-synthesized speech exploiting bispectral and cepstral analysis. Higher-order statistics show less correlation for human speech than for synthesized speech. Also, cepstral analysis revealed a durable power component in human speech that is missing from synthesized speech. We integrate both analyses and propose a model to detect AI-synthesized speech.
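The cepstral quantity referred to here (a real cepstrum) can be computed directly from the definition: the inverse FFT of the log magnitude spectrum. This is the textbook formulation, not the authors' full detection pipeline:

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum of the signal."""
    spectrum = np.fft.fft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # epsilon avoids log(0)
    return np.real(np.fft.ifft(log_mag))

# A toy "voiced" signal: a 220 Hz sinusoid, one second at 8 kHz
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 220 * t)
c = real_cepstrum(x)   # periodic structure shows up as cepstral peaks
```

In a detector, features such as the low-quefrency energy of `c` (the "power component" mentioned above) would be fed to a classifier alongside bispectral statistics.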
Cited by: 15
Robust Homomorphic Video Hashing
Priyanka Singh
The Internet has been weaponized to carry out cybercriminal activities at an unprecedented pace. The rising concerns about preserving the privacy of personal data while availing of modern tools and technologies are alarming. End-to-end encrypted solutions are in demand for almost all commercial platforms. On one side, it seems imperative to provide such solutions and give people trust to reliably use these platforms. On the other side, this creates a huge opportunity to carry out unchecked cybercrimes. This paper proposes a robust video-hashing technique, scalable and efficient in chalking out matches from the enormous bulk of videos floating on these commercial platforms. The video hash is validated to be robust to common manipulations, such as scaling, corruption by noise, compression, and contrast changes, that are most likely to happen during transmission. It can also be transformed into the encrypted domain and work on top of encrypted videos without deciphering. Thus, it can serve as a potential forensic tool that can trace the illegal sharing of videos without knowing the underlying content. Hence, it can help preserve privacy and combat cybercrimes such as revenge porn, hateful content, child abuse, or illegal material propagated in a video.
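A minimal sketch of a robustness-oriented (perceptual) frame hash, assuming a simple block-mean scheme; the paper's homomorphic construction is more involved, and every name here is illustrative. Because each bit only compares a block mean against the frame mean, the hash is unchanged by a uniform contrast scaling:

```python
import numpy as np

def block_hash(frame, grid=4):
    """Perceptual hash: 1 bit per block, set when the block mean exceeds the frame mean."""
    h, w = frame.shape
    bh, bw = h // grid, w // grid
    blocks = frame[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw)
    means = blocks.mean(axis=(1, 3))          # per-block average intensity
    return (means > frame.mean()).astype(np.uint8).ravel()

def hamming(h1, h2):
    """Number of differing hash bits; a small distance suggests the same content."""
    return int(np.count_nonzero(h1 != h2))

rng = np.random.default_rng(1)
frame = rng.random((64, 64))     # stand-in for one video frame
scaled = frame * 0.5             # a contrast change during transmission
```

Matching a suspect video then reduces to comparing short bit strings instead of raw frames, which is what makes the scheme scalable over large video collections.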
Cited by: 3
Buy Me That Look: An Approach for Recommending Similar Fashion Products
Abhinav Ravi, Sandeep Repakula, U. Dutta, Maulik Parmar
Have you ever looked at an Instagram model, or a model on a fashion e-commerce web page, and thought "Wish I could get a list of fashion items similar to the ones worn by the model!"? This is what we address in this paper, where we propose a novel computer-vision-based technique called ShopLook to tackle the challenging problem of recommending similar fashion products. The proposed method has been evaluated at Myntra (www.myntra.com), a leading online fashion e-commerce platform. In particular, given a user query and the corresponding Product Display Page (PDP) against the query, the goal of our method is to recommend similar fashion products corresponding to the entire set of fashion articles worn by a model in the PDP full-shot image (the one showing the entire model from head to toe). The novelty and strength of our method lie in its capability to recommend similar articles for all the fashion items worn by the model, in addition to the primary article corresponding to the query. This is important not only for promoting cross-sells to boost revenue, but also for improving customer experience and engagement. In addition, our approach is also capable of recommending similar products for User Generated Content (UGC), e.g., fashion article images uploaded by users. Formally, our proposed method consists of the following components (in the same order): i) human keypoint detection, ii) pose classification, iii) article localisation and object detection, along with active learning feedback, and iv) a triplet-network-based image embedding model.
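The triplet objective behind component iv) can be sketched as a standard hinge loss over Euclidean distances: the anchor should sit closer to a similar item (positive) than to a dissimilar one (negative) by at least a margin. The toy embeddings below are illustrative, not the authors' trained network:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss pushing the anchor closer to the positive than to the negative."""
    d_pos = np.linalg.norm(anchor - positive)   # distance to a similar item
    d_neg = np.linalg.norm(anchor - negative)   # distance to a dissimilar item
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])   # anchor embedding (e.g. the queried article)
p = np.array([0.1, 0.0])   # embedding of a visually similar article
n = np.array([1.0, 0.0])   # embedding of an unrelated article
loss = triplet_loss(a, p, n)   # 0.1 - 1.0 + 0.2 < 0, so the loss is 0
```

Once trained this way, "recommend similar products" reduces to a nearest-neighbour search in the embedding space for each detected article.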
Cited by: 7
Predicting Human Behavior Using User’s Contextual Embedding by Convolution of Action Graph
Aozora Inagaki, Shosuke Haji, Ryoko Nakamura, Ryoichi Osawa, T. Takagi, Isshu Munemasa
Predicting human behavior using logs that include user location information and the categories of facilities visited is being actively researched. However, not enough research has focused on user behavioral embeddings that express user preferences. We have developed a behavior prediction model that uses an action graph, with categories as nodes and transitions between categories as edges, in order to capture transition preferences on the basis of the context of the places visited by users. It uses features of the action graph extracted with a graph convolutional network. Experiments demonstrated that using user behavioral embeddings extracted by graph convolution improves prediction accuracy. Quantitative and qualitative analyses demonstrated the effectiveness of the action-graph embedding representation.
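One propagation step of a graph convolution over such an action graph can be sketched as follows. The row-normalised adjacency with self-loops is a common GCN variant; the graph, features, and weights are toy values, not the authors' model:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: average each node with its neighbours, then project."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))    # row-normalise by node degree
    return D_inv @ A_hat @ H @ W                # aggregate neighbours, project with W

# Action graph: 3 category nodes, edges = observed transitions between categories
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)                  # one-hot initial node features, one per category
W = np.ones((3, 2)) * 0.5      # toy projection into a 2-d embedding space
emb = gcn_layer(A, H, W)       # per-category context embeddings
```

Stacking a few such layers lets each category's embedding absorb the transition context of its neighbourhood, which is the signal the prediction model consumes.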
Cited by: 0
Constructing a highly accurate price prediction model in real estate investment using LightGBM
Tianqi Li, T. Akiyama, Liang-Ying Wei
In this research, we propose a high-accuracy price prediction model for the purpose of constructing a support system for information collection and automatic analysis of profitable properties in the real estate investment market. In the traditional real estate investment process, investors need to go through the following steps: 1) collect information on the Internet, 2) make price predictions based on their own judgement, 3) place an order, 4) negotiate and purchase. Steps 1 and 2 in particular are inefficient: they seem simple but are very time-consuming and must be repeated many times until a suitable property is found. Therefore, we aim to construct an efficient real estate investment support system by automating the information-gathering process and substituting the price prediction process with a machine learning model. In this paper, we focus on the price prediction of step 2 and propose a highly accurate price prediction model using LightGBM. Specifically, the accuracy was improved by incorporating the condominium brand name, a price-determining factor unique to Japan, and the Geo Data, a geographic factor, into the price prediction model.
Citations: 0
Journal: 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)