Pub Date: 2021-04-08 | DOI: 10.1109/MIPR51284.2021.00015
An Empirical Study of the Effects of Sample-Mixing Methods for Efficient Training of Generative Adversarial Networks
M. Takamoto, Yusuke Morishita
It is well known that training generative adversarial networks (GANs) requires a huge number of iterations before the generator provides good-quality samples. Although several studies have tackled this problem, there is still no universal solution. In this paper, we investigated the effect of sample-mixing methods, namely Mixup, CutMix, and the newly proposed Smoothed Regional Mix (SRMix), on alleviating this problem. Sample-mixing methods are known to enhance accuracy and robustness in a wide range of classification problems, and they apply naturally to GANs because the role of the discriminator can be interpreted as classification between real and fake samples. We also proposed a new formalism for applying sample-mixing methods to GANs with saturated losses, which do not have a clear "label" of real and fake. We performed extensive numerical experiments using the LSUN and CelebA datasets. The results showed that Mixup and SRMix improved the quality of the generated images in terms of FID in most cases; in particular, SRMix showed the best improvement in most cases. Our analysis indicates that mixed samples provide different properties from vanilla fake samples, and that the mixing pattern strongly affects the decision of the discriminator. Images generated with Mixup have good high-level features but weaker low-level features, whereas CutMix shows the opposite tendency. Our SRMix falls in between, showing good high-level and low-level features. We believe that our findings provide a new perspective on accelerating GAN convergence and improving the quality of generated samples.
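Mixup itself is a well-defined operation; below is a minimal PyTorch sketch of how mixed real/fake batches could enter a discriminator update with a soft real/fake target. The generator `G`, discriminator `D`, and the soft-target choice are illustrative assumptions, not the paper's SRMix formulation or its saturated-loss formalism.

```python
# Minimal sketch: Mixup applied to a GAN discriminator step.
# Assumptions: `G` and `D` are user-defined PyTorch modules; NOT the paper's SRMix.
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def d_step_with_mixup(D, G, real, z, alpha=1.0):
    """One discriminator step where real and fake batches are mixed (Mixup)."""
    fake = G(z).detach()                       # generated samples, no gradient to G
    lam = Beta(alpha, alpha).sample().item()   # mixing coefficient in [0, 1]
    mixed = lam * real + (1.0 - lam) * fake    # pixel-wise interpolation
    logits = D(mixed).view(-1)
    # Soft target: the mixed sample counts as "real" with weight lam, "fake" with 1 - lam.
    target = torch.full_like(logits, lam)
    return F.binary_cross_entropy_with_logits(logits, target)
```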
Pub Date: 2021-02-10 | DOI: 10.1109/MIPR51284.2021.00019
Automated Video Labelling: Identifying Faces by Corroborative Evidence
Andrew Brown, Ernesto Coto, Andrew Zisserman
We present a method for automatically labelling all faces in video archives, such as TV broadcasts, by combining multiple evidence sources and multiple modalities (visual and audio). We target the problem of ever-growing online video archives, where an effective, scalable indexing solution cannot require a user to provide manual annotation or supervision. To this end, we make three key contributions: (1) We provide a novel, simple method for determining whether a person is famous or not using image-search engines. In turn, this enables a face-identity model to be built reliably and robustly, and used for high-precision automatic labelling; (2) We show that even for less-famous people, image-search engines can be used as corroborative evidence to accurately label faces that are named in the scene or the speech; (3) Finally, we quantitatively demonstrate the benefits of our approach on different video domains and test settings, such as TV shows and news broadcasts. Our method works across three disparate datasets without any explicit domain adaptation, and sets new state-of-the-art results on all the public benchmarks.
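Contribution (1) rests on testing, from image-search results alone, whether a reliable identity model can be built for a name. The sketch below shows one plausible consistency check of that kind; `search_face_images` and `embed_face` are hypothetical helpers passed in by the caller, and the criterion shown is an assumption rather than the authors' actual method.

```python
import numpy as np

def looks_famous(name, search_face_images, embed_face, min_hits=20, sim_thresh=0.6):
    """Heuristic: a 'famous' name should return many images of one consistent identity."""
    images = search_face_images(name)          # hypothetical image-search wrapper
    if len(images) < min_hits:
        return False
    # embed_face is assumed to return L2-normalised face embeddings
    embs = np.stack([embed_face(img) for img in images])
    centroid = embs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = embs @ centroid                     # cosine similarity to the mean face
    return float(np.median(sims)) > sim_thresh
```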
Pub Date: 2021-02-10 | DOI: 10.1109/MIPR51284.2021.00071
Culture-inspired Multi-modal Color Palette Generation and Colorization: A Chinese Youth Subculture Case
Yufan Li, Jinggang Zhuo, Ling Fan, Harry J. Wang
Color is an essential component of graphic design, acting not only as a visual factor but also carrying cultural implications. However, existing research on algorithmic color palette generation and colorization largely ignores the cultural aspect. In this paper, we contribute to this line of research by first constructing a unique color dataset inspired by a specific culture, namely Chinese Youth Subculture (CYS), a vibrant and trending cultural group, especially among the Gen Z population. We show that the colors used in CYS have special aesthetic and semantic characteristics that differ from generic color theory. We then develop an interactive multi-modal generative framework to create CYS-styled color palettes, which can be used to put a CYS twist on images with our automatic colorization model. Our framework is illustrated via a demo system designed on the human-in-the-loop principle, which constantly provides feedback to our algorithms. User studies are also conducted to evaluate our generation results.
Pub Date: 2020-11-09 | DOI: 10.1109/MIPR51284.2021.00072
MUSE: Textual Attributes Guided Portrait Painting Generation
Xiaodan Hu, Pengfei Yu, Kevin Knight, Heng Ji, Bo Li, Humphrey Shi
We propose a novel approach, MUSE, to automatically generate portrait paintings guided by textual attributes. MUSE takes as input a set of attributes written in text, in addition to facial features extracted from a photo of the subject. We propose 11 attribute types to represent inspirations from a subject's profile, emotion, story, and environment. We then design a novel stacked neural network architecture by extending an image-to-image generative model to accept textual attributes. Experiments show that our approach significantly outperforms several state-of-the-art methods that do not use textual attributes, with the Inception Score increased by 6% and the Fréchet Inception Distance (FID) decreased by 11%. We also propose a new attribute reconstruction metric to evaluate whether the generated portraits preserve the subject's attributes. Experiments show that our approach can accurately illustrate 78% of textual attributes, which also helps MUSE capture the subject in a more creative and expressive way.
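The reported gains are measured with Inception Score and FID. For reference, FID between two sets of Inception activations is the standard closed-form Fréchet distance between fitted Gaussians, sketched below (generic formula, not code from the paper):

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    """Fréchet Inception Distance between two activation sets of shape (N, D)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):               # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```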
Pub Date: 2020-09-03 | DOI: 10.1109/MIPR51284.2021.00076
Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics
A. Singh, Priyanka Singh
Digital technology has made once-unimaginable applications come true. Having a handful of tools for easy editing and manipulation seems exciting, but it raises alarming concerns, since the results can propagate as speech clones, duplicates, or even deepfakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach to distinguish human speech from AI-synthesized speech by exploiting bispectral and cepstral analysis. Higher-order statistics have less correlation for human speech than for synthesized speech. In addition, cepstral analysis reveals a durable power component in human speech that is missing in synthesized speech. We integrate both analyses and propose a model to detect AI-synthesized speech.
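The cepstral part of the analysis is standard signal processing: the real cepstrum is the inverse FFT of the log-magnitude spectrum. A minimal NumPy sketch follows; the quefrency cutoff and the power summary are illustrative assumptions, not the paper's exact features or thresholds.

```python
import numpy as np

def real_cepstrum(signal):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(signal)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # epsilon avoids log(0)
    return np.fft.ifft(log_mag).real

def high_quefrency_power(signal, cutoff=50):
    """Power in the cepstrum beyond a quefrency cutoff: a crude stand-in for the
    'durable power component' the abstract attributes to human speech."""
    c = real_cepstrum(signal)
    return float(np.sum(c[cutoff:len(c) // 2] ** 2))
```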
Pub Date: 2020-09-03 | DOI: 10.1109/MIPR51284.2021.00021
Robust Homomorphic Video Hashing
Priyanka Singh
The Internet has been weaponized to carry out cybercriminal activities at an unprecedented pace. Concerns about preserving the privacy of personal data while taking advantage of modern tools and technologies are rising, and end-to-end encrypted solutions are in demand on almost all commercial platforms. On the one hand, it seems imperative to provide such solutions and give people the trust to reliably use these platforms. On the other hand, this creates a huge opportunity to carry out unchecked cybercrimes. This paper proposes a robust video hashing technique that is scalable and efficient at finding matches among the enormous volume of videos circulating on these commercial platforms. The video hash is validated to be robust to common manipulations such as scaling, noise corruption, compression, and contrast changes that are most likely to occur during transmission. It can also be transformed into the encrypted domain and operate on encrypted videos without decryption. Thus, it can serve as a forensic tool that traces the illegal sharing of videos without knowing the underlying content, and can thereby help preserve privacy and combat cybercrimes such as revenge porn, hateful content, child abuse, or illegal material propagated in a video.
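As a point of reference for what robustness to scaling, noise, compression, and contrast changes demands of a video hash, the sketch below shows a generic per-frame difference hash compared by Hamming distance. This is a common perceptual-hash baseline, not the homomorphic scheme proposed in the paper.

```python
import numpy as np

def frame_dhash(gray_frame, size=8):
    """Difference hash of a single grayscale frame (2-D uint8 array): 64 bits."""
    h, w = gray_frame.shape
    ys = (np.arange(size) * h) // size              # coarse row sampling grid
    xs = (np.arange(size + 1) * w) // (size + 1)    # one extra column for differencing
    small = gray_frame[np.ix_(ys, xs)].astype(np.int16)
    return (small[:, 1:] > small[:, :-1]).flatten() # compare horizontally adjacent cells

def hamming(h1, h2):
    """Number of differing bits between two boolean hash vectors."""
    return int(np.count_nonzero(h1 != h2))
```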
Pub Date: 2020-08-26 | DOI: 10.1109/MIPR51284.2021.00022
Buy Me That Look: An Approach for Recommending Similar Fashion Products
Abhinav Ravi, Sandeep Repakula, U. Dutta, Maulik Parmar
Have you ever looked at an Instagram model, or a model on a fashion e-commerce web page, and thought, "I wish I could get a list of fashion items similar to the ones worn by the model"? This is what we address in this paper, where we propose a novel computer-vision-based technique called ShopLook for the challenging problem of recommending similar fashion products. The proposed method has been evaluated at Myntra (www.myntra.com), a leading online fashion e-commerce platform. In particular, given a user query and the corresponding Product Display Page (PDP), the goal of our method is to recommend similar fashion products for the entire set of fashion articles worn by the model in the PDP full-shot image (the one showing the entire model from head to toe). The novelty and strength of our method lie in its capability to recommend similar articles for all the fashion items worn by the model, in addition to the primary article corresponding to the query. This is important not only for promoting cross-sells to boost revenue, but also for improving customer experience and engagement. In addition, our approach can also recommend similar products for User Generated Content (UGC), e.g., fashion article images uploaded by users. Formally, our proposed method consists of the following components (in order): i) human keypoint detection, ii) pose classification, iii) article localisation and object detection, along with active-learning feedback, and iv) a triplet-network-based image embedding model.
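The final pipeline component, the triplet-network embedding, follows a standard pattern; a minimal PyTorch sketch of such an embedding model and the triplet margin objective it relies on is shown below. The toy backbone and margin value are placeholders, not the production model described in the paper.

```python
import torch
import torch.nn as nn

class ArticleEmbedder(nn.Module):
    """Toy backbone mapping an article crop (B, 3, H, W) to an L2-normalised embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=1)

model = ArticleEmbedder()
triplet_loss = nn.TripletMarginLoss(margin=0.2)
# anchor, positive, negative are image batches of shape (B, 3, H, W):
# loss = triplet_loss(model(anchor), model(positive), model(negative))
```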
Pub Date: 2020-04-01 | DOI: 10.1109/MIPR51284.2021.00028
Predicting Human Behavior Using User’s Contextual Embedding by Convolution of Action Graph
Aozora Inagaki, Shosuke Haji, Ryoko Nakamura, Ryoichi Osawa, T. Takagi, Isshu Munemasa
Predicting human behavior from logs that include user location information and the categories of facilities visited is an active research area. However, little research has focused on user behavioral embeddings that express user preferences. We have developed a behavior prediction model that uses an action graph, with categories as nodes and transitions between categories as edges, in order to capture transition preferences based on the context of the places visited by users. It uses the features of the action graph, which are extracted with a graph convolutional network. Experiments demonstrated that using user behavioral embeddings extracted by graph convolution improves prediction accuracy. Quantitative and qualitative analyses demonstrated the effectiveness of the action graph embedding representation.
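The core representation is a category-transition graph whose features are extracted with a graph convolutional network. The sketch below shows how such an action graph could be built from a visit log and passed through one plain GCN propagation step (NumPy, symmetric normalisation); it is a generic illustration, not the authors' architecture.

```python
import numpy as np

def build_action_graph(category_sequence, num_categories):
    """Adjacency matrix whose entry (i, j) counts transitions from category i to j."""
    A = np.zeros((num_categories, num_categories))
    for src, dst in zip(category_sequence[:-1], category_sequence[1:]):
        A[src, dst] += 1.0
    return A

def gcn_layer(A, X, W):
    """One graph-convolution step: H = ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))       # degree normalisation
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)              # ReLU activation
```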
Pub Date: 2020-04-01 | DOI: 10.1109/MIPR51284.2021.00051
Constructing a highly accurate price prediction model in real estate investment using LightGBM
Tianqi Li, T. Akiyama, Liang-Ying Wei
In this research, we propose a high-accuracy price prediction model for the purpose of constructing a support system that collects information on, and automatically analyses, profitable properties in the real estate investment market. In the traditional real estate investment process, investors need to go through the following steps: 1) collect information on the Internet, 2) make price predictions based on their own judgement, 3) place an order, and 4) negotiate and purchase. Steps 1 and 2 in particular are inefficient: they seem simple, but they are very time-consuming and must be repeated many times until a suitable property is found. We therefore aim to construct an efficient real estate investment support system by automating the information-gathering step and substituting a machine learning model for the manual price prediction step. In this paper, we focus on the price prediction of step 2 and propose a highly accurate price prediction model using LightGBM. Specifically, the accuracy was improved by incorporating the condominium brand name, a price-determining factor unique to Japan, and geographic (geo) data into the price prediction model.
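The modelling step itself is conventional gradient boosting. Below is a minimal sketch of fitting a LightGBM regressor with the condominium brand as a categorical feature alongside geographic coordinates; the column names and toy values are illustrative, not the authors' feature set.

```python
import lightgbm as lgb
import pandas as pd

# Illustrative columns and values only; the paper's actual features are not listed here.
df = pd.DataFrame({
    "brand": pd.Categorical(["BrandA", "BrandB", "BrandA", "BrandC"]),
    "area_m2": [55.0, 70.2, 48.5, 90.1],
    "lat": [35.68, 35.66, 35.70, 35.65],
    "lon": [139.76, 139.73, 139.77, 139.70],
    "price": [52_000_000, 78_000_000, 45_000_000, 99_000_000],
})
X, y = df.drop(columns="price"), df["price"]

model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
model.fit(X, y, categorical_feature=["brand"])   # brand handled as a native categorical split
preds = model.predict(X)
```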