Content-Based Collaborative Filtering using Word Embedding: A Case Study on Movie Recommendation
Authors: Luong Vuong Nguyen, Tri-Hai Nguyen, Jason J. Jung
Venue: Proceedings of the International Conference on Research in Adaptive and Convergent Systems
Published: 2020-10-13
DOI: 10.1145/3400286.3418253 (https://doi.org/10.1145/3400286.3418253)
In collaborative filtering (CF)-based recommendation systems, a lack of sufficient ratings makes it difficult to model user preferences effectively and to find trustworthy similar users; this is known as the cold-start problem. To address this problem and improve the efficiency of recommendation systems, we propose a new content-based CF approach based on item similarity. We apply the model in the movie domain and extract features such as the genres, directors, actors, and plots of the movies. We use the Jaccard coefficient to convert extracted features such as genres, directors, and actors to vectors, while the plot feature is converted to semantic vectors. The similarity between movies is then calculated with the soft cosine measure over the vectorized features. To represent the plot feature as semantic vectors, we apply a word embedding model (i.e., Word2Vec) instead of traditional models such as a binary bag of words or a TF-IDF vector space. Experimental results show that, in cold-start conditions, the proposed system outperforms the baseline systems in terms of accuracy, precision, recall, and F1 score.
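The two similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy genre sets, the three-word vocabulary, and the term-similarity matrix are all hypothetical. In the described approach the term similarities would presumably be derived from Word2Vec embeddings; here they are hard-coded so the example stays self-contained.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (taken as 0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def soft_cosine(x: list, y: list, sim: list) -> float:
    """Soft cosine between term-count vectors x and y.

    sim[i][j] is the similarity between vocabulary terms i and j.
    With sim equal to the identity matrix this reduces to ordinary cosine.
    """
    def form(u, v):
        # Bilinear form: sum over i, j of sim[i][j] * u[i] * v[j]
        return sum(sim[i][j] * u[i] * v[j]
                   for i in range(len(u)) for j in range(len(v)))

    denom = math.sqrt(form(x, x)) * math.sqrt(form(y, y))
    return form(x, y) / denom if denom else 0.0


# Hypothetical categorical feature (genres) for two movies.
genres_a = {"Action", "Sci-Fi", "Thriller"}
genres_b = {"Action", "Sci-Fi", "Drama"}
genre_sim = jaccard(genres_a, genres_b)  # 2 shared genres out of 4 distinct

# Hypothetical vocabulary: ["spaceship", "rocket", "wedding"].
# Assumed term similarities; an embedding model would supply these in practice.
term_sim = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
plot_a = [1, 0, 0]  # plot mentions "spaceship" only
plot_b = [0, 1, 0]  # plot mentions "rocket" only

plot_sim_soft = soft_cosine(plot_a, plot_b, term_sim)
plot_sim_plain = soft_cosine(plot_a, plot_b, identity)

print(genre_sim, plot_sim_soft, plot_sim_plain)
```

The two toy plots share no words, so ordinary cosine similarity treats them as unrelated, while the soft cosine measure gives them credit for using related terms. This is the motivation for pairing word embeddings with soft cosine rather than relying on a binary bag of words or TF-IDF alone.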