针对非随机缺失数据的推荐系统进行培训和测试

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI:10.1145/1835804.1835895

H. Steck

{"title":"针对非随机缺失数据的推荐系统进行培训和测试","authors":"H. Steck","doi":"10.1145/1835804.1835895","DOIUrl":null,"url":null,"abstract":"Users typically rate only a small fraction of all available items. We show that the absence of ratings carries useful information for improving the top-k hit rate concerning all items, a natural accuracy measure for recommendations. As to test recommender systems, we present two performance measures that can be estimated, under mild assumptions, without bias from data even when ratings are missing not at random (MNAR). As to achieve optimal test results, we present appropriate surrogate objective functions for efficient training on MNAR data. Their main property is to account for all ratings - whether observed or missing in the data. Concerning the top-k hit rate on test data, our experiments indicate dramatic improvements over even sophisticated methods that are optimized on observed ratings only.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"31 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"345","resultStr":"{\"title\":\"Training and testing of recommender systems on data missing not at random\",\"authors\":\"H. Steck\",\"doi\":\"10.1145/1835804.1835895\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Users typically rate only a small fraction of all available items. We show that the absence of ratings carries useful information for improving the top-k hit rate concerning all items, a natural accuracy measure for recommendations. As to test recommender systems, we present two performance measures that can be estimated, under mild assumptions, without bias from data even when ratings are missing not at random (MNAR). As to achieve optimal test results, we present appropriate surrogate objective functions for efficient training on MNAR data. Their main property is to account for all ratings - whether observed or missing in the data. Concerning the top-k hit rate on test data, our experiments indicate dramatic improvements over even sophisticated methods that are optimized on observed ratings only.\",\"PeriodicalId\":20529,\"journal\":{\"name\":\"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"volume\":\"31 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"345\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1835804.1835895\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1835804.1835895","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 345

摘要

用户通常只对所有可用物品中的一小部分进行评分。我们表明，评级的缺失为提高所有项目的top-k命中率提供了有用的信息，这是推荐的自然准确性度量。为了测试推荐系统，我们提出了两个性能度量，在温和的假设下，即使在评级缺失非随机(MNAR)的情况下，也可以从数据中估计出没有偏差的性能度量。为了获得最佳的测试结果，我们提出了合适的替代目标函数来对MNAR数据进行有效的训练。它们的主要属性是考虑所有评级——无论是观察到的还是数据中缺失的。关于测试数据的top-k命中率，我们的实验表明，即使是仅根据观察到的评级进行优化的复杂方法，也有显着的改进。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Training and testing of recommender systems on data missing not at random

Users typically rate only a small fraction of all available items. We show that the absence of ratings carries useful information for improving the top-k hit rate concerning all items, a natural accuracy measure for recommendations. As to test recommender systems, we present two performance measures that can be estimated, under mild assumptions, without bias from data even when ratings are missing not at random (MNAR). As to achieve optimal test results, we present appropriate surrogate objective functions for efficient training on MNAR data. Their main property is to account for all ratings - whether observed or missing in the data. Concerning the top-k hit rate on test data, our experiments indicate dramatic improvements over even sophisticated methods that are optimized on observed ratings only.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

自引率

0.00%

发文量

期刊最新文献

Frequent regular itemset mining Suggesting friends using the implicit social graph Collusion-resistant privacy-preserving data mining Mining advisor-advisee relationships from research publication networks Session details: Research track 5: classification models and tools