Rating Movies and Rating the Raters Who Rate Them.

IF 2.1 Zone 4 (Mathematics) Q1 STATISTICS & PROBABILITY American Statistician Pub Date: 2009-11-01 DOI: 10.1198/tast.2009.08278

Hua Zhou, Kenneth Lange

American Statistician, vol. 63, no. 4, pp. 297-307. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2929029/pdf/nihms205491.pdf


Abstract

The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data.
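The abstract does not spell out the model, so the following is only a minimal sketch under stated assumptions, not the authors' specification. It assumes a hypothetical two-component mixture: rater `r` rates movie `m` from a per-movie "consensus" distribution with probability `pi[r]`, and otherwise from a per-rater "quirky" distribution. Both distributions are taken to be unconstrained multinomials over the rating levels, and the function name `em_rating_mixture` and all parameter names are illustrative. The sketch shows the generic EM recipe for such a mixture and records the log-likelihood at each iteration, which EM never decreases.

```python
import math
from collections import defaultdict


def em_rating_mixture(ratings, num_levels, num_iters=50):
    """EM for an ASSUMED consensus/quirky mixture (illustrative only).

    ratings: list of (rater, movie, level) with level in range(num_levels).
    Returns (pi, consensus, quirky, history), where history[t] is the
    log-likelihood of the parameters held at the start of iteration t.
    """
    raters = sorted({r for r, _, _ in ratings})
    movies = sorted({m for _, m, _ in ratings})

    # Flat starting values; the first M-step moves them toward the data.
    pi = {r: 0.5 for r in raters}
    consensus = {m: [1.0 / num_levels] * num_levels for m in movies}
    quirky = {r: [1.0 / num_levels] * num_levels for r in raters}
    history = []

    for _ in range(num_iters):
        # E-step: posterior probability that each rating came from consensus.
        weights = []
        loglik = 0.0
        for r, m, k in ratings:
            a = pi[r] * consensus[m][k]
            b = (1.0 - pi[r]) * quirky[r][k]
            loglik += math.log(a + b)
            weights.append(a / (a + b))
        history.append(loglik)

        # M-step: weighted relative frequencies.
        pi_sum = defaultdict(float)
        pi_count = defaultdict(int)
        c_acc = {m: [0.0] * num_levels for m in movies}
        q_acc = {r: [0.0] * num_levels for r in raters}
        for (r, m, k), w in zip(ratings, weights):
            pi_sum[r] += w
            pi_count[r] += 1
            c_acc[m][k] += w
            q_acc[r][k] += 1.0 - w

        for r in raters:
            pi[r] = pi_sum[r] / pi_count[r]
            total = sum(q_acc[r])
            if total > 0:  # keep the old value if no mass was assigned
                quirky[r] = [v / total for v in q_acc[r]]
        for m in movies:
            total = sum(c_acc[m])
            if total > 0:
                consensus[m] = [v / total for v in c_acc[m]]

    return pi, consensus, quirky, history


# Toy data (made up for illustration): rater "d" gives the lowest level to everything.
toy_ratings = [
    ("a", "m1", 4), ("b", "m1", 4), ("c", "m1", 3), ("d", "m1", 0),
    ("a", "m2", 1), ("b", "m2", 1), ("c", "m2", 2), ("d", "m2", 0),
    ("a", "m3", 3), ("b", "m3", 3), ("c", "m3", 3), ("d", "m3", 0),
]
pi, consensus, quirky, history = em_rating_mixture(toy_ratings, num_levels=5, num_iters=30)
print("log-likelihood, first and last iteration:", history[0], history[-1])
```

Because a likelihood surface of this kind can have many local modes, a run like this only reaches whichever mode its starting values lead to; restarting from several initial values and keeping the best final log-likelihood is one common precaution.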


