A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation
J. Beel, Marcel Genzmehr, Stefan Langer, A. Nürnberger, Bela Gipp
RepSys '13, published 2013-10-12
DOI: 10.1145/2532508.2532511 (https://doi.org/10.1145/2532508.2532511)
Citations: 141
Abstract
Offline evaluations are the most common evaluation method for research paper recommender systems. However, no thorough discussion of the appropriateness of offline evaluations has taken place, despite some voiced criticism. We conducted a study in which we evaluated various recommendation approaches with both offline and online evaluations. We found that the results of offline and online evaluations often contradict each other. We discuss this finding in detail and conclude that, in many settings, offline evaluations may be inappropriate for evaluating research paper recommender systems.