{"title":"个性化测试集合的可重用性研究","authors":"Seyyed Hadi Hashemi, J. Kamps","doi":"10.1145/3099023.3099044","DOIUrl":null,"url":null,"abstract":"Test collections for offline evaluation remain crucial for information retrieval research and industrial practice, yet reusability of test collections is under threat by different factors such as dynamic nature of data collections and new trends in building retrieval systems. Specifically, building reusable test collections that last over years is a very challenging problem as retrieval approaches change considerably per year based on new trends among Information Retrieval researchers. We experiment with a novel temporal reusability test to evaluate reusability of test collections over a year based on leaving mutual topics in experiment, in which we borrow some judged topics from previous years and include them in the new set of topics to be used in the current year. In fact, we experiment whether a new set of retrieval systems can be evaluated and comparatively ranked based on an old test collection. Our experiments is done based on two sets of runs from Text REtrieval Conference (TREC) 2015 and 2016 Contextual Suggestion Track, which is a personalized venue recommendation task. Our experiments show that the TREC 2015 test collection is not temporally reusable. The test collection should be used with extreme care based on early precision metrics and slightly less care based on NDCG, bpref and MAP metrics. 
Our approach offers a very precise experiment to test temporal reusability of test collections over a year, and it is very effective to be used in tracks running a setup similar to their previous years.","PeriodicalId":219391,"journal":{"name":"Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"On the Reusability of Personalized Test Collections\",\"authors\":\"Seyyed Hadi Hashemi, J. Kamps\",\"doi\":\"10.1145/3099023.3099044\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Test collections for offline evaluation remain crucial for information retrieval research and industrial practice, yet reusability of test collections is under threat by different factors such as dynamic nature of data collections and new trends in building retrieval systems. Specifically, building reusable test collections that last over years is a very challenging problem as retrieval approaches change considerably per year based on new trends among Information Retrieval researchers. We experiment with a novel temporal reusability test to evaluate reusability of test collections over a year based on leaving mutual topics in experiment, in which we borrow some judged topics from previous years and include them in the new set of topics to be used in the current year. In fact, we experiment whether a new set of retrieval systems can be evaluated and comparatively ranked based on an old test collection. Our experiments is done based on two sets of runs from Text REtrieval Conference (TREC) 2015 and 2016 Contextual Suggestion Track, which is a personalized venue recommendation task. Our experiments show that the TREC 2015 test collection is not temporally reusable. 
The test collection should be used with extreme care based on early precision metrics and slightly less care based on NDCG, bpref and MAP metrics. Our approach offers a very precise experiment to test temporal reusability of test collections over a year, and it is very effective to be used in tracks running a setup similar to their previous years.\",\"PeriodicalId\":219391,\"journal\":{\"name\":\"Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3099023.3099044\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3099023.3099044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On the Reusability of Personalized Test Collections
Test collections for offline evaluation remain crucial for information retrieval research and industrial practice, yet their reusability is threatened by factors such as the dynamic nature of data collections and new trends in building retrieval systems. In particular, building reusable test collections that last for years is very challenging, as retrieval approaches change considerably from year to year in response to new trends among information retrieval researchers. We experiment with a novel temporal reusability test that evaluates the reusability of a test collection over a year by keeping mutual topics across experiments: we borrow judged topics from previous years and include them in the new set of topics used in the current year. In effect, we test whether a new set of retrieval systems can be evaluated and comparatively ranked using an old test collection. Our experiments are based on two sets of runs from the Text REtrieval Conference (TREC) 2015 and 2016 Contextual Suggestion Track, a personalized venue recommendation task. They show that the TREC 2015 test collection is not temporally reusable: it should be used with extreme care under early-precision metrics, and with somewhat less care under NDCG, bpref, and MAP. Our approach offers a precise experiment for testing the temporal reusability of a test collection over a year, and it is effective for tracks that run a setup similar to that of previous years.
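The core comparison the abstract describes — ranking a set of systems on the old judgments versus the new ones and checking whether the two rankings agree — can be sketched as below. The run names and MAP scores are hypothetical, and Kendall's tau is used here only as one common rank-correlation choice; the abstract does not state which statistic the paper uses.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau rank correlation between two per-system score dicts
    over the same set of systems (no tie correction)."""
    systems = sorted(scores_a)
    assert sorted(scores_b) == systems, "both evaluations must cover the same systems"
    concordant = discordant = 0
    for s1, s2 in combinations(systems, 2):
        # A pair is concordant if both evaluations order it the same way.
        prod = (scores_a[s1] - scores_a[s2]) * (scores_b[s1] - scores_b[s2])
        if prod > 0:
            concordant += 1
        elif prod < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical MAP scores for four runs, evaluated over the shared ("mutual")
# topics using the old (2015) judgments versus the new (2016) judgments.
map_old = {"runA": 0.31, "runB": 0.28, "runC": 0.25, "runD": 0.22}
map_new = {"runA": 0.29, "runB": 0.30, "runC": 0.24, "runD": 0.21}

print(round(kendall_tau(map_old, map_new), 3))  # → 0.667
```

A tau close to 1.0 would suggest the old judgments still rank the new systems reliably; a low tau, as the paper reports for early-precision metrics on TREC 2015, signals that the collection is not temporally reusable for that metric.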