Melanie Imhof, Martin Braschler, P. Hansen, Stefan Rietberger
LivingLab '13, published 2013-11-01. DOI: 10.1145/2513150.2513160 (https://doi.org/10.1145/2513150.2513160)
Evaluation for operational IR applications: generalizability and automation
Black box evaluation of information retrieval (IR) applications allows practitioners to measure the quality of their IR application. Instead of evaluating specific components in isolation, e.g. only the search engine, it evaluates the complete IR application, including the user's perspective. The methodology is designed to be applicable to operational IR applications. It could be packaged into an evaluation and monitoring tool, making it usable for industry stakeholders. Such a tool should lead practitioners through the evaluation process and maintain the results of both manual and automatic tests. This paper shows that the methodology is generalizable despite the high diversity of IR applications. The challenges in automating tests are simulating tasks that require intellectual effort and handling different visualizations of the same concept.