Developing Benchmarks: The Importance of the Process and New Paradigms

R. Ordelman
{"title":"Developing Benchmarks: The Importance of the Process and New Paradigms","authors":"R. Ordelman","doi":"10.1145/2983554.2983562","DOIUrl":null,"url":null,"abstract":"The value and importance of Benchmark Evaluations is widely acknowledged. Benchmarks play a key role in many research projects. It takes time, a well-balanced team of domain specialists preferably with links to the user community and industry, and a strong involvement of the research community itself to establish a sound evaluation framework that includes (annotated) data sets, well-defined tasks that reflect the needs in the 'real world', a proper evaluation methodology, ground-truth, including a strategy for repetitive assessments, and last but not least, funding. Although the benefits of an evaluation framework are typically reviewed from a perspective of 'research output' --e.g., a scientific publication demonstrating an advance of a certain methodology-- it is important to be aware of the value of the process of creating a benchmark itself: it increases significantly the understanding of the problem we want to address and as a consequence also the impact of the evaluation outcomes. In this talk I will overview the history of a series of tasks focusing on audiovisual search emphasizing its 'multimodal' aspects, starting in 2006 with the workshop on 'Searching Spontaneous Conversational Speech' that led to tasks in CLEF and MediaEval (\"Search and Hyperlinking\"), and recently also TRECVid (\"Video Hyperlinking\"). The focus of my talk will be on the process rather than on the results of these evaluations themselves, and will address cross-benchmark connections, and new benchmark paradigms, specifically the integration of benchmarking in industrial 'living labs' that are becoming popular in some domains.","PeriodicalId":340803,"journal":{"name":"Proceedings of the 2016 ACM Workshop on Multimedia COMMONS","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 ACM Workshop on Multimedia COMMONS","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2983554.2983562","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The value and importance of benchmark evaluations are widely acknowledged, and benchmarks play a key role in many research projects. Establishing a sound evaluation framework takes time, a well-balanced team of domain specialists, preferably with links to the user community and to industry, and strong involvement of the research community itself. Such a framework includes (annotated) data sets, well-defined tasks that reflect 'real-world' needs, a proper evaluation methodology, ground truth together with a strategy for repeated assessments, and, last but not least, funding. Although the benefits of an evaluation framework are typically reviewed from the perspective of 'research output' (e.g., a scientific publication demonstrating an advance in a certain methodology), it is important to be aware of the value of the benchmark-creation process itself: it significantly deepens our understanding of the problem we want to address and, as a consequence, the impact of the evaluation outcomes. In this talk I will review the history of a series of tasks focusing on audiovisual search, emphasizing their 'multimodal' aspects, starting in 2006 with the workshop on 'Searching Spontaneous Conversational Speech', which led to tasks in CLEF and MediaEval ("Search and Hyperlinking") and, more recently, in TRECVid ("Video Hyperlinking"). The focus of my talk will be on the process rather than on the results of these evaluations themselves, and I will address cross-benchmark connections and new benchmark paradigms, specifically the integration of benchmarking into the industrial 'living labs' that are becoming popular in some domains.