Developing Benchmarks: The Importance of the Process and New Paradigms

R. Ordelman
{"title":"Developing Benchmarks: The Importance of the Process and New Paradigms","authors":"R. Ordelman","doi":"10.1145/2983554.2983562","DOIUrl":null,"url":null,"abstract":"The value and importance of Benchmark Evaluations is widely acknowledged. Benchmarks play a key role in many research projects. It takes time, a well-balanced team of domain specialists preferably with links to the user community and industry, and a strong involvement of the research community itself to establish a sound evaluation framework that includes (annotated) data sets, well-defined tasks that reflect the needs in the 'real world', a proper evaluation methodology, ground-truth, including a strategy for repetitive assessments, and last but not least, funding. Although the benefits of an evaluation framework are typically reviewed from a perspective of 'research output' --e.g., a scientific publication demonstrating an advance of a certain methodology-- it is important to be aware of the value of the process of creating a benchmark itself: it increases significantly the understanding of the problem we want to address and as a consequence also the impact of the evaluation outcomes. In this talk I will overview the history of a series of tasks focusing on audiovisual search emphasizing its 'multimodal' aspects, starting in 2006 with the workshop on 'Searching Spontaneous Conversational Speech' that led to tasks in CLEF and MediaEval (\"Search and Hyperlinking\"), and recently also TRECVid (\"Video Hyperlinking\"). The focus of my talk will be on the process rather than on the results of these evaluations themselves, and will address cross-benchmark connections, and new benchmark paradigms, specifically the integration of benchmarking in industrial 'living labs' that are becoming popular in some domains.","PeriodicalId":340803,"journal":{"name":"Proceedings of the 2016 ACM Workshop on Multimedia COMMONS","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 ACM Workshop on Multimedia COMMONS","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2983554.2983562","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The value and importance of benchmark evaluations are widely acknowledged, and benchmarks play a key role in many research projects. Establishing a sound evaluation framework takes time, a well-balanced team of domain specialists, preferably with links to the user community and to industry, and strong involvement of the research community itself. Such a framework includes (annotated) data sets, well-defined tasks that reflect 'real-world' needs, a proper evaluation methodology, ground truth together with a strategy for repeated assessments, and, last but not least, funding. Although the benefits of an evaluation framework are typically reviewed from the perspective of 'research output' (e.g., a scientific publication demonstrating an advance in a certain methodology), it is important to be aware of the value of the benchmark-creation process itself: it significantly deepens our understanding of the problem we want to address and, as a consequence, the impact of the evaluation outcomes. In this talk I will review the history of a series of tasks focusing on audiovisual search, emphasizing their 'multimodal' aspects, starting in 2006 with the workshop on 'Searching Spontaneous Conversational Speech', which led to tasks in CLEF and MediaEval ("Search and Hyperlinking") and, more recently, in TRECVid ("Video Hyperlinking"). The focus of my talk will be on the process rather than on the results of these evaluations themselves, and I will address cross-benchmark connections and new benchmark paradigms, specifically the integration of benchmarking into the industrial 'living labs' that are becoming popular in some domains.