A Theoretically Grounded Question Answering Data Set for Evaluating Machine Common Sense

Data Intelligence. Impact factor: 1.3. CAS Zone 3 (Computer Science). JCR Q3 (Computer Science, Information Systems). Pub Date: 2023-11-07. DOI: 10.1162/dint_a_00234

Henrique Santos, Ke Shen, Alice M. Mulvehill, Mayank Kejriwal, Deborah L. McGuinness



Abstract

Achieving machine common sense has been a longstanding problem within Artificial Intelligence. Thus far, benchmark data sets that are grounded in a theory of common sense and can be used to conduct rigorous, semantic evaluations of common sense reasoning (CSR) systems have been lacking. One expectation of the AI community is that neuro-symbolic reasoners can help bridge this gap towards more dependable systems with common sense. We propose a novel benchmark, called Theoretically Grounded common sense Reasoning (TG-CSR), modeled as a set of question answering instances, with each instance grounded in a semantic category of common sense, such as space, time, and emotions. The benchmark is few-shot, i.e., only a few training and validation examples are provided in the public release to avoid the possibility of overfitting. Results from recent evaluations suggest that TG-CSR is challenging even for state-of-the-art statistical models. Due to its semantic rigor, this benchmark can be used to evaluate the common sense reasoning capabilities of neuro-symbolic systems.


