A Theoretically Grounded Question Answering Data Set for Evaluating Machine Common Sense

IF 1.3 3区 计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Data Intelligence Pub Date : 2023-11-07 DOI:10.1162/dint_a_00234
Henrique Santos, Ke Shen, Alice M. Mulvehill, Mayank Kejriwal, Deborah L. McGuinness
{"title":"A Theoretically Grounded Question Answering Data Set for Evaluating Machine Common Sense","authors":"Henrique Santos, Ke Shen, Alice M. Mulvehill, Mayank Kejriwal, Deborah L. McGuinness","doi":"10.1162/dint_a_00234","DOIUrl":null,"url":null,"abstract":"ABSTRACT Achieving machine common sense has been a longstanding problem within Artificial Intelligence. Thus far, benchmark data sets that are grounded in a theory of common sense and can be used to conduct rigorous, semantic evaluations of common sense reasoning (CSR) systems have been lacking. One expectation of the AI community is that neuro-symbolic reasoners can help bridge this gap towards more dependable systems with common sense. We propose a novel benchmark, called Theoretically Grounded common sense Reasoning (TG-CSR), modeled as a set of question answering instances, with each instance grounded in a semantic category of common sense, such as space, time, and emotions. The benchmark is few-shot i.e., only a few training and validation examples are provided in the public release to avoid the possibility of overfitting. Results from recent evaluations suggest that TG-CSR is challenging even for state-of-the-art statistical models. Due to its semantic rigor, this benchmark can be used to evaluate the common sense reasoning capabilities of neuro-symbolic systems.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"14 8","pages":"0"},"PeriodicalIF":1.3000,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1162/dint_a_00234","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

ABSTRACT Achieving machine common sense has been a longstanding problem within Artificial Intelligence. Thus far, benchmark data sets that are grounded in a theory of common sense and can be used to conduct rigorous, semantic evaluations of common sense reasoning (CSR) systems have been lacking. One expectation of the AI community is that neuro-symbolic reasoners can help bridge this gap towards more dependable systems with common sense. We propose a novel benchmark, called Theoretically Grounded common sense Reasoning (TG-CSR), modeled as a set of question answering instances, with each instance grounded in a semantic category of common sense, such as space, time, and emotions. The benchmark is few-shot i.e., only a few training and validation examples are provided in the public release to avoid the possibility of overfitting. Results from recent evaluations suggest that TG-CSR is challenging even for state-of-the-art statistical models. Due to its semantic rigor, this benchmark can be used to evaluate the common sense reasoning capabilities of neuro-symbolic systems.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
用于评估机器常识的理论基础问答数据集
实现机器常识一直是人工智能领域一个长期存在的问题。到目前为止,还缺乏基于常识理论并可用于对常识推理(CSR)系统进行严格的语义评估的基准数据集。人工智能社区的一个期望是,神经符号推理器可以帮助弥合这一差距,使系统具有更可靠的常识。我们提出了一个新的基准,称为基于理论的常识推理(TG-CSR),将其建模为一组问答实例,每个实例都基于常识的语义类别,如空间、时间和情感。基准测试是few-shot的,即在公开发布中只提供了少量的训练和验证示例,以避免过度拟合的可能性。最近的评估结果表明,TG-CSR即使对于最先进的统计模型也是具有挑战性的。由于它的语义严谨性,这个基准可以用来评估神经符号系统的常识推理能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Data Intelligence
Data Intelligence COMPUTER SCIENCE, INFORMATION SYSTEMS-
CiteScore
6.50
自引率
15.40%
发文量
40
审稿时长
8 weeks
期刊最新文献
The Limitations and Ethical Considerations of ChatGPT Rule Mining Trends from 1987 to 2022: A Bibliometric Analysis and Visualization Classification and quantification of timestamp data quality issues and its impact on data quality outcome BIKAS: Bio-Inspired Knowledge Acquisition and Simulacrum—A Knowledge Database to Support Multifunctional Design Concept Generation Exploring Attentive Siamese LSTM for Low-Resource Text Plagiarism Detection
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1