Do large language models and humans have similar behaviors in causal inference with script knowledge?

Xudong Hong, Margarita Ryzhova, Daniel Adrian Biondi, Vera Demberg
{"title":"Do large language models and humans have similar behaviors in causal\n inference with script knowledge?","authors":"Hong, Xudong, Ryzhova, Margarita, Biondi, Daniel Adrian, Demberg, Vera","doi":"10.48550/arxiv.2311.07311","DOIUrl":null,"url":null,"abstract":"Recently, large pre-trained language models (LLMs) have demonstrated superior language understanding abilities, including zero-shot causal reasoning. However, it is unclear to what extent their capabilities are similar to human ones. We here study the processing of an event $B$ in a script-based story, which causally depends on a previous event $A$. In our manipulation, event $A$ is stated, negated, or omitted in an earlier section of the text. We first conducted a self-paced reading experiment, which showed that humans exhibit significantly longer reading times when causal conflicts exist ($\\neg A \\rightarrow B$) than under logical conditions ($A \\rightarrow B$). However, reading times remain similar when cause A is not explicitly mentioned, indicating that humans can easily infer event B from their script knowledge. We then tested a variety of LLMs on the same data to check to what extent the models replicate human behavior. Our experiments show that 1) only recent LLMs, like GPT-3 or Vicuna, correlate with human behavior in the $\\neg A \\rightarrow B$ condition. 2) Despite this correlation, all models still fail to predict that $nil \\rightarrow B$ is less surprising than $\\neg A \\rightarrow B$, indicating that LLMs still have difficulties integrating script knowledge. Our code and collected data set are available at https://github.com/tony-hong/causal-script.","PeriodicalId":496270,"journal":{"name":"arXiv (Cornell University)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv (Cornell University)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arxiv.2311.07311","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recently, large pre-trained language models (LLMs) have demonstrated superior language understanding abilities, including zero-shot causal reasoning. However, it is unclear to what extent their capabilities are similar to human ones. Here, we study the processing of an event $B$ in a script-based story, where $B$ causally depends on a previous event $A$. In our manipulation, event $A$ is stated, negated, or omitted in an earlier section of the text. We first conducted a self-paced reading experiment, which showed that humans exhibit significantly longer reading times when a causal conflict exists ($\neg A \rightarrow B$) than under the logical condition ($A \rightarrow B$). However, reading times remain similar when cause $A$ is not explicitly mentioned ($nil \rightarrow B$), indicating that humans can easily infer event $B$ from their script knowledge. We then tested a variety of LLMs on the same data to assess the extent to which the models replicate human behavior. Our experiments show that 1) only recent LLMs, like GPT-3 or Vicuna, correlate with human behavior in the $\neg A \rightarrow B$ condition, and 2) despite this correlation, all models still fail to predict that $nil \rightarrow B$ is less surprising than $\neg A \rightarrow B$, indicating that LLMs still have difficulty integrating script knowledge. Our code and collected dataset are available at https://github.com/tony-hong/causal-script.
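The surprisal comparison in finding 2) can be reproduced in spirit with any autoregressive LM: score the target event sentence $B$ under each context condition and compare the resulting negative log-probabilities. The sketch below is not the authors' released code (see the repository linked above); the model choice (gpt2) and the miniature coffee-making script are illustrative assumptions.

```python
# Minimal sketch: surprisal of a target event B under three context
# conditions (A -> B, negA -> B, nil -> B). Assumes gpt2 as a stand-in model;
# the story text is an invented example, not an item from the paper's dataset.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def surprisal(context: str, target: str) -> float:
    """Total surprisal (negative log-probability, in bits) of `target`
    conditioned on `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    # Leading space so GPT-2's BPE tokenizes the target as a continuation.
    tgt_ids = tokenizer(" " + target, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    # The logits at position i predict token i+1, so the target tokens are
    # predicted by the positions ctx_len-1 .. seq_len-2.
    pred = log_probs[0, ctx_ids.shape[1] - 1 : -1, :]
    token_lp = pred.gather(1, tgt_ids[0].unsqueeze(1)).squeeze(1)
    # Convert nats to bits.
    return -(token_lp.sum() / torch.log(torch.tensor(2.0))).item()

# Hypothetical script-based story with the three context conditions.
B = "She poured the hot water over the coffee grounds."
contexts = {
    "A -> B":    "Lisa wanted coffee. She switched the kettle on and waited.",
    "negA -> B": "Lisa wanted coffee. She did not switch the kettle on.",
    "nil -> B":  "Lisa wanted coffee. She took a mug from the cupboard.",
}
for cond, ctx in contexts.items():
    print(f"{cond}: surprisal(B) = {surprisal(ctx, B):.2f} bits")
```

Under the paper's human results, one would expect surprisal(nil -> B) to fall well below surprisal(negA -> B); the reported finding is that tested LLMs do not reliably show this gap.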