
Latest Publications in Computational Linguistics

My Tenure as the Editor-in-Chief of Computational Linguistics
IF 9.3 · Computer Science (CAS Q2) · Pub Date: 2024-01-10 · DOI: 10.1162/coli_e_00505
Hwee Tou Ng
Time flies and it has been close to five and a half years since I became the editor-in-chief of Computational Linguistics on 15 July 2018. In this editorial, I will describe the changes that I have introduced at the journal, and highlight the achievements and challenges of the journal.
Citations: 0
Topics in the Haystack: Enhancing Topic Quality through Corpus Expansion
IF 9.3 · Computer Science (CAS Q2) · Pub Date: 2024-01-08 · DOI: 10.1162/coli_a_00506
Anton Thielmann, Arik Reuter, Quentin Seifert, Elisabeth Bergherr, Benjamin Säfken
Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic models, follow the same underlying approach of topic interpretability and topic extraction. We propose a method that incorporates a deeper understanding of both sentence and document themes, and goes beyond simply analyzing word frequencies in the data. Through simple corpus expansion, our model can detect latent topics that may include uncommon words or neologisms, as well as words not present in the documents themselves. Additionally, we propose several new evaluation metrics based on intruder words and similarity measures in the semantic space. We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task. We demonstrate the competitive performance of our method with a large benchmark study, and achieve superior results compared to state-of-the-art topic modeling and document clustering models.
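The word-intrusion evaluation mentioned in the abstract can be sketched as follows: given a set of topic words plus one intruder, a model guesses the intruder as the word least similar, on average, to the rest in some embedding space. The 2-d vectors and word sets below are toy illustrations, not the paper's data or its metric definitions.

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def pick_intruder(words, vectors):
    # The model's guess: the word with the lowest average similarity
    # to the other words in the set.
    def avg_sim(w):
        others = [o for o in words if o != w]
        return sum(cosine(vectors[w], vectors[o]) for o in others) / len(others)
    return min(words, key=avg_sim)

# Toy 2-d embeddings: four sports terms plus one kitchen term as the intruder.
vecs = {
    "goal": (0.9, 0.1),
    "match": (0.8, 0.2),
    "team": (0.85, 0.15),
    "referee": (0.88, 0.12),
    "oven": (0.1, 0.95),
}
print(pick_intruder(list(vecs), vecs))  # the kitchen term stands out
```

Agreement between such model guesses and human guesses is the basis of the correlation coefficients the paper reports.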
Citations: 0
Common Flaws in Running Human Evaluation Experiments in NLP
IF 9.3 · Computer Science (CAS Q2) · Pub Date: 2024-01-08 · DOI: 10.1162/coli_a_00508
Craig Thomson, Ehud Reiter, Anya Belz
While conducting a coordinated set of repeat runs of human evaluation experiments in NLP, we discovered flaws in every single experiment we selected for inclusion via a systematic process. In this paper, we describe the types of flaws we discovered which include coding errors (e.g., loading the wrong system outputs to evaluate), failure to follow standard scientific practice (e.g., ad hoc exclusion of participants and responses), and mistakes in reported numerical results (e.g., reported numbers not matching experimental data). If these problems are widespread, it would have worrying implications for the rigour of NLP evaluation experiments as currently conducted. We discuss what researchers can do to reduce the occurrence of such flaws, including pre-registration, better code development practices, increased testing and piloting, and post-publication addressing of errors.
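One of the flaw classes described above, reported numbers not matching experimental data, lends itself to a simple automated sanity check: recompute the summary statistic from the raw responses and compare it to the figure printed in the paper. A minimal sketch, with a hypothetical tolerance and hypothetical Likert data:

```python
def check_reported_mean(raw_scores, reported_mean, tol=0.005):
    # Recompute the mean from raw responses and flag a mismatch with the
    # number printed in the paper; tol absorbs rounding in the report.
    recomputed = sum(raw_scores) / len(raw_scores)
    return abs(recomputed - reported_mean) <= tol, round(recomputed, 3)

# Hypothetical 1-5 Likert responses and a paper that reports "mean = 4.17".
ok, recomputed = check_reported_mean([4, 5, 3, 4, 4, 5], reported_mean=4.17)
print(ok, recomputed)
```

Running such checks as part of the analysis pipeline, rather than by hand, is one instance of the "better code development practices" the authors call for.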
Citations: 0
A Bayesian approach to uncertainty in word embedding bias estimation
IF 9.3 · Computer Science (CAS Q2) · Pub Date: 2024-01-08 · DOI: 10.1162/coli_a_00507
Alicja Dobrzeniecka, Rafal Urbaniak
Multiple measures, such as WEAT or MAC, attempt to quantify the magnitude of bias present in word embeddings in terms of a single-number metric. However, such metrics and the related statistical significance calculations rely on treating pre-averaged data as individual data points and employing bootstrapping techniques with low sample sizes. We show that similar results can be easily obtained using such methods even if the data are generated by a null model lacking the intended bias. Consequently, we argue that this approach generates false confidence. To address this issue, we propose a Bayesian alternative: hierarchical Bayesian modeling, which enables a more uncertainty-sensitive inspection of bias in word embeddings at different levels of granularity. To showcase our method, we apply it to Religion, Gender, and Race word lists from the original research, together with our control neutral word lists. We deploy the method using Google, GloVe, and Reddit embeddings. Further, we utilize our approach to evaluate a debiasing technique applied to the Reddit word embedding. Our findings reveal a more complex landscape than suggested by the proponents of single-number metrics. The datasets and source code for the paper are publicly available.
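For context, the per-word association score that WEAT-style metrics aggregate into a single number can be sketched as below. The 2-d vectors and word lists are toy illustrations, not the paper's; the paper's point is precisely that such point estimates, bootstrapped from pre-averaged data, can overstate certainty.

```python
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / ((sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5))

def weat_association(w, A, B, vec):
    # s(w, A, B): mean cosine similarity of w to attribute set A
    # minus its mean similarity to attribute set B.
    s_a = sum(cosine(vec[w], vec[a]) for a in A) / len(A)
    s_b = sum(cosine(vec[w], vec[b]) for b in B) / len(B)
    return s_a - s_b

# Toy 2-d embeddings; the word lists are illustrative only.
vec = {
    "math": (1.0, 0.0), "science": (0.9, 0.1),
    "art": (0.0, 1.0), "poetry": (0.1, 0.9),
    "he": (0.8, 0.2),
}
score = weat_association("he", ["math", "science"], ["art", "poetry"], vec)
print(round(score, 3))
```

The hierarchical Bayesian alternative the paper proposes models uncertainty over many such per-word scores instead of collapsing them into one aggregate.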
Citations: 0
Assessing the Cross-linguistic Utility of Abstract Meaning Representation
IF 9.3 · Computer Science (CAS Q2) · Pub Date: 2023-12-19 · DOI: 10.1162/coli_a_00503
Shira Wein, Nathan Schneider
Semantic representations capture the meaning of a text. Abstract Meaning Representation (AMR), a type of semantic representation, focuses on predicate-argument structure and abstracts away from surface form. Though AMR was developed initially for English, it has now been adapted to a multitude of languages in the form of non-English annotation schemas, cross-lingual text-to-AMR parsing, and AMR-to-(non-English) text generation. We advance prior work on cross-lingual AMR by thoroughly investigating the amount, types, and causes of differences which appear in AMRs of different languages. Further, we compare how AMR captures meaning in cross-lingual pairs versus strings, and show that AMR graphs are able to draw out fine-grained differences between parallel sentences. We explore three primary research questions: (1) What are the types and causes of differences in parallel AMRs? (2) How can we measure the amount of difference between AMR pairs in different languages? (3) Given that AMR structure is affected by language and exhibits cross-lingual differences, how do cross-lingual AMR pairs compare to string-based representations of cross-lingual sentence pairs? We find that the source language itself does have a measurable impact on AMR structure, and that translation divergences and annotator choices also lead to differences in cross-lingual AMR pairs. We explore the implications of this finding throughout our study, concluding that, while AMR is useful to capture meaning across languages, evaluations need to take into account source language influences if they are to paint an accurate picture of system output, and meaning generally.
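Research question (2), measuring the amount of difference between AMR pairs, is commonly approached with Smatch-style triple overlap. A toy sketch: real Smatch additionally searches over variable alignments, which this omits, and the graphs below are hypothetical examples, not from the paper.

```python
def triple_f1(pred, gold):
    # F1 over AMR-style (source, relation, target) triples; a toy stand-in
    # for Smatch, which additionally optimises over variable alignments.
    pred, gold = set(pred), set(gold)
    overlap = len(pred & gold)
    if not overlap:
        return 0.0
    p, r = overlap / len(pred), overlap / len(gold)
    return 2 * p * r / (p + r)

# Hypothetical AMR triples for an English sentence and a translation whose
# annotation lexicalises one concept differently.
en = {("w", "instance", "want-01"), ("w", "ARG0", "b"), ("b", "instance", "boy")}
de = {("w", "instance", "want-01"), ("w", "ARG0", "b"), ("b", "instance", "child")}
print(round(triple_f1(en, de), 3))
```

Under this view, translation divergences and annotator choices surface directly as non-matching triples between the two graphs.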
Citations: 0
UG-schematic Annotation for Event Nominals: A Case Study in Mandarin Chinese
IF 9.3 · Computer Science (CAS Q2) · Pub Date: 2023-12-19 · DOI: 10.1162/coli_a_00504
Wenxi Li, Guy Emerson, Yutong Zhang, Weiwei Sun
Divergence of languages observed at the surface level is a major challenge encountered by multilingual data representation, especially when typologically distant languages are involved. Drawing inspiration from a formalist Chomskyan perspective towards language universals, Universal Grammar (UG), this article employs deductively pre-defined universals to analyse a multilingually heterogeneous phenomenon, event nominals. In this way, deeper universality of event nominals beneath their huge divergence in different languages is uncovered, which empowers us to break barriers between languages and thus extend insights from some synthetic languages to a non-inflectional language, Mandarin Chinese. Our empirical investigation also demonstrates this UG-inspired schema is effective: with its assistance, the inter-annotator agreement (IAA) for identifying event nominals in Mandarin grows from 88.02% to 94.99%, and automatic detection of event-reading nominalizations on the newly-established data achieves an accuracy of 94.76% and an F1 score of 91.3%, significantly surpassing the results achieved on the pre-existing resource by 9.8% and 5.2%, respectively. Our systematic analysis also sheds light on nominal semantic role labelling (SRL). By providing a clear definition and classification of event nominal arguments, the IAA for this task significantly increases from 90.46% to 98.04%.
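The IAA figures above are agreement percentages; a chance-corrected statistic commonly reported alongside them is Cohen's kappa, sketched here for two annotators' binary event-nominal judgements. The annotations are made up, and the paper does not necessarily report kappa.

```python
def cohens_kappa(a, b):
    # Observed agreement corrected for the agreement expected by chance.
    assert len(a) == len(b) and a, "need two equal-length, non-empty label lists"
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# 1 = "event nominal", 0 = "not an event nominal" (hypothetical annotations).
ann1 = [1, 1, 0, 0, 1, 0, 1, 1]
ann2 = [1, 1, 0, 1, 1, 0, 1, 0]
print(round(cohens_kappa(ann1, ann2), 3))
```

Here raw agreement is 75% but kappa is considerably lower, which is why chance correction matters when the label distribution is skewed.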
Citations: 0
Can Large Language Models Transform Computational Social Science?
IF 9.3 · Computer Science (CAS Q2) · Pub Date: 2023-12-12 · DOI: 10.1162/coli_a_00502
Caleb Ziems, Omar Shaikh, Zhehao Zhang, William Held, Jiaao Chen, Diyi Yang
Large Language Models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the Computational Social Science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers' gold references. We conclude that the performance of today's LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are poised to participate meaningfully in social science analysis in partnership with humans.
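A zero-shot taxonomic-labeling setup of the kind benchmarked above boils down to a prompt template plus a constrained label set. The wording below is illustrative only; it is not one of the paper's prompting best practices.

```python
def zero_shot_prompt(text, labels, task="stance"):
    # Build a zero-shot classification prompt with a constrained answer set.
    # The template wording is a hypothetical example, not from the paper.
    label_list = ", ".join(labels)
    return (
        f"Classify the {task} of the following text.\n"
        f"Text: {text}\n"
        f"Answer with exactly one of: {label_list}.\n"
        "Answer:"
    )

prompt = zero_shot_prompt("We must act on climate change now.",
                          ["favor", "against", "neutral"])
print(prompt)
```

Constraining the answer to a fixed label list is what makes model outputs comparable against gold labels across the 25 benchmarks.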
Citations: 0
Stance Detection with Explanations
IF 9.3 · Computer Science (CAS Q2) · Pub Date: 2023-12-12 · DOI: 10.1162/coli_a_00501
Rudra Ranajee Saha, Raymond T. Ng, Laks V. S. Lakshmanan
Identification of stance has recently gained a lot of attention with the extreme growth of fake news and filter bubbles. Over the last decade, many feature-based and deep-learning approaches have been proposed to solve Stance Detection. However, almost none of the existing works focus on providing a meaningful explanation for their prediction. In this work, we study Stance Detection with an emphasis on generating explanations for the predicted stance by capturing the pivotal argumentative structure embedded in a document. We propose to build a Stance Tree which utilizes Rhetorical Parsing to construct an evidence tree and to use Dempster-Shafer theory to aggregate the evidence. Human studies show that our unsupervised technique of generating stance explanations outperforms the SOTA extractive summarization method in terms of informativeness, non-redundancy, coverage, and overall quality. Furthermore, experiments show that our explanation-based stance prediction excels or matches the performance of the SOTA model on various benchmark datasets.
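Dempster-Shafer aggregation combines mass functions defined over subsets of a frame of discernment. Below is a generic sketch of Dempster's rule of combination; the two-label stance frame and the masses are hypothetical, and the paper's evidence-tree aggregation is more involved than this.

```python
def combine(m1, m2):
    # Dempster's rule: multiply masses of all focal-set pairs, assign each
    # product to the pair's intersection, discard conflicting (empty) mass,
    # and renormalise by 1 - conflict.
    combined, conflict = {}, 0.0
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    k = 1.0 - conflict
    return {s: v / k for s, v in combined.items()}

F, A = frozenset({"favor"}), frozenset({"against"})
FA = F | A  # the whole frame: mass left uncommitted
m1 = {F: 0.6, FA: 0.4}          # evidence from one piece of the document
m2 = {F: 0.5, A: 0.3, FA: 0.2}  # evidence from another
m = combine(m1, m2)
print({tuple(sorted(s)): round(v, 3) for s, v in m.items()})
```

Combining the two sources concentrates mass on "favor" while keeping some belief uncommitted, which is the appeal of this framework for aggregating partial, conflicting evidence.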
Citations: 0
Polysemy - Evidence from Linguistics, Behavioural Science and Contextualised Language Models
IF 9.3 · Computer Science (CAS Q2) · Pub Date: 2023-12-12 · DOI: 10.1162/coli_a_00500
Janosch Haber, Massimo Poesio
Polysemy is the type of lexical ambiguity where a word has multiple distinct but related interpretations. In the past decade, it has been the subject of a great many studies across multiple disciplines including linguistics, psychology, neuroscience, and computational linguistics, which have made it increasingly clear that the complexity of polysemy precludes simple, universal answers, especially concerning the representation and processing of polysemous words. But fuelled by the growing availability of large, crowdsourced datasets providing substantial empirical evidence; improved behavioral methodology; and the development of contextualised language models capable of encoding the fine-grained meaning of a word within a given context, the literature on polysemy has recently developed more complex theoretical analyses. In this survey we discuss these recent contributions to the investigation of polysemy against the backdrop of a long legacy of research across multiple decades and disciplines. Our aim is to bring together different perspectives to achieve a more complete picture of the heterogeneity and complexity of the phenomenon of polysemy. Specifically, we highlight evidence supporting a range of hybrid models of the mental processing of polysemes. These hybrid models combine elements from different previous theoretical approaches to explain patterns and idiosyncrasies in the processing of polysemous words that the best-known models so far have failed to account for.
Our literature review finds that i) traditional analyses of polysemy can be limited in their generalisability by loose definitions and selective materials; ii) linguistic tests provide useful evidence on individual cases, but fail to capture the full range of factors involved in the processing of polysemous sense extensions; and iii) recent behavioural (psycho)linguistic studies, large-scale annotation efforts and investigations leveraging contextualised language models provide accumulating evidence suggesting that polysemous sense similarity covers a wide spectrum between identity of sense and homonymy-like unrelatedness of meaning. We hope that the interdisciplinary account of polysemy provided in this survey inspires further fundamental research on the nature of polysemy and better equips applied research to deal with the complexity surrounding the phenomenon, e.g. by enabling the development of benchmarks and testing paradigms for large language models informed by a greater portion of the rich evidence on the phenomenon currently available.
Polysemy - Evidence from Linguistics, Behavioural Science and Contextualised Language Models
Janosch Haber, Massimo Poesio
Computational Linguistics · IF 9.3 · CAS Zone 2, Computer Science · Pub Date: 2023-12-12 · DOI: 10.1162/coli_a_00500
Citations: 0
My Big, Fat 50-Year Journey
IF 9.3 · CAS Zone 2, Computer Science · Pub Date: 2023-12-06 · DOI: 10.1162/coli_a_00499
Martha Palmer
Citations: 0