
Findings (Sydney (N.S.W.)): Latest Publications

PLACES: Prompting Language Models for Social Conversation Synthesis
Pub Date: 2023-02-07 DOI: 10.48550/arXiv.2302.03269
Maximillian Chen, A. Papangelis, Chenyang Tao, Seokhwan Kim, Andrew Rosenbaum, Yang Liu, Zhou Yu, Dilek Z. Hakkani-Tür
Collecting high-quality conversational data can be very expensive for most applications and infeasible for others due to privacy, ethical, or similar concerns. A promising direction to tackle this problem is to generate synthetic dialogues by prompting large language models. In this work, we use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting. We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations. These include human evaluation of various dimensions of conversation quality directly on the synthesized conversations, as well as interactive human evaluation of chatbots fine-tuned on the synthetically generated dataset. We additionally demonstrate that this prompting approach is generalizable to multi-party conversations, providing potential to create new synthetic data for multi-party tasks. Our synthetic multi-party conversations were rated more favorably across all measured dimensions compared to conversation excerpts sampled from a human-collected multi-party dataset.
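As a concrete illustration of this recipe, here is a minimal sketch of few-shot dialogue synthesis by prompting: a handful of expert-written conversations are concatenated as in-context examples, followed by a new topic, and a completion model continues the pattern. The seed dialogue, the prompt wording, and the `llm_complete` callable are illustrative assumptions, not the authors' released prompts.

```python
# One expert-written conversation used as an in-context example (assumed content).
SEED_CONVERSATIONS = [
    "Topic: weekend plans\n"
    "A: Any plans for the weekend?\n"
    "B: I'm going hiking if the weather holds up. You?\n"
    "A: Probably just catching up on a book.",
]

def build_prompt(seed_examples, new_topic):
    """Concatenate expert-written dialogues, then prompt for a fresh one."""
    header = "The following are friendly social conversations between two people.\n\n"
    examples = "\n\n".join(seed_examples)
    return f"{header}{examples}\n\nTopic: {new_topic}\nA:"

def synthesize_dialogue(llm_complete, topic):
    # `llm_complete` is any text-completion callable: prompt (str) -> continuation (str).
    prompt = build_prompt(SEED_CONVERSATIONS, topic)
    return "A:" + llm_complete(prompt)
```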
{"title":"PLACES: Prompting Language Models for Social Conversation Synthesis","authors":"Maximillian Chen, A. Papangelis, Chenyang Tao, Seokhwan Kim, Andrew Rosenbaum, Yang Liu, Zhou Yu, Dilek Z. Hakkani-Tür","doi":"10.48550/arXiv.2302.03269","DOIUrl":"https://doi.org/10.48550/arXiv.2302.03269","url":null,"abstract":"Collecting high quality conversational data can be very expensive for most applications and infeasible for others due to privacy, ethical, or similar concerns. A promising direction to tackle this problem is to generate synthetic dialogues by prompting large language models. In this work, we use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting. We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations. This includes various dimensions of conversation quality with human evaluation directly on the synthesized conversations, and interactive human evaluation of chatbots fine-tuned on the synthetically generated dataset. We additionally demonstrate that this prompting approach is generalizable to multi-party conversations, providing potential to create new synthetic data for multi-party tasks. Our synthetic multi-party conversations were rated more favorably across all measured dimensions compared to conversation excerpts sampled from a human-collected multi-party dataset.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"814-838"},"PeriodicalIF":0.0,"publicationDate":"2023-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49430947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
It’s about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits
Pub Date: 2023-02-06 DOI: 10.48550/arXiv.2302.03147
Yida Mu, Kalina Bontcheva, Nikolaos Aletras
New events emerge over time, influencing the topics of rumors in social media. Current rumor detection benchmarks use random splits for the training, development and test sets, which typically results in topical overlap. Consequently, models trained on random splits may not perform well on rumor classification for previously unseen topics due to temporal concept drift. In this paper, we provide a re-evaluation of classification models on four popular rumor detection benchmarks using chronological instead of random splits. Our experimental results show that the use of random splits can significantly overestimate predictive performance across all datasets and models. Therefore, we suggest that rumor detection models should always be evaluated using chronological splits to minimize topical overlap.
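The fix the paper argues for is mechanically simple; a minimal sketch, assuming each example carries a `timestamp` field and illustrative split ratios, looks like this:

```python
def chronological_split(examples, dev_frac=0.1, test_frac=0.1):
    """Split a list of dicts with a 'timestamp' key by time rather than at random,
    so the test set contains only posts newer than anything seen in training."""
    ordered = sorted(examples, key=lambda ex: ex["timestamp"])
    n = len(ordered)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    train = ordered[: n - n_dev - n_test]
    dev = ordered[n - n_dev - n_test : n - n_test]
    test = ordered[n - n_test :]  # strictly the most recent examples
    return train, dev, test
```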
{"title":"It’s about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits","authors":"Yida Mu, Kalina Bontcheva, Nikolaos Aletras","doi":"10.48550/arXiv.2302.03147","DOIUrl":"https://doi.org/10.48550/arXiv.2302.03147","url":null,"abstract":"New events emerge over time influencing the topics of rumors in social media. Current rumor detection benchmarks use random splits as training, development and test sets which typically results in topical overlaps. Consequently, models trained on random splits may not perform well on rumor classification on previously unseen topics due to the temporal concept drift. In this paper, we provide a re-evaluation of classification models on four popular rumor detection benchmarks considering chronological instead of random splits. Our experimental results show that the use of random splits can significantly overestimate predictive performance across all datasets and models. Therefore, we suggest that rumor detection models should always be evaluated using chronological splits for minimizing topical overlaps.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"724-731"},"PeriodicalIF":0.0,"publicationDate":"2023-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46515223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Improving Prediction Backward-Compatibility in NLP Model Upgrade with Gated Fusion
Pub Date: 2023-02-04 DOI: 10.48550/arXiv.2302.02080
Yi-An Lai, Elman Mansimov, Yuqing Xie, Yan Zhang
When upgrading neural models to a newer version, new errors that were not encountered in the legacy version can be introduced, known as regression errors. This inconsistent behavior during model upgrade often outweighs the benefits of the accuracy gain and hinders the adoption of new models. To mitigate regression errors from model upgrade, distillation and ensembling have proven to be viable solutions without significant compromise in performance. Despite this progress, these approaches attain only an incremental reduction in regression, which is still far from achieving a backward-compatible model upgrade. In this work, we propose a novel method, Gated Fusion, that promotes backward compatibility by learning to mix predictions between old and new models. Empirical results on two distinct model upgrade scenarios show that our method reduces the number of regression errors by 62% on average, outperforming the strongest baseline by an average of 25%.
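A hedged sketch of the gated-fusion idea is below: a learned gate inspects both models' logits and mixes them per example, so the system can fall back on the old model where the new one is likely to regress. The exact parameterization (a single linear gate over concatenated logits) is an assumption for illustration, not necessarily the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Mix old- and new-model logits with a learned per-example gate."""

    def __init__(self, num_classes: int):
        super().__init__()
        # The gate sees both logit vectors and emits a mixing weight in (0, 1).
        self.gate = nn.Sequential(nn.Linear(2 * num_classes, 1), nn.Sigmoid())

    def forward(self, old_logits: torch.Tensor, new_logits: torch.Tensor):
        g = self.gate(torch.cat([old_logits, new_logits], dim=-1))
        # g -> 1 trusts the new model; g -> 0 falls back to the old one.
        return g * new_logits + (1 - g) * old_logits
```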
{"title":"Improving Prediction Backward-Compatiblility in NLP Model Upgrade with Gated Fusion","authors":"Yi-An Lai, Elman Mansimov, Yuqing Xie, Yan Zhang","doi":"10.48550/arXiv.2302.02080","DOIUrl":"https://doi.org/10.48550/arXiv.2302.02080","url":null,"abstract":"When upgrading neural models to a newer version, new errors that were not encountered in the legacy version can be introduced, known as regression errors. This inconsistent behavior during model upgrade often outweighs the benefits of accuracy gain and hinders the adoption of new models. To mitigate regression errors from model upgrade, distillation and ensemble have proven to be viable solutions without significant compromise in performance. Despite the progress, these approaches attained an incremental reduction in regression which is still far from achieving backward-compatible model upgrade. In this work, we propose a novel method, Gated Fusion, that promotes backward compatibility via learning to mix predictions between old and new models. Empirical results on two distinct model upgrade scenarios show that our method reduces the number of regression errors by 62% on average, outperforming the strongest baseline by an average of 25%.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"980-992"},"PeriodicalIF":0.0,"publicationDate":"2023-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46593370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Zero-shot Transfer of Article-aware Legal Outcome Classification for European Court of Human Rights Cases
Pub Date: 2023-02-01 DOI: 10.48550/arXiv.2302.00609
Santosh T.Y.S.S, O. Ichim, Matthias Grabmair
In this paper, we cast Legal Judgment Prediction on European Court of Human Rights cases as an article-aware classification task, where the case outcome is classified from a combined input of case facts and convention articles. This configuration helps the model learn some legal reasoning ability by mapping article text to specific case-fact text. It also provides an opportunity to evaluate the model’s ability to generalize to zero-shot settings, where it must classify the case outcome with respect to articles not seen during training. We devise zero-shot experiments and apply domain adaptation methods based on domain discrimination and Wasserstein distance. Our results demonstrate that the article-aware architecture outperforms straightforward fact classification. We also find that domain adaptation methods improve zero-shot transfer performance, with article relatedness and encoder pre-training influencing the effect.
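To make the input configuration concrete, here is a minimal sketch of how one case can be expanded into article-aware training instances, pairing the case facts with each candidate article. The field names and the binary violation label are illustrative assumptions, not the authors' data schema.

```python
def make_article_aware_instances(case):
    """Turn one case into (facts, article_text, label) pairs for a cross-encoder
    that reads both texts jointly, e.g. as "facts [SEP] article text"."""
    instances = []
    for article in case["candidate_articles"]:
        # Label is 1 if the court found a violation of this article (assumed schema).
        label = int(article["id"] in case["violated_article_ids"])
        instances.append((case["facts"], article["text"], label))
    return instances
```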
{"title":"Zero-shot Transfer of Article-aware Legal Outcome Classification for European Court of Human Rights Cases","authors":"Santosh T.Y.S.S, O. Ichim, Matthias Grabmair","doi":"10.48550/arXiv.2302.00609","DOIUrl":"https://doi.org/10.48550/arXiv.2302.00609","url":null,"abstract":"In this paper, we cast Legal Judgment Prediction on European Court of Human Rights cases into an article-aware classification task, where the case outcome is classified from a combined input of case facts and convention articles. This configuration facilitates the model learning some legal reasoning ability in mapping article text to specific case fact text. It also provides an opportunity to evaluate the model’s ability to generalize to zero-shot settings when asked to classify the case outcome with respect to articles not seen during training. We devise zero-shot experiments and apply domain adaptation methods based on domain discrimination and Wasserstein distance. Our results demonstrate that the article-aware architecture outperforms straightforward fact classification. We also find that domain adaptation methods improve zero-shot transfer performance, with article relatedness and encoder pre-training influencing the effect.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"593-605"},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43735110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
AmbiCoref: Evaluating Human and Model Sensitivity to Ambiguous Coreference
Pub Date: 2023-02-01 DOI: 10.48550/arXiv.2302.00762
Yuewei Yuan, Chaitanya Malaviya, Mark Yatskar
Given a sentence “Abby told Brittney that she upset Courtney”, one would struggle to understand who “she” refers to, and ask for clarification. However, if the word “upset” were replaced with “hugged”, “she” unambiguously refers to Abby. We study if modern coreference resolution models are sensitive to such pronominal ambiguity. To this end, we construct AmbiCoref, a diagnostic corpus of minimal sentence pairs with ambiguous and unambiguous referents. Our examples generalize psycholinguistic studies of human perception of ambiguity around particular arrangements of verbs and their arguments. Analysis shows that (1) humans are less sure of referents in ambiguous AmbiCoref examples than unambiguous ones, and (2) most coreference models show little difference in output between ambiguous and unambiguous pairs. We release AmbiCoref as a diagnostic corpus for testing whether models treat ambiguity similarly to humans.
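A minimal sketch of how such a minimal pair can be produced from a template follows; the template and verb choices simply echo the example in the abstract, and the released corpus generalizes over many verbs and argument arrangements.

```python
TEMPLATE = "{a} told {b} that she {verb} {c}."

def minimal_pair(a="Abby", b="Brittney", c="Courtney"):
    # Per the abstract: "upset" licenses either the subject or the object
    # as "she" (ambiguous), while "hugged" is compatible only with the
    # subject as referent (unambiguous).
    ambiguous = TEMPLATE.format(a=a, b=b, c=c, verb="upset")
    unambiguous = TEMPLATE.format(a=a, b=b, c=c, verb="hugged")
    return ambiguous, unambiguous
```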
{"title":"AmbiCoref: Evaluating Human and Model Sensitivity to Ambiguous Coreference","authors":"Yuewei Yuan, Chaitanya Malaviya, Mark Yatskar","doi":"10.48550/arXiv.2302.00762","DOIUrl":"https://doi.org/10.48550/arXiv.2302.00762","url":null,"abstract":"Given a sentence “Abby told Brittney that she upset Courtney”, one would struggle to understand who “she” refers to, and ask for clarification. However, if the word “upset” were replaced with “hugged”, “she” unambiguously refers to Abby. We study if modern coreference resolution models are sensitive to such pronominal ambiguity. To this end, we construct AmbiCoref, a diagnostic corpus of minimal sentence pairs with ambiguous and unambiguous referents. Our examples generalize psycholinguistic studies of human perception of ambiguity around particular arrangements of verbs and their arguments. Analysis shows that (1) humans are less sure of referents in ambiguous AmbiCoref examples than unambiguous ones, and (2) most coreference models show little difference in output between ambiguous and unambiguous pairs. We release AmbiCoref as a diagnostic corpus for testing whether models treat ambiguity similarly to humans.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"993-1000"},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41984843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Sentence Identification with BOS and EOS Label Combinations
Pub Date: 2023-01-31 DOI: 10.48550/arXiv.2301.13352
Takuma Udagawa, H. Kanayama, Issei Yoshida
The sentence is a fundamental unit in many NLP applications. Sentence segmentation is widely used as the first preprocessing task, where an input text is split into consecutive sentences, with the end of the sentence (EOS) treated as their boundary. This task formulation relies on a strong assumption that the input text consists only of sentences, or what we call sentential units (SUs). However, real-world texts often contain non-sentential units (NSUs) such as metadata, sentence fragments, nonlinguistic markers, etc., which it is unreasonable or undesirable to treat as part of an SU. To tackle this issue, we formulate a novel task of sentence identification, where the goal is to identify SUs while excluding NSUs in a given text. To conduct sentence identification, we propose a simple yet effective method which combines beginning-of-sentence (BOS) and EOS labels to determine the most probable SUs and NSUs based on dynamic programming. To evaluate this task, we design an automatic, language-independent procedure to convert the Universal Dependencies corpora into sentence identification benchmarks. Finally, our experiments on the sentence identification task demonstrate that our proposed method generally outperforms sentence segmentation baselines which only utilize EOS labels.
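A toy version of the dynamic program conveys the idea: given per-token probabilities that a token begins (BOS) or ends (EOS) an SU, choose the highest-scoring set of non-overlapping [BOS, EOS] spans and treat uncovered tokens as NSUs. The log-odds scoring and the probability inputs are assumptions for illustration, not the paper's exact objective.

```python
import math

def identify_sentences(p_bos, p_eos, max_len=50):
    """p_bos[i] / p_eos[i]: probability in (0, 1) that token i begins / ends an SU."""
    n = len(p_bos)
    best = [(0.0, [])] + [None] * n  # best[i] = (score, SU spans) over tokens[:i]
    for i in range(1, n + 1):
        # Option 1: token i-1 belongs to no SU (it is part of an NSU).
        best[i] = best[i - 1]
        # Option 2: an SU spans tokens j .. i-1. Log-odds scoring means a span
        # helps only when both its BOS and EOS labels are confident (> 0.5).
        for j in range(max(0, i - max_len), i):
            gain = (math.log(p_bos[j] / (1 - p_bos[j]))
                    + math.log(p_eos[i - 1] / (1 - p_eos[i - 1])))
            if best[j][0] + gain > best[i][0]:
                best[i] = (best[j][0] + gain, best[j][1] + [(j, i - 1)])
    return best[n][1]  # non-overlapping (start, end) spans identified as SUs

# e.g. identify_sentences([0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.1, 0.9]) -> [(0, 1), (2, 3)]
```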
{"title":"Sentence Identification with BOS and EOS Label Combinations","authors":"Takuma Udagawa, H. Kanayama, Issei Yoshida","doi":"10.48550/arXiv.2301.13352","DOIUrl":"https://doi.org/10.48550/arXiv.2301.13352","url":null,"abstract":"The sentence is a fundamental unit in many NLP applications. Sentence segmentation is widely used as the first preprocessing task, where an input text is split into consecutive sentences considering the end of the sentence (EOS) as their boundaries. This task formulation relies on a strong assumption that the input text consists only of sentences, or what we call the sentential units (SUs). However, real-world texts often contain non-sentential units (NSUs) such as metadata, sentence fragments, nonlinguistic markers, etc. which are unreasonable or undesirable to be treated as a part of an SU. To tackle this issue, we formulate a novel task of sentence identification, where the goal is to identify SUs while excluding NSUs in a given text. To conduct sentence identification, we propose a simple yet effective method which combines the beginning of the sentence (BOS) and EOS labels to determine the most probable SUs and NSUs based on dynamic programming. To evaluate this task, we design an automatic, language-independent procedure to convert the Universal Dependencies corpora into sentence identification benchmarks. Finally, our experiments on the sentence identification task demonstrate that our proposed method generally outperforms sentence segmentation baselines which only utilize EOS labels.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"343-358"},"PeriodicalIF":0.0,"publicationDate":"2023-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42022203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Active Learning for Multilingual Semantic Parser
Pub Date: 2023-01-30 DOI: 10.48550/arXiv.2301.12920
Zhuang Li, Gholamreza Haffari
Current multilingual semantic parsing (MSP) datasets are almost all collected by translating the utterances in existing datasets from a resource-rich language into the target language. However, manual translation is costly. To reduce the translation effort, this paper proposes the first active learning procedure for MSP (AL-MSP). AL-MSP selects only a subset of the existing data to be translated. We also propose a novel selection method that prioritizes examples that diversify the logical-form structures and offer richer lexical choices, and a novel hyperparameter tuning method that needs no extra annotation cost. Our experiments show that AL-MSP significantly reduces translation costs with ideal selection methods. Our selection method with proper hyperparameters yields better parsing performance than the other baselines on two multilingual datasets.
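A hedged sketch of diversity-based selection in this spirit: greedily pick the unlabeled examples whose anonymized logical-form templates are least represented so far. The template extraction below is a toy assumption standing in for the paper's structure definition.

```python
import re
from collections import Counter

def template(logical_form: str) -> str:
    """Anonymize quoted values and numbers so only the structure remains."""
    return re.sub(r'"[^"]*"|\b\d+\b', "<v>", logical_form)

def select_batch(candidates, batch_size):
    """candidates: (utterance, logical_form) pairs from the source-language dataset."""
    seen = Counter()
    pool = list(candidates)
    chosen = []
    while pool and len(chosen) < batch_size:
        # Take the example whose logical-form structure is currently rarest.
        pick = min(pool, key=lambda c: seen[template(c[1])])
        pool.remove(pick)
        chosen.append(pick)
        seen[template(pick[1])] += 1
    return chosen  # only these utterances are sent for human translation
```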
{"title":"Active Learning for Multilingual Semantic Parser","authors":"Zhuang Li, Gholamreza Haffari","doi":"10.48550/arXiv.2301.12920","DOIUrl":"https://doi.org/10.48550/arXiv.2301.12920","url":null,"abstract":"Current multilingual semantic parsing (MSP) datasets are almost all collected by translating the utterances in the existing datasets from the resource-rich language to the target language. However, manual translation is costly. To reduce the translation effort, this paper proposes the first active learning procedure for MSP (AL-MSP). AL-MSP selects only a subset from the existing datasets to be translated. We also propose a novel selection method that prioritizes the examples diversifying the logical form structures with more lexical choices, and a novel hyperparameter tuning method that needs no extra annotation cost. Our experiments show that AL-MSP significantly reduces translation costs with ideal selection methods. Our selection method with proper hyperparameters yields better parsing performance than the other baselines on two multilingual datasets.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"621-627"},"PeriodicalIF":0.0,"publicationDate":"2023-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49078829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Crawling The Internal Knowledge-Base of Language Models
Pub Date: 2023-01-30 DOI: 10.48550/arXiv.2301.12810
Roi Cohen, Mor Geva, Jonathan Berant, A. Globerson
Language models are trained on large volumes of text, and as a result their parameters might contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have means for representing this body of knowledge in an interpretable way. However, there is currently no mechanism for such a representation. Here, we propose to address this goal by extracting a knowledge graph of facts from a given language model. We describe a procedure for “crawling” the internal knowledge base of a language model. Specifically, given a seed entity, we expand a knowledge graph around it. The crawling procedure is decomposed into sub-tasks, realized through specially designed prompts that control for both precision (i.e., that no wrong facts are generated) and recall (i.e., the number of facts generated). We evaluate our approach on graphs crawled starting from dozens of seed entities, and show that it yields high-precision graphs (82-92%) while emitting a reasonable number of facts per entity.
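The overall control flow is a breadth-first expansion; a minimal sketch, with `query_lm_for_facts` as a placeholder for the paper's specially designed prompting sub-tasks (relation generation, object generation, and precision-oriented filtering), is shown below.

```python
from collections import deque

def crawl_knowledge_graph(seed_entity, query_lm_for_facts, max_entities=100):
    """query_lm_for_facts: entity -> iterable of (subject, relation, object) triples."""
    frontier = deque([seed_entity])
    visited = {seed_entity}
    triples = []
    while frontier and len(visited) < max_entities:
        entity = frontier.popleft()
        for subj, rel, obj in query_lm_for_facts(entity):
            triples.append((subj, rel, obj))
            if obj not in visited:
                visited.add(obj)      # each new object becomes a crawl target,
                frontier.append(obj)  # expanding the graph around the seed
    return triples
```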
{"title":"Crawling The Internal Knowledge-Base of Language Models","authors":"Roi Cohen, Mor Geva, Jonathan Berant, A. Globerson","doi":"10.48550/arXiv.2301.12810","DOIUrl":"https://doi.org/10.48550/arXiv.2301.12810","url":null,"abstract":"Language models are trained on large volumes of text, and as a result their parameters might contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have means for representing this body of knowledge in an interpretable way. However, there is currently no mechanism for such a representation.Here, we propose to address this goal by extracting a knowledge-graph of facts from a given language model. We describe a procedure for “crawling” the internal knowledge-base of a language model. Specifically, given a seed entity, we expand a knowledge-graph around it. The crawling procedure is decomposed into sub-tasks, realized through specially designed prompts that control for both precision (i.e., that no wrong facts are generated) and recall (i.e., the number of facts generated). We evaluate our approach on graphs crawled starting from dozens of seed entities, and show it yields high precision graphs (82-92%), while emitting a reasonable number of facts per entity.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"1811-1824"},"PeriodicalIF":0.0,"publicationDate":"2023-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47110140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
Learning the Effects of Physical Actions in a Multi-modal Environment
Pub Date: 2023-01-27 DOI: 10.48550/arXiv.2301.11845
Gautier Dagan, Frank Keller, A. Lascarides
Large Language Models (LLMs) handle physical commonsense information inadequately. As a result of being trained in a disembodied setting, LLMs often fail to predict an action’s outcome in a given environment. However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal. Therefore, we introduce the multi-modal task of predicting the outcomes of actions solely from realistic sensory inputs (images and text). Next, we extend an LLM to model latent representations of objects to better predict action outcomes in an environment. We show that multi-modal models can capture physical commonsense when augmented with visual information. Finally, we evaluate our model’s performance on novel actions and objects and find that combining modalities helps models to generalize and learn physical commonsense reasoning better.
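As a rough illustration of the task format (not the authors' architecture), the sketch below fuses an image embedding with an encoding of the action text and classifies the outcome; the toy text encoder, the dimensions, and the fixed set of outcome classes are all assumptions.

```python
import torch
import torch.nn as nn

class ActionOutcomePredictor(nn.Module):
    """Predict an action's outcome from (image features, action text) pairs."""

    def __init__(self, image_dim=512, hidden_dim=768, vocab_size=30522, num_outcomes=4):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, hidden_dim)           # map vision features
        self.text_encoder = nn.EmbeddingBag(vocab_size, hidden_dim)  # toy mean-pooled text encoder
        self.classifier = nn.Linear(2 * hidden_dim, num_outcomes)

    def forward(self, image_feats, action_token_ids):
        img = self.image_proj(image_feats)          # (batch, hidden_dim)
        txt = self.text_encoder(action_token_ids)   # (batch, hidden_dim)
        return self.classifier(torch.cat([img, txt], dim=-1))
```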
{"title":"Learning the Effects of Physical Actions in a Multi-modal Environment","authors":"Gautier Dagan, Frank Keller, A. Lascarides","doi":"10.48550/arXiv.2301.11845","DOIUrl":"https://doi.org/10.48550/arXiv.2301.11845","url":null,"abstract":"Large Language Models (LLMs) handle physical commonsense information inadequately. As a result of being trained in a disembodied setting, LLMs often fail to predict an action’s outcome in a given environment. However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal. Therefore, we introduce the multi-modal task of predicting the outcomes of actions solely from realistic sensory inputs (images and text). Next, we extend an LLM to model latent representations of objects to better predict action outcomes in an environment. We show that multi-modal models can capture physical commonsense when augmented with visual information. Finally, we evaluate our model’s performance on novel actions and objects and find that combining modalities help models to generalize and learn physical commonsense reasoning better.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"133-148"},"PeriodicalIF":0.0,"publicationDate":"2023-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44458106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Style-Aware Contrastive Learning for Multi-Style Image Captioning
Pub Date: 2023-01-26 DOI: 10.48550/arXiv.2301.11367
Yucheng Zhou, Guodong Long
Existing multi-style image captioning methods show promising results in generating captions with accurate visual content and the desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder trained with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption match. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct extensive analysis to verify the effectiveness of our method.
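The triplet objective can be sketched as follows, with a fused (image, style) embedding as the anchor and matched versus mismatched caption embeddings as positive and negative; the margin value and input conventions are assumptions, and the paper's retrieval-based negative mining is richer than this core loss shape.

```python
import torch
import torch.nn.functional as F

def style_triplet_loss(anchor, positive, negative, margin=0.2):
    """anchor: fused (image, style) embeddings; positive/negative: caption embeddings."""
    pos_dist = 1 - F.cosine_similarity(anchor, positive, dim=-1)
    neg_dist = 1 - F.cosine_similarity(anchor, negative, dim=-1)
    # Matched triples must be closer than mismatched ones by at least `margin`.
    return F.relu(pos_dist - neg_dist + margin).mean()
```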
{"title":"Style-Aware Contrastive Learning for Multi-Style Image Captioning","authors":"Yucheng Zhou, Guodong Long","doi":"10.48550/arXiv.2301.11367","DOIUrl":"https://doi.org/10.48550/arXiv.2301.11367","url":null,"abstract":"Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.","PeriodicalId":73025,"journal":{"name":"Findings (Sydney (N.S.W.)","volume":"1 1","pages":"2212-2222"},"PeriodicalIF":0.0,"publicationDate":"2023-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41371909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1