
Latest Publications in AACL Bioflux

HaRiM^+: Evaluating Summary Quality with Hallucination Risk
Pub Date: 2022-11-22 · DOI: 10.48550/arXiv.2211.12118
Seonil Son, Junsoo Park, J. Hwang, Junghwa Lee, Hyungjong Noh, Yeonsoo Lee
One of the challenges of developing a summarization model arises from the difficulty in measuring the factual inconsistency of the generated text. In this study, we reinterpret the decoder overconfidence-regularizing objective suggested in (Miao et al., 2021) as a hallucination risk measurement to better estimate the quality of generated summaries. We propose a reference-free metric, HaRiM+, which only requires an off-the-shelf summarization model to compute the hallucination risk based on token likelihoods. Deploying it requires no additional training of models or ad-hoc modules, which usually need alignment to human judgments. For summary-quality estimation, HaRiM+ records state-of-the-art correlation to human judgment on three summary-quality annotation sets: FRANK, QAGS, and SummEval. We hope that our work, which demonstrates the value of summarization models as evaluators, facilitates progress in both the automated evaluation and the generation of summaries.
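To make the idea concrete, here is a minimal sketch of a token-likelihood risk score in the spirit of HaRiM+, assuming an off-the-shelf BART summarizer from Hugging Face (`facebook/bart-large-cnn` is an illustrative choice); the risk combination below is a simplified stand-in for the paper's exact formula, not a reproduction of it.

```python
# Sketch: hallucination risk from token likelihoods under an off-the-shelf
# summarizer. The risk term is illustrative, not the published HaRiM+ formula.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn").eval()

@torch.no_grad()
def token_probs(source: str, summary: str) -> torch.Tensor:
    """Probability of each summary token given the (possibly empty) source."""
    enc = tok(source, return_tensors="pt", truncation=True)
    dec = tok(text_target=summary, return_tensors="pt", truncation=True)
    logits = model(**enc, labels=dec["input_ids"]).logits      # (1, len, vocab)
    probs = logits.softmax(-1)[0]
    return probs.gather(-1, dec["input_ids"][0].unsqueeze(-1)).squeeze(-1)

@torch.no_grad()
def harim_like_score(source: str, summary: str) -> float:
    p_cond = token_probs(source, summary)   # summarizer sees the document
    p_lm = token_probs("", summary)         # source blanked out: LM-like pass
    margin = p_cond - p_lm                  # how much the source supports each token
    risk = ((1 - p_cond) * (1 - margin)).mean()
    return -risk.item()                     # higher score = lower hallucination risk
```

A higher score means the summarizer found the summary both probable and well supported by the source, which is the intuition the metric builds on.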
Cited by: 1
PESE: Event Structure Extraction using Pointer Network based Encoder-Decoder Architecture
Pub Date: 2022-11-22 · DOI: 10.48550/arXiv.2211.12157
Alapan Kuila, Sudeshan Sarkar
The task of event extraction (EE) aims to find the events and event-related argument information from text and represent them in a structured format. Most previous works try to solve the problem by separately identifying multiple substructures and aggregating them to get the complete event structure. The problem with these methods is that they fail to identify all the interdependencies among the event participants (event triggers, arguments, and roles). In this paper, we represent each event record in a unique tuple format that contains the trigger phrase, trigger type, argument phrase, and corresponding role information. Our proposed pointer-network-based encoder-decoder model generates an event tuple at each time step by exploiting the interactions among event participants, presenting a truly end-to-end solution to the EE task. We evaluate our model on the ACE2005 dataset, and experimental results demonstrate its effectiveness, achieving competitive performance compared to state-of-the-art methods.
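As a rough illustration of the pointer mechanism, the sketch below scores every encoder position from the current decoder state using additive (Bahdanau-style) attention, which is the standard way a pointer network selects a token position such as the start of a trigger phrase. Dimensions and layer choices are assumptions, not the paper's architecture.

```python
# Sketch: one decoding step of a pointer-network style selector over
# encoder positions. Sizes are toy values for illustration.
import torch
import torch.nn as nn

class PointerStep(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.w_enc = nn.Linear(hidden, hidden, bias=False)
        self.w_dec = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, seq, hidden); dec_state: (batch, hidden)
        scores = self.v(torch.tanh(self.w_enc(enc_states)
                                   + self.w_dec(dec_state).unsqueeze(1)))
        return scores.squeeze(-1).log_softmax(-1)  # distribution over positions

enc = torch.randn(2, 10, 256)   # toy encoder output
dec = torch.randn(2, 256)       # current decoder state
print(PointerStep(256)(enc, dec).shape)  # torch.Size([2, 10])
```

Generating a full tuple then amounts to running several such steps (trigger start/end, argument start/end) plus classification heads for trigger type and role, conditioned on the same decoder state.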
Cited by: 0
Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems
Pub Date: 2022-11-19 · DOI: 10.48550/arXiv.2211.10596
Shiki Sato, Yosuke Kishinami, Hiroaki Sugiyama, Reina Akama, Ryoko Tokuhisa, Jun Suzuki
Automation of dialogue system evaluation is a driving force for the efficient development of dialogue systems. This paper introduces the bipartite-play method, a dialogue collection method for automating dialogue system evaluation. It addresses the limitations of existing dialogue collection methods: (i) inability to compare with systems that are not publicly available, and (ii) vulnerability to cheating by intentionally selecting systems to be compared. Experimental results show that the automatic evaluation using the bipartite-play method mitigates these two drawbacks and correlates as strongly with human subjectivity as existing methods.
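The collection protocol can be pictured roughly as follows: every candidate system converses with the same fixed pool of partner bots rather than with the other candidates, which is what removes the need for opponents to be publicly available or honestly chosen. The `respond()` interface, opener, and turn count below are assumptions for illustration, not the paper's exact setup.

```python
# Sketch: bipartite-style dialogue collection. Each candidate talks only to a
# shared partner pool, never to the other systems under comparison.
def collect_dialogues(candidate, partner_pool, n_turns=6, opener="Hello!"):
    dialogues = []
    for partner in partner_pool:          # identical pool for every candidate
        history = [opener]
        for turn in range(n_turns):
            speaker = candidate if turn % 2 == 0 else partner
            history.append(speaker.respond(history))
        dialogues.append(history)
    return dialogues  # scored afterwards by an automatic evaluator
```

Because the partner side is held fixed, scores for different candidates are collected under comparable conditions, which is what makes the downstream automatic comparison fair.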
Cited by: 2
Local Structure Matters Most in Most Languages
Pub Date: 2022-11-09 · DOI: 10.48550/arXiv.2211.05025
Louis Clouâtre, Prasanna Parthasarathi, A. Zouaq, Sarath Chandar
Many recent perturbation studies have found unintuitive results on what does and does not matter when performing Natural Language Understanding (NLU) tasks in English. Coding properties, such as the order of words, can often be removed through shuffling without impacting downstream performance. Such insight may be used to direct future research into English NLP models. As many improvements in multilingual settings consist of wholesale adaptation of English approaches, it is important to verify whether those studies replicate in multilingual settings. In this work, we replicate a study on the importance of local structure, and the relative unimportance of global structure, in a multilingual setting. We find that the phenomenon observed in English broadly translates to over 120 languages, with a few caveats.
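The contrast between local and global structure is easiest to see in the perturbations themselves. A minimal sketch of the two standard shuffles (window size and details are assumed here, not taken from the paper):

```python
# Sketch: two complementary word-order perturbations.
import random

def shuffle_within_windows(tokens, window=3, seed=0):
    """Destroys local order: words move only inside small windows."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)
        out.extend(chunk)
    return out

def shuffle_windows(tokens, window=3, seed=0):
    """Destroys global order while keeping local n-grams mostly intact."""
    rng = random.Random(seed)
    chunks = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    rng.shuffle(chunks)
    return [tok for chunk in chunks for tok in chunk]

sent = "the quick brown fox jumps over the lazy dog".split()
print(shuffle_within_windows(sent))  # local order broken, global kept
print(shuffle_windows(sent))         # local order kept, global broken
```

Comparing downstream accuracy under these two perturbations is what separates a model's reliance on local versus global structure.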
Cited by: 1
Unsupervised Domain Adaptation for Sparse Retrieval by Filling Vocabulary and Word Frequency Gaps
Pub Date: 2022-11-08 · DOI: 10.48550/arXiv.2211.03988
Hiroki Iida, Naoaki Okazaki
IR models using a pretrained language model significantly outperform lexical approaches like BM25. In particular, SPLADE, which encodes texts to sparse vectors, is an effective model for practical use because it shows robustness to out-of-domain datasets. However, SPLADE still struggles with exact matching of low-frequency words in training data. In addition, domain shifts in vocabulary and word frequencies deteriorate the IR performance of SPLADE. Because supervision data are scarce in the target domain, addressing the domain shifts without supervision data is necessary. This paper proposes an unsupervised domain adaptation method by filling vocabulary and word-frequency gaps. First, we expand a vocabulary and execute continual pretraining with a masked language model on a corpus of the target domain. Then, we multiply SPLADE-encoded sparse vectors by inverse document frequency weights to consider the importance of documents with low-frequency words. We conducted experiments using our method on datasets with a large vocabulary gap from a source domain. We show that our method outperforms the present state-of-the-art domain adaptation method. In addition, our method achieves state-of-the-art results, combined with BM25.
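The re-weighting step itself is simple to state: scale each term of the SPLADE-encoded sparse vectors by its inverse document frequency so that rare, often domain-specific terms count more. A sketch with sparse vectors represented as `{term_id: weight}` dicts and a BM25-style smoothed IDF (the paper's exact IDF variant is not assumed here):

```python
# Sketch: IDF re-weighting of sparse retrieval vectors.
import math

def idf_table(doc_term_sets, vocab_size):
    """doc_term_sets: list of sets of term ids appearing in each document."""
    n_docs = len(doc_term_sets)
    df = [0] * vocab_size
    for terms in doc_term_sets:
        for t in terms:
            df[t] += 1
    # BM25-style smoothed IDF; illustrative choice of smoothing
    return [math.log((n_docs - d + 0.5) / (d + 0.5) + 1.0) for d in df]

def score(query_vec, doc_vec, idf):
    """Dot product over shared terms, each matched term scaled by its IDF."""
    return sum(w * doc_vec[t] * idf[t]
               for t, w in query_vec.items() if t in doc_vec)
```

The IDF table is computed on the target-domain corpus, which is also where the continual masked-language-model pretraining with the expanded vocabulary takes place.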
Cited by: 3
Parsing linearizations appreciate PoS tags - but some are fussy about errors
Pub Date: 2022-10-27 · DOI: 10.48550/arXiv.2210.15219
Alberto Muñoz-Ortiz, Mark Anderson, David Vilares, Carlos Gómez-Rodríguez
PoS tags, once taken for granted as a useful resource for syntactic parsing, have become more situational with the popularization of deep learning. Recent work on the impact of PoS tags on graph- and transition-based parsers suggests that they are only useful when tagging accuracy is prohibitively high, or in low-resource scenarios. However, such an analysis is lacking for the emerging sequence labeling parsing paradigm, where it is especially relevant as some models explicitly use PoS tags for encoding and decoding. We undertake a study and uncover some trends. Among them, PoS tags are generally more useful for sequence labeling parsers than for other paradigms, but the impact of their accuracy is highly encoding-dependent, with the PoS-based head-selection encoding being best only when both tagging accuracy and resource availability are high.
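The dependence on tagging accuracy is clearest in the PoS-based head-selection encoding, where each token's label identifies its head as "the k-th word with PoS tag P in this direction", so a single tagging error can redirect an arc. A simplified encoder in that spirit (illustrative, not the exact scheme from the sequence-labeling parsing literature):

```python
# Sketch: encode a dependency tree as per-token (head_pos, signed_k) labels.
def encode(heads, pos):
    """heads[i] = index of token i's head (-1 for root); pos[i] = its PoS tag."""
    labels = []
    for i, h in enumerate(heads):
        if h < 0:
            labels.append(("root", 0))
            continue
        step = 1 if h > i else -1
        # count tokens carrying the head's PoS tag from i up to and including h
        k = sum(1 for j in range(i + step, h + step, step) if pos[j] == pos[h])
        labels.append((pos[h], k * step))
    return labels

pos = ["DET", "NOUN", "VERB", "DET", "NOUN"]
heads = [1, 2, -1, 4, 2]   # toy tree for "the dog chased the cat"
print(encode(heads, pos))
# [('NOUN', 1), ('VERB', 1), ('root', 0), ('NOUN', 1), ('VERB', -1)]
```

Decoding reverses the lookup, which is why this encoding benefits from accurate tags far more than bracketing-style linearizations do.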
Cited by: 1
Outlier-Aware Training for Improving Group Accuracy Disparities
Pub Date: 2022-10-27 · DOI: 10.48550/arXiv.2210.15183
Li-Kuang Chen, Canasai Kruengkrai, J. Yamagishi
Methods addressing spurious correlations such as Just Train Twice (JTT, Liu et al. 2021) involve reweighting a subset of the training set to maximize the worst-group accuracy. However, the reweighted set of examples may contain unlearnable examples that hamper the model's learning. We propose mitigating this by detecting outliers in the training set and removing them before reweighting. Our experiments show that our method achieves competitive or better accuracy compared with JTT and can detect and remove annotation errors in the subset being reweighted in JTT.
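A minimal sketch of the idea: after the first JTT training stage, filter the error set with an outlier detector before upweighting it. The `IsolationForest` over example representations below is an assumed detector choice for illustration; the paper's actual detector may differ.

```python
# Sketch: JTT-style error-set construction with an outlier-filtering step.
import numpy as np
from sklearn.ensemble import IsolationForest

def build_upweight_set(features, labels, preds, contamination=0.05):
    """features: (n, d) representations; labels/preds: (n,) from stage one."""
    error_idx = np.flatnonzero(preds != labels)      # stage-1 mistakes (as in JTT)
    det = IsolationForest(contamination=contamination, random_state=0)
    flags = det.fit_predict(features[error_idx])     # -1 marks outliers
    keep = error_idx[flags == 1]                     # drop likely-unlearnable examples
    return keep  # upweight only these in the second training stage
```

The second stage then proceeds exactly as in JTT, but with the filtered set, so annotation errors no longer receive the large upweighting factor.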
Cited by: 0
Performance-Efficiency Trade-Offs in Adapting Language Models to Text Classification Tasks
Pub Date: 2022-10-21 · DOI: 10.48550/arXiv.2210.12022
Laura Aina, Nikos Voskarides
Pre-trained language models (LMs) obtain state-of-the-art performance when adapted to text classification tasks. However, when using such models in real world applications, efficiency considerations are paramount. In this paper, we study how different training procedures that adapt LMs to text classification perform, as we vary model and train set size. More specifically, we compare standard fine-tuning, prompting, and knowledge distillation (KD) when the teacher was trained with either fine-tuning or prompting. Our findings suggest that even though fine-tuning and prompting work well to train large LMs on large train sets, there are more efficient alternatives that can reduce compute or data cost. Interestingly, we find that prompting combined with KD can reduce compute and data cost at the same time.
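For reference, the knowledge-distillation objective being compared is the standard soft-target loss: the student matches the teacher's temperature-softened distribution in addition to the gold labels. Temperature and mixing weight below are illustrative defaults, not values from the paper.

```python
# Sketch: standard knowledge-distillation loss for classification.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)    # rescale soft-target gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

The study's variable is how the teacher producing `teacher_logits` was adapted, by fine-tuning or by prompting, and how that interacts with student size and train-set size.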
Cited by: 0
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Pub Date: 2022-10-21 · DOI: 10.48550/arXiv.2210.12223
Florian Lux, Julia Koch, Ngoc Thang Vu
While neural methods for text-to-speech (TTS) have shown great advances in modeling multiple speakers, even in zero-shot settings, the amount of data needed for those approaches is generally not feasible for the vast majority of the world's over 6,000 spoken languages. In this work, we bring together the tasks of zero-shot voice cloning and multilingual low-resource TTS. Using the language-agnostic meta learning (LAML) procedure and modifications to a TTS encoder, we show that it is possible for a system to learn to speak a new language using just 5 minutes of training data while retaining the ability to infer the voice of even unseen speakers in the newly learned language. We show the success of our proposed approach in terms of intelligibility, naturalness, and similarity to the target speaker using objective metrics as well as human studies, and we release our code and trained models as open source.
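One plausible reading of a language-agnostic meta-learning loop is a Reptile-style update across per-language batches, yielding an initialization that adapts quickly to a new language. Everything below, including the `training_loss` method on the model, is an assumption for illustration, not the authors' exact recipe.

```python
# Sketch: one Reptile-style meta-update over per-language TTS batches.
import copy
import random
import torch

def meta_step(model, language_batches, inner_steps=4, inner_lr=1e-4, meta_lr=1e-2):
    lang = random.choice(list(language_batches))      # sample a training language
    fast = copy.deepcopy(model)                       # inner-loop copy
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for batch in language_batches[lang][:inner_steps]:
        opt.zero_grad()
        fast.training_loss(batch).backward()          # assumed model-specific loss
        opt.step()
    with torch.no_grad():                             # move the shared init toward
        for p, q in zip(model.parameters(), fast.parameters()):
            p.add_(meta_lr * (q - p))                 # ...the language-adapted weights
```

Fine-tuning such an initialization on 5 minutes of target-language data is the low-resource adaptation step the abstract describes.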
Cited by: 6
Modeling Document-level Temporal Structures for Building Temporal Dependency Graphs
Pub Date: 2022-10-21 · DOI: 10.48550/arXiv.2210.11787
Prafulla Kumar Choubey, Ruihong Huang
We propose to leverage news discourse profiling to model document-level temporal structures for building temporal dependency graphs. Our key observation is that the functional roles of sentences used for profiling news discourse signify different time frames relevant to a news story and can, therefore, help to recover the global temporal structure of a document. Our analyses and experiments with the widely used knowledge distillation technique show that discourse profiling effectively identifies distant inter-sentence event and (or) time expression pairs that are temporally related and otherwise difficult to locate.
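A toy version of the pairing step: use each sentence's discourse role to decide which event pairs are worth considering for temporal links, even across distant sentences. The role names loosely follow the news discourse profiling scheme; the heuristic itself is an illustration, not the paper's model.

```python
# Sketch: discourse-role-guided selection of candidate event pairs.
MAIN = {"Main_Event", "Consequence"}
BACKGROUND = {"Previous_Event", "Historical_Event"}

def candidate_pairs(events):
    """events: list of (event_id, sentence_idx, discourse_role) triples."""
    pairs = []
    for eid1, _, role1 in events:
        for eid2, _, role2 in events:
            # link main-content events to background events, even when the
            # sentences sit far apart in the document
            if eid1 != eid2 and role1 in MAIN and role2 in BACKGROUND:
                pairs.append((eid1, eid2))
    return pairs
```

Restricting the pair space this way is what lets the approach surface temporally related but distant event-event and event-time pairs that proximity-based candidate generation would miss.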
Cited by: 3