
Latest publications: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)

Syntactically Robust Training on Partially-Observed Data for Open Information Extraction
Ji Qi, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu
Open Information Extraction models have shown promising results with sufficient supervision. However, these models face a fundamental challenge: the syntactic distribution of their training data is only partially observed in comparison to the real world. In this paper, we propose a syntactically robust training framework that enables models to be trained on a syntactically abundant distribution based on diverse paraphrase generation. To tackle the intrinsic problem of knowledge deformation during paraphrasing, two algorithms based on semantic similarity matching and syntactic tree walking are used to restore the expressionally transformed knowledge. The training framework can be applied generally to other domains in which syntax is only partially observed. Based on the proposed framework, we build a new evaluation set called CaRB-AutoPara, a syntactically diverse dataset consistent with the real-world setting, for validating the robustness of the models. Experiments, including a thorough analysis, show that model performance degrades as the difference in syntactic distribution grows, while our framework provides a robust bound. The source code is publicly available at https://github.com/qijimrc/RobustOIE.
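The abstract does not spell out the restoration algorithms, but the matching step can be pictured with a minimal sketch: given an argument from an original knowledge triple, scan candidate spans of the paraphrase and keep the most similar one. Everything here is an illustrative assumption: plain string similarity via difflib stands in for the paper's semantic similarity matching, and the example triple and paraphrase are invented.

```python
# Illustrative sketch only: difflib string similarity stands in for the
# paper's semantic similarity matching; the triple and paraphrase are toys.
from difflib import SequenceMatcher

def candidate_spans(tokens, max_len=6):
    # enumerate every contiguous span of up to max_len tokens
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            yield " ".join(tokens[i:j])

def restore_argument(original_arg, paraphrase):
    # return the paraphrase span most similar to the original triple argument
    best_span, best_score = None, 0.0
    for span in candidate_spans(paraphrase.split()):
        score = SequenceMatcher(None, original_arg.lower(), span.lower()).ratio()
        if score > best_score:
            best_span, best_score = span, score
    return best_span, round(best_score, 2)

triple = ("the committee", "approved", "the new budget")
paraphrase = "The new spending plan was given approval by the committee members"
print([restore_argument(arg, paraphrase) for arg in triple])
```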
{"title":"Syntactically Robust Training on Partially-Observed Data for Open Information Extraction","authors":"Ji Qi, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu","doi":"10.48550/arXiv.2301.06841","DOIUrl":"https://doi.org/10.48550/arXiv.2301.06841","url":null,"abstract":"Open Information Extraction models have shown promising results with sufficient supervision. However, these models face a fundamental challenge that the syntactic distribution of training data is partially observable in comparison to the real world. In this paper, we propose a syntactically robust training framework that enables models to be trained on a syntactic-abundant distribution based on diverse paraphrase generation. To tackle the intrinsic problem of knowledge deformation of paraphrasing, two algorithms based on semantic similarity matching and syntactic tree walking are used to restore the expressionally transformed knowledge. The training framework can be generally applied to other syntactic partial observable domains. Based on the proposed framework, we build a new evaluation set called CaRB-AutoPara, a syntactically diverse dataset consistent with the real-world setting for validating the robustness of the models. Experiments including a thorough analysis show that the performance of the model degrades with the increase of the difference in syntactic distribution, while our framework gives a robust boundary. The source code is publicly available at https://github.com/qijimrc/RobustOIE.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"107 1","pages":"6245-6257"},"PeriodicalIF":0.0,"publicationDate":"2023-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77449555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Opening up Minds with Argumentative Dialogues
Youmna Farag, C. Brand, Jacopo Amidei, P. Piwek, T. Stafford, Svetlana Stoyanchev, Andreas Vlachos
Recent research on argumentative dialogues has focused on persuading people to take some action, changing their stance on the topic of discussion, or winning debates. In this work, we focus on argumentative dialogues that aim to open up (rather than change) people's minds, helping them become more understanding of views that are unfamiliar or opposed to their own convictions. To this end, we present a dataset of 183 argumentative dialogues about three controversial topics: veganism, Brexit, and COVID-19 vaccination. The dialogues were collected using the Wizard of Oz approach, in which wizards leverage a knowledge base of arguments to converse with participants. Open-mindedness is measured before and after engaging in the dialogue using a questionnaire from the psychology literature, and the success of the dialogue is measured as the change in the participant's stance towards those who hold opinions different from their own. We evaluate two dialogue models: a Wikipedia-based and an argument-based model. We show that while the two models perform comparably at opening up minds, the argument-based model is significantly better on other dialogue properties such as engagement and clarity.
{"title":"Opening up Minds with Argumentative Dialogues","authors":"Youmna Farag, C. Brand, Jacopo Amidei, P. Piwek, T. Stafford, Svetlana Stoyanchev, Andreas Vlachos","doi":"10.48550/arXiv.2301.06400","DOIUrl":"https://doi.org/10.48550/arXiv.2301.06400","url":null,"abstract":"Recent research on argumentative dialogues has focused on persuading people to take some action, changing their stance on the topic of discussion, or winning debates. In this work, we focus on argumentative dialogues that aim to open up (rather than change) people's minds to help them become more understanding to views that are unfamiliar or in opposition to their own convictions. To this end, we present a dataset of 183 argumentative dialogues about 3 controversial topics: veganism, Brexit and COVID-19 vaccination. The dialogues were collected using the Wizard of Oz approach, where wizards leverage a knowledge-base of arguments to converse with participants. Open-mindedness is measured before and after engaging in the dialogue using a questionnaire from the psychology literature, and success of the dialogue is measured as the change in the participant's stance towards those who hold opinions different to theirs. We evaluate two dialogue models: a Wikipedia-based and an argument-based model. We show that while both models perform closely in terms of opening up minds, the argument-based model is significantly better on other dialogue properties such as engagement and clarity.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"38 1","pages":"4569-4582"},"PeriodicalIF":0.0,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82817770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Distinguish Sense from Nonsense: Out-of-Scope Detection for Virtual Assistants
Cheng Qian, Haode Qi, Gengyu Wang, L. Kunc, Saloni Potdar
Out-of-Scope (OOS) detection in conversational AI solutions enables a chatbot to handle a conversation gracefully when it is unable to make sense of the end-user query. Accurately tagging a query as out-of-domain is particularly hard when the chatbot is not equipped to handle a topic that semantically overlaps with an existing topic it was trained on. We propose a simple yet effective OOS detection method that outperforms standard OOS detection methods in a real-world deployment of virtual assistants. We discuss the various design and deployment considerations for a cloud-platform solution that trains virtual assistants and deploys them at scale. Additionally, we propose a collection of datasets that replicate real-world scenarios and report comprehensive results in various settings using both offline and online evaluation metrics.
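The abstract leaves the proposed method unspecified; as a point of reference, the most common OOS baseline it competes against can be sketched in a few lines: train an in-scope intent classifier and flag any query whose top intent confidence falls below a tuned threshold. The threshold value and the toy probability vectors below are assumptions.

```python
# Sketch of a standard max-confidence OOS baseline, not the paper's method;
# the 0.6 threshold is an assumed value one would tune on validation data.
def detect_oos(intent_probs, threshold=0.6):
    top_intent = max(range(len(intent_probs)), key=intent_probs.__getitem__)
    if intent_probs[top_intent] < threshold:
        return "out-of-scope", None        # classifier is not confident enough
    return "in-scope", top_intent

print(detect_oos([0.05, 0.85, 0.10]))   # ('in-scope', 1)
print(detect_oos([0.40, 0.35, 0.25]))   # ('out-of-scope', None)
```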
{"title":"Distinguish Sense from Nonsense: Out-of-Scope Detection for Virtual Assistants","authors":"Cheng Qian, Haode Qi, Gengyu Wang, L. Kunc, Saloni Potdar","doi":"10.48550/arXiv.2301.06544","DOIUrl":"https://doi.org/10.48550/arXiv.2301.06544","url":null,"abstract":"Out of Scope (OOS) detection in Conversational AI solutions enables a chatbot to handle a conversation gracefully when it is unable to make sense of the end-user query. Accurately tagging a query as out-of-domain is particularly hard in scenarios when the chatbot is not equipped to handle a topic which has semantic overlap with an existing topic it is trained on. We propose a simple yet effective OOS detection method that outperforms standard OOS detection methods in a real-world deployment of virtual assistants. We discuss the various design and deployment considerations for a cloud platform solution to train virtual assistants and deploy them at scale. Additionally, we propose a collection of datasets that replicates real-world scenarios and show comprehensive results in various settings using both offline and online evaluation metrics.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"20 1","pages":"502-511"},"PeriodicalIF":0.0,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74390930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Active Learning for Abstractive Text Summarization
Akim Tsvigun, Ivan Lysenko, Danila Sedashov, Ivan Lazichny, Eldar Damirov, Vladimir E. Karlov, Artemy Belousov, Leonid Sanochkin, Maxim Panov, A. Panchenko, M. Burtsev, Artem Shelmanov
Constructing human-curated annotated datasets for abstractive text summarization (ATS) is very time-consuming and expensive, because creating each instance requires a human annotator to read a long document and compose a shorter summary that preserves the key information relayed by the original document. Active Learning (AL) is a technique developed to reduce the amount of annotation required to achieve a certain level of machine learning model performance. In information extraction and text classification, AL can reduce annotation labor severalfold. Despite its potential for reducing expensive annotation, to the best of our knowledge there have been no effective AL query strategies for ATS. This stems from the fact that many AL strategies rely on uncertainty estimation, while, as we show in our work, uncertain instances are usually noisy, and selecting them can degrade model performance compared to passive annotation. We address this problem by proposing the first effective AL query strategy for ATS based on diversity principles. We show that, given a certain annotation budget, using our strategy in AL annotation helps to improve model performance in terms of ROUGE and consistency scores. Additionally, we analyze the effect of self-learning and show that it can further increase the performance of the model.
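A diversity-based acquisition step of the kind the abstract alludes to can be sketched as follows. This is a generic cluster-then-pick heuristic, not the authors' exact strategy: cluster the unlabeled pool's document embeddings and query the instance nearest each cluster center.

```python
# Generic diversity-based selection sketch (assumed, not the paper's exact
# strategy): pick the pool instance closest to each k-means cluster center.
import numpy as np
from sklearn.cluster import KMeans

def select_diverse(embeddings, budget, seed=0):
    km = KMeans(n_clusters=budget, n_init=10, random_state=seed).fit(embeddings)
    chosen = {int(np.linalg.norm(embeddings - center, axis=1).argmin())
              for center in km.cluster_centers_}
    return sorted(chosen)

pool = np.random.RandomState(0).rand(500, 64)   # stand-in document embeddings
print(select_diverse(pool, budget=8))           # indices to send for annotation
```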
{"title":"Active Learning for Abstractive Text Summarization","authors":"Akim Tsvigun, Ivan Lysenko, Danila Sedashov, Ivan Lazichny, Eldar Damirov, Vladimir E. Karlov, Artemy Belousov, Leonid Sanochkin, Maxim Panov, A. Panchenko, M. Burtsev, Artem Shelmanov","doi":"10.48550/arXiv.2301.03252","DOIUrl":"https://doi.org/10.48550/arXiv.2301.03252","url":null,"abstract":"Construction of human-curated annotated datasets for abstractive text summarization (ATS) is very time-consuming and expensive because creating each instance requires a human annotator to read a long document and compose a shorter summary that would preserve the key information relayed by the original document. Active Learning (AL) is a technique developed to reduce the amount of annotation required to achieve a certain level of machine learning model performance. In information extraction and text classification, AL can reduce the amount of labor up to multiple times. Despite its potential for aiding expensive annotation, as far as we know, there were no effective AL query strategies for ATS. This stems from the fact that many AL strategies rely on uncertainty estimation, while as we show in our work, uncertain instances are usually noisy, and selecting them can degrade the model performance compared to passive annotation. We address this problem by proposing the first effective query strategy for AL in ATS based on diversity principles. We show that given a certain annotation budget, using our strategy in AL annotation helps to improve the model performance in terms of ROUGE and consistency scores. Additionally, we analyze the effect of self-learning and show that it can further increase the performance of the model.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"66 1","pages":"5128-5152"},"PeriodicalIF":0.0,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87538738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Mask-then-Fill: A Flexible and Effective Data Augmentation Framework for Event Extraction
Jun Gao, Changlong Yu, Wei Wang, Huan Zhao, Ruifeng Xu
We present Mask-then-Fill, a flexible and effective data augmentation framework for event extraction. Our approach allows for more flexible manipulation of text and thus can generate more diverse data while keeping the original event structure as unchanged as possible. Specifically, it first randomly masks out an adjunct sentence fragment and then infills a variable-length text span with a fine-tuned infilling model. The main advantage is that it can replace a fragment of arbitrary length in the text with another fragment of variable length, whereas existing methods can only replace a single word or a fixed-length fragment. On trigger and argument extraction tasks, the proposed framework is more effective than baseline methods, and it demonstrates particularly strong results in the low-resource setting. Our further analysis shows that it achieves a good balance between diversity and distributional similarity.
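The mask-and-infill step can be approximated with an off-the-shelf sequence-to-sequence model. The sketch below uses vanilla t5-base and its span sentinel rather than the paper's fine-tuned infilling model, and the masked sentence is an invented example.

```python
# Approximation of the mask-then-fill step with vanilla t5-base; the paper
# uses a fine-tuned infilling model instead, and this sentence is a toy.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# an adjunct fragment is replaced by T5's span sentinel, then infilled
masked = "The workers <extra_id_0> protested against the new policy ."
inputs = tok(masked, return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=10)
print(tok.decode(output[0], skip_special_tokens=False))
```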
{"title":"Mask-then-Fill: A Flexible and Effective Data Augmentation Framework for Event Extraction","authors":"Jun Gao, Changlong Yu, Wei Wang, Huan Zhao, Ruifeng Xu","doi":"10.48550/arXiv.2301.02427","DOIUrl":"https://doi.org/10.48550/arXiv.2301.02427","url":null,"abstract":"We present Mask-then-Fill, a flexible and effective data augmentation framework for event extraction. Our approach allows for more flexible manipulation of text and thus can generate more diverse data while keeping the original event structure unchanged as much as possible. Specifically, it first randomly masks out an adjunct sentence fragment and then infills a variable-length text span with a fine-tuned infilling model. The main advantage lies in that it can replace a fragment of arbitrary length in the text with another fragment of variable length, compared to the existing methods which can only replace a single word or a fixed-length fragment. On trigger and argument extraction tasks, the proposed framework is more effective than baseline methods and it demonstrates particularly strong results in the low-resource setting. Our further analysis shows that it achieves a good balance between diversity and distributional similarity.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"27 1","pages":"4537-4544"},"PeriodicalIF":0.0,"publicationDate":"2023-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78844163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Facilitating Contrastive Learning of Discourse Relational Senses by Exploiting the Hierarchy of Sense Relations
Wanqiu Long, B. Webber
Implicit discourse relation recognition is a challenging task that involves identifying the sense or senses that hold between two adjacent spans of text in the absence of an explicit connective between them. In both PDTB-2 (Prasad et al., 2008) and PDTB-3 (Webber et al., 2019), discourse relational senses are organized into a three-level hierarchy ranging from four broad top-level senses to more specific senses below them. Most previous work on implicit discourse relation recognition has used the sense hierarchy simply to indicate which sense labels were available. Here we do more, incorporating the sense hierarchy into the recognition process itself and using it to select the negative examples used in contrastive learning. With no additional effort, the approach achieves state-of-the-art performance on the task. Our code is released at https://github.com/wanqiulong0923/Contrastive_IDRR.
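One way to picture "using the hierarchy to select negatives" is the sketch below: treat senses sharing a top-level parent as hard negatives, since they are semantically closer to the anchor than senses from other branches. The two-level sense list and the selection rule are simplified assumptions, not the authors' exact criterion.

```python
# Assumed illustration of hierarchy-aware negative selection over a toy
# two-level sense inventory; the paper's criterion may differ in detail.
SENSES = ["Comparison.Contrast", "Comparison.Concession",
          "Contingency.Cause", "Expansion.Conjunction", "Temporal.Synchrony"]

def hard_negatives(anchor, candidates):
    # senses under the same top-level parent make harder contrastive negatives
    top_level = anchor.split(".")[0]
    return [s for s in candidates
            if s != anchor and s.split(".")[0] == top_level]

print(hard_negatives("Comparison.Contrast", SENSES))
# ['Comparison.Concession']
```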
{"title":"Facilitating Contrastive Learning of Discourse Relational Senses by Exploiting the Hierarchy of Sense Relations","authors":"Wanqiu Long, B. Webber","doi":"10.48550/arXiv.2301.02724","DOIUrl":"https://doi.org/10.48550/arXiv.2301.02724","url":null,"abstract":"Implicit discourse relation recognition is a challenging task that involves identifying the sense or senses that hold between two adjacent spans of text, in the absense of an explicit connective between them. In both PDTB-2 (prasad et al., 2008) and PDTB-3 (Webber et al., 2019), discourse relational senses are organized into a three-level hierarchy ranging from four broad top-level senses, to more specific senses below them. Most previous work on implicitf discourse relation recognition have used the sense hierarchy simply to indicate what sense labels were available. Here we do more — incorporating the sense hierarchy into the recognition process itself and using it to select the negative examples used in contrastive learning. With no additional effort, the approach achieves state-of-the-art performance on the task. Our code is released inhttps://github.com/wanqiulong 0923/Contrastive_IDRR.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"44 1","pages":"10704-10716"},"PeriodicalIF":0.0,"publicationDate":"2023-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85978335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings
Francisco Valentini, Germán Rosati, D. Slezak, E. Altszyler
Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work, we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high-frequency words, while GloVe tends to return female bias in low-frequency words. We show that these behaviors persist even when words are randomly shuffled, which proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic, since bias metrics should depend exclusively on word co-occurrences and not on individual word frequencies. Finally, we compare these results with those obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.
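For readers unfamiliar with the PMI alternative, a self-contained toy version is sketched below: the bias of a word is the difference between its PMI with a female context word and its PMI with a male context word, computed from within-sentence co-occurrence counts. The corpus, the 0.5 smoothing constant, and the single-word gender contexts are all assumptions for illustration.

```python
# Toy PMI-based gender bias score (standard formulation assumed, with ad-hoc
# smoothing); a positive score means the word co-occurs more with "she".
import math
from collections import Counter
from itertools import combinations

corpus = ["she is a nurse", "he is a doctor", "she is a doctor", "he is a pilot"]

pair_counts, word_counts, n_pairs = Counter(), Counter(), 0
for sentence in corpus:
    tokens = sentence.split()
    word_counts.update(tokens)
    for a, b in combinations(tokens, 2):    # within-sentence co-occurrence
        pair_counts[frozenset((a, b))] += 1
        n_pairs += 1
n_words = sum(word_counts.values())

def pmi(a, b, smoothing=0.5):
    p_ab = (pair_counts[frozenset((a, b))] + smoothing) / n_pairs
    return math.log(p_ab / ((word_counts[a] / n_words) * (word_counts[b] / n_words)))

def gender_bias(word):
    return pmi(word, "she") - pmi(word, "he")

print({w: round(gender_bias(w), 3) for w in ["nurse", "doctor", "pilot"]})
```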
{"title":"The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings","authors":"Francisco Valentini, Germán Rosati, D. Slezak, E. Altszyler","doi":"10.48550/arXiv.2301.00792","DOIUrl":"https://doi.org/10.48550/arXiv.2301.00792","url":null,"abstract":"Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words. We show these behaviors still exist when words are randomly shuffled. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic since bias metrics should depend exclusively on word co-occurrences and not individual word frequencies. Finally, we compare these results with the ones obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"62 5 1","pages":"5086-5092"},"PeriodicalIF":0.0,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90697042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Improving Complex Knowledge Base Question Answering via Question-to-Action and Question-to-Question Alignment
Yechun Tang, Xiaoxia Cheng, Weiming Lu
Complex knowledge base question answering can be achieved by converting questions into sequences of predefined actions. However, there is a significant semantic and structural gap between natural language and action sequences, which makes this conversion difficult. In this paper, we introduce an alignment-enhanced complex question answering framework, called ALCQA, which mitigates this gap through question-to-action alignment and question-to-question alignment. We train a question-rewriting model to align the question with each action, and utilize a pretrained language model to implicitly align the question with KG artifacts. Moreover, considering that similar questions correspond to similar action sequences, we retrieve the top-k most similar question-answer pairs at the inference stage through question-to-question alignment and propose a novel reward-guided action sequence selection strategy to choose among candidate action sequences. We conduct experiments on the CQA and WQSP datasets, and the results show that our approach outperforms state-of-the-art methods, obtaining a 9.88% improvement in the F1 metric on the CQA dataset. Our source code is available at https://github.com/TTTTTTTTy/ALCQA.
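The question-to-question retrieval step can be sketched with a simple lexical retriever. TF-IDF cosine similarity below stands in for whatever learned similarity the authors use, and the training questions are invented.

```python
# Sketch of top-k question-to-question retrieval; TF-IDF cosine is an assumed
# stand-in for the paper's similarity model, and the questions are toys.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_questions = [
    "who directed the film titanic",
    "which movies did james cameron direct",
    "what is the capital of france",
]

def topk_similar(query, k=2):
    vectorizer = TfidfVectorizer().fit(train_questions + [query])
    scores = cosine_similarity(vectorizer.transform([query]),
                               vectorizer.transform(train_questions))[0]
    best = scores.argsort()[::-1][:k]
    return [(train_questions[i], round(float(scores[i]), 3)) for i in best]

print(topk_similar("who was the director of avatar"))
```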
{"title":"Improving Complex Knowledge Base Question Answering via Question-to-Action and Question-to-Question Alignment","authors":"Yechun Tang, Xiaoxia Cheng, Weiming Lu","doi":"10.48550/arXiv.2212.13036","DOIUrl":"https://doi.org/10.48550/arXiv.2212.13036","url":null,"abstract":"Complex knowledge base question answering can be achieved by converting questions into sequences of predefined actions. However, there is a significant semantic and structural gap between natural language and action sequences, which makes this conversion difficult. In this paper, we introduce an alignment-enhanced complex question answering framework, called ALCQA, which mitigates this gap through question-to-action alignment and question-to-question alignment. We train a question rewriting model to align the question and each action, and utilize a pretrained language model to implicitly align the question and KG artifacts. Moreover, considering that similar questions correspond to similar action sequences, we retrieve top-k similar question-answer pairs at the inference stage through question-to-question alignment and propose a novel reward-guided action sequence selection strategy to select from candidate action sequences. We conduct experiments on CQA and WQSP datasets, and the results show that our approach outperforms state-of-the-art methods and obtains a 9.88% improvements in the F1 metric on CQA dataset. Our source code is available at https://github.com/TTTTTTTTy/ALCQA.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"52 1","pages":"137-147"},"PeriodicalIF":0.0,"publicationDate":"2022-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84959619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
TextBox 2.0: A Text Generation Library with Pre-trained Language Models
Tianyi Tang, Junyi Li, Z. Chen, Yiwen Hu, Zhuohao Yu, Wen-Dao Dai, Zican Dong, Xiaoxue Cheng, Yuhao Wang, Wayne Xin Zhao, J. Nie, Ji-rong Wen
To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets, and further incorporates 45 PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, it is easy to use our library, either through the friendly Python API or the command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at the link: https://github.com/RUCAIBox/TextBox.
{"title":"TextBox 2.0: A Text Generation Library with Pre-trained Language Models","authors":"Tianyi Tang, Junyi Li, Z. Chen, Yiwen Hu, Zhuohao Yu, Wen-Dao Dai, Zican Dong, Xiaoxue Cheng, Yuhao Wang, Wayne Xin Zhao, J. Nie, Ji-rong Wen","doi":"10.48550/arXiv.2212.13005","DOIUrl":"https://doi.org/10.48550/arXiv.2212.13005","url":null,"abstract":"To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers $13$ common text generation tasks and their corresponding $83$ datasets and further incorporates $45$ PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement $4$ efficient training strategies and provide $4$ generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, it is easy to use our library, either through the friendly Python API or command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at the link: https://github.com/RUCAIBox/TextBox.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"452 1","pages":"435-444"},"PeriodicalIF":0.0,"publicationDate":"2022-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76493273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension
Borui Wang, Chengcheng Feng, Arjun Nair, Madelyn Mao, Jai Desai, Asli Celikyilmaz, Haoran Li, Yashar Mehdad, Dragomir R. Radev
Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing, but no previous work has explored whether abstractive dialogue summarization can also be used as a means to boost an NLP system's performance on other important dialogue comprehension tasks. In this paper, we propose a novel type of dialogue summarization task, STRUctured DiaLoguE Summarization (STRUDEL), that can help pre-trained language models better understand dialogues and improve their performance on important dialogue comprehension tasks. In contrast to the holistic approach taken by the traditional free-form abstractive summarization task for dialogues, STRUDEL aims to decompose and imitate the hierarchical, systematic and structured mental process that we human beings usually go through when understanding and analyzing dialogues, and thus has the advantage of being more focused, specific and instructive for dialogue comprehension models to learn from. We further introduce a new STRUDEL dialogue comprehension modeling framework that integrates STRUDEL into a dialogue reasoning module over transformer encoder language models to improve their dialogue comprehension ability. In empirical experiments on two important downstream dialogue comprehension tasks, dialogue question answering and dialogue response prediction, we demonstrate that our STRUDEL dialogue comprehension models can significantly improve the dialogue comprehension performance of transformer encoder language models.
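The abstract does not give the summary format, but the general idea of feeding a structured summary to a transformer encoder can be pictured as below. The entry fields, separator tokens, and linearization scheme are hypothetical illustrations, not the paper's actual representation.

```python
# Hypothetical linearization of a structured dialogue summary for an encoder
# input; field names and separator tokens are invented for illustration.
from dataclasses import dataclass

@dataclass
class SummaryEntry:
    speaker: str
    aspect: str        # e.g. "intent" or "outcome"
    statement: str

def linearize(entries, dialogue):
    summary = " [ENT] ".join(
        f"{e.speaker} | {e.aspect} | {e.statement}" for e in entries)
    return f"[SUMMARY] {summary} [DIALOGUE] {dialogue}"

entries = [SummaryEntry("A", "intent", "wants to reschedule the meeting"),
           SummaryEntry("B", "outcome", "agrees to move it to Friday")]
print(linearize(entries, "A: Can we move the meeting? B: Sure, Friday works."))
```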
{"title":"STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension","authors":"Borui Wang, Chengcheng Feng, Arjun Nair, Madelyn Mao, Jai Desai, Asli Celikyilmaz, Haoran Li, Yashar Mehdad, Dragomir R. Radev","doi":"10.48550/arXiv.2212.12652","DOIUrl":"https://doi.org/10.48550/arXiv.2212.12652","url":null,"abstract":"Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing, but no previous work has explored the possibility of whether abstractive dialogue summarization can also be used as a means to boost an NLP system’s performance on other important dialogue comprehension tasks. In this paper, we propose a novel type of dialogue summarization task - STRUctured DiaLoguE Summarization (STRUDEL) - that can help pre-trained language models to better understand dialogues and improve their performance on important dialogue comprehension tasks. In contrast to the holistic approach taken by the traditional free-form abstractive summarization task for dialogues, STRUDEL aims to decompose and imitate the hierarchical, systematic and structured mental process that we human beings usually go through when understanding and analyzing dialogues, and thus has the advantage of being more focused, specific and instructive for dialogue comprehension models to learn from. We further introduce a new STRUDEL dialogue comprehension modeling framework that integrates STRUDEL into a dialogue reasoning module over transformer encoder language models to improve their dialogue comprehension ability. In our empirical experiments on two important downstream dialogue comprehension tasks - dialogue question answering and dialogue response prediction - we demonstrate that our STRUDEL dialogue comprehension models can significantly improve the dialogue comprehension performance of transformer encoder language models.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"160 3 1","pages":"4949-4958"},"PeriodicalIF":0.0,"publicationDate":"2022-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83262235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1