
Proceedings of COLING. International Conference on Computational Linguistics: Latest Publications

Coalescing Global and Local Information for Procedural Text Understanding
Pub Date : 2022-08-26 DOI: 10.48550/arXiv.2208.12848
Kaixin Ma, Filip Ilievski, Jonathan M Francis, Eric Nyberg, A. Oltramari
Procedural text understanding is a challenging language reasoning task that requires models to track entity states across the development of a narrative. We identify three core aspects required for modeling this task, namely the local and global view of the inputs, as well as the global view of outputs. Prior methods have considered a subset of these aspects, which leads to either low precision or low recall. In this paper, we propose a new model Coalescing Global and Local Information (CGLI), which builds entity- and timestep-aware input representations (local input) considering the whole context (global input), and we jointly model the entity states with a structured prediction objective (global output). Thus, CGLI simultaneously optimizes for both precision and recall. Moreover, we extend CGLI with additional output layers and integrate it into a story reasoning framework. Extensive experiments on a popular procedural text understanding dataset show that our model achieves state-of-the-art results, while experiments on a story reasoning benchmark show the positive impact of our model on downstream reasoning.
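A minimal sketch of the input layout this abstract describes, with invented marker tokens and an invented example procedure (not the authors' code): every (entity, timestep) view keeps the whole procedure as context (global input) while tagging the entity and the current step (local input). The per-step entity-state logits these views produce would then be decoded jointly, e.g. with a CRF, to realize the global output.

```python
# Hypothetical sketch of CGLI-style input construction; markers are assumptions.
def build_views(entity, sentences):
    views = []
    for t in range(len(sentences)):
        # Global input: every view sees the whole procedure, not a local window.
        context = " ".join(
            f"<now> {s} </now>" if i == t else s  # local input: mark current step
            for i, s in enumerate(sentences)
        )
        views.append(f"<ent> {entity} </ent> {context}")
    return views

steps = ["Water is poured into a kettle.", "The kettle is heated.", "Steam rises."]
for view in build_views("water", steps):
    print(view)
```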
Pages: 1534-1545
Citations: 10
DPTDR: Deep Prompt Tuning for Dense Passage Retrieval
Pub Date : 2022-08-24 DOI: 10.48550/arXiv.2208.11503
Zhen-Quan Tang, Benyou Wang, Ting Yao
Deep prompt tuning (DPT) has gained great success in most natural language processing (NLP) tasks. However, it is not well-investigated in dense retrieval where fine-tuning (FT) still dominates. When deploying multiple retrieval tasks using the same backbone model (e.g., RoBERTa), FT-based methods are unfriendly in terms of deployment cost: each new retrieval model needs to repeatedly deploy the backbone model without reuse. To reduce the deployment cost in such a scenario, this work investigates applying DPT in dense retrieval. The challenge is that directly applying DPT in dense retrieval largely underperforms FT methods. To compensate for the performance drop, we propose two model-agnostic and task-agnostic strategies for DPT-based retrievers, namely retrieval-oriented intermediate pretraining and unified negative mining, as a general approach that could be compatible with any pre-trained language model and retrieval task. The experimental results show that the proposed method (called DPTDR) outperforms previous state-of-the-art models on both MS-MARCO and Natural Questions. We also conduct ablation studies to examine the effectiveness of each strategy in DPTDR. We believe this work facilitates the industry, as it saves enormous efforts and costs of deployment and increases the utility of computing resources. Our code is available at https://github.com/tangzhy/DPTDR.
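As a rough illustration of deep prompt tuning under this deployment constraint, the toy module below freezes a stand-in transformer backbone and trains only per-layer prompt vectors; the encoder, dimensions, and names are assumptions for the sketch, not the DPTDR implementation.

```python
import torch
import torch.nn as nn

class DeepPromptEncoder(nn.Module):
    """Toy frozen backbone plus trainable per-layer prompts (illustrative only)."""
    def __init__(self, dim=256, n_layers=4, prompt_len=16, n_heads=4):
        super().__init__()
        self.backbone = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        for p in self.backbone.parameters():
            p.requires_grad = False            # frozen: shared across tasks
        self.prompts = nn.ParameterList(       # the only trainable parameters
            nn.Parameter(0.02 * torch.randn(prompt_len, dim))
            for _ in range(n_layers)
        )

    def forward(self, x):                      # x: (batch, seq_len, dim)
        seq_len = x.size(1)
        for layer, prompt in zip(self.backbone, self.prompts):
            p = prompt.unsqueeze(0).expand(x.size(0), -1, -1)
            x = layer(torch.cat([p, x], dim=1))[:, -seq_len:]  # drop prompt slots
        return x

out = DeepPromptEncoder()(torch.randn(2, 10, 256))  # (2, 10, 256)
```

Because only the prompts carry gradients, many retrieval tasks can share one deployed backbone and differ only in a small set of prompt parameters, which is the deployment saving the abstract argues for.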
Pages: 1193-1202
Citations: 7
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Pub Date : 2022-08-24 DOI: 10.48550/arXiv.2208.11646
Cyril Chhun, Pierre Colombo, C. Clavel, Fabian M. Suchanek
Research on Automatic Story Generation (ASG) relies heavily on human and automatic evaluation. However, there is no consensus on which human evaluation criteria to use, and no analysis of how well automatic criteria correlate with them. In this paper, we propose to re-evaluate ASG evaluation. We introduce a set of 6 orthogonal and comprehensive human criteria, carefully motivated by the social sciences literature. We also present HANNA, an annotated dataset of 1,056 stories produced by 10 different ASG systems. HANNA allows us to quantitatively evaluate the correlations of 72 automatic metrics with human criteria. Our analysis highlights the weaknesses of current metrics for ASG and allows us to formulate practical recommendations for ASG evaluation.
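The metric-versus-human analysis reduces to rank correlation over per-story scores; a minimal sketch with invented numbers (both arrays below are placeholders, not HANNA data):

```python
import numpy as np
from scipy.stats import spearmanr

human = np.array([4, 3, 5, 2, 4, 1])                      # e.g. human "coherence" ratings
metric = np.array([0.71, 0.55, 0.80, 0.40, 0.66, 0.35])   # an automatic metric, same stories

rho, pval = spearmanr(human, metric)                      # rank correlation
print(f"Spearman rho={rho:.3f} (p={pval:.3f})")
```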
Pages: 5794-5836
Citations: 18
FactMix: Using a Few Labeled In-domain Examples to Generalize to Cross-domain Named Entity Recognition
Pub Date : 2022-08-24 DOI: 10.48550/arXiv.2208.11464
Linyi Yang, Lifan Yuan, Leyang Cui, Wen Gao, Yue Zhang
Few-shot Named Entity Recognition (NER) is imperative for entity tagging in limited resource domains and has thus received considerable attention in recent years. Existing approaches for few-shot NER are evaluated mainly under in-domain settings. In contrast, little is known about how these inherently faithful models perform in cross-domain NER using a few labeled in-domain examples. This paper proposes a two-step rationale-centric data augmentation method to improve the model’s generalization ability. Results on several datasets show that our model-agnostic method significantly improves the performance of cross-domain NER tasks compared to previous state-of-the-art methods, including counterfactual data augmentation and prompt-tuning methods.
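The abstract does not spell out the two augmentation steps, but an entity-level, rationale-centric operation in this spirit might look like the sketch below: swap an entity span for another surface form of the same type while leaving the surrounding context (the rationale) untouched. The function, BIO label scheme, and pool structure are our assumptions, not the released FactMix code.

```python
import random

def entity_swap(tokens, labels, pool):
    """pool maps an entity type (e.g. 'PER') to alternative surface forms."""
    out_toks, out_labs, swapped = [], [], False
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-") and pool.get(lab[2:]):
            new = random.choice(pool[lab[2:]]).split()
            out_toks += new          # substitute a same-type entity
            out_labs += ["B-" + lab[2:]] + ["I-" + lab[2:]] * (len(new) - 1)
            swapped = True
        elif lab.startswith("I-") and swapped:
            continue                 # original span consumed by the swap
        else:
            out_toks.append(tok)
            out_labs.append(lab)
            swapped = False
    return out_toks, out_labs

toks, labs = entity_swap(["Alice", "visited", "Paris"],
                         ["B-PER", "O", "B-LOC"],
                         {"PER": ["Bob Smith"], "LOC": ["Berlin"]})
```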
Pages: 5360-5371
Citations: 13
Few-shot Table-to-text Generation with Prefix-Controlled Generator
Pub Date : 2022-08-23 DOI: 10.48550/arXiv.2208.10709
Yutao Luo, Menghua Lu, Gongshen Liu, Shilin Wang
Neural table-to-text generation approaches are data-hungry, limiting their adaptation to low-resource real-world applications. Previous works mostly resort to Pre-trained Language Models (PLMs) to generate fluent summaries of a table. However, they often contain hallucinated contents due to the uncontrolled nature of PLMs. Moreover, the topological differences between tables and sequences are rarely studied. Last but not least, fine-tuning on PLMs with a handful of instances may lead to over-fitting and catastrophic forgetting. To alleviate these problems, we propose a prompt-based approach, Prefix-Controlled Generator (i.e., PCG), for few-shot table-to-text generation. We prepend a task-specific prefix for a PLM to make the table structure better fit the pre-trained input. In addition, we generate an input-specific prefix to control the factual contents and word order of the generated text. Both automatic and human evaluations on different domains (humans, books and songs) of the Wikibio dataset prove the effectiveness of our approach.
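In the paper the two prefixes are continuous (learned) vectors; the textual stand-ins in the sketch below only show what each prefix is meant to control, with invented formats:

```python
def linearize(table, field_order):
    # input-specific control: name the fields to verbalize, and their order
    control = "fields: " + " | ".join(field_order)
    cells = " ; ".join(f"{k} is {table[k]}" for k in field_order)
    return f"[TABLE2TEXT] {control} [SEP] {cells}"   # task-specific cue up front

print(linearize({"name": "Ada Lovelace", "born": "1815"}, ["name", "born"]))
# [TABLE2TEXT] fields: name | born [SEP] name is Ada Lovelace ; born is 1815
```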
Pages: 6493-6504
Citations: 3
K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment
Pub Date : 2022-08-23 DOI: 10.48550/arXiv.2208.10684
Jean Lee, Taejun Lim, Hee-Youn Lee, Bogeun Jo, Yangsok Kim, Heegeun Yoon, S. Han
Online hate speech detection has become an important issue due to the growth of online content, but resources in languages other than English are extremely limited. We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments and provides a multi-label classification using 1 to 4 labels, and handles subjectivity and intersectionality. We evaluate strong baselines on K-MHaS. KR-BERT with a sub-character tokenizer outperforms others, recognizing decomposed characters in each hate speech class.
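Multi-label here means one sigmoid per label rather than a softmax over mutually exclusive classes; a minimal head of that shape is sketched below (the inventory size and tensors are placeholders, and the paper's KR-BERT encoder is not shown):

```python
import torch
import torch.nn as nn

num_labels = 8                        # placeholder label inventory
head = nn.Linear(768, num_labels)     # sits on top of a [CLS] encoding
cls_vec = torch.randn(2, 768)         # stand-in for encoder output
target = torch.tensor([[1., 0., 0., 1., 0., 0., 0., 0.],
                       [0., 1., 1., 0., 1., 0., 0., 0.]])  # 1 to 4 labels per row

loss = nn.BCEWithLogitsLoss()(head(cls_vec), target)   # independent sigmoid per label
preds = (torch.sigmoid(head(cls_vec)) > 0.5).int()     # several labels may fire at once
```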
Pages: 3530-3538
Citations: 5
CLOWER: A Pre-trained Language Model with Contrastive Learning over Word and Character Representations
Pub Date : 2022-08-23 DOI: 10.48550/arXiv.2208.10844
Borun Chen, Hongyin Tang, Jingang Wang, Qifan Wang, Haitao Zheng, Wei Yu Wu, Liqian Yu
Pre-trained Language Models (PLMs) have achieved remarkable performance gains across numerous downstream tasks in natural language understanding. Various Chinese PLMs have been successively proposed for learning better Chinese language representation. However, most current models use Chinese characters as inputs and are not able to encode semantic information contained in Chinese words. While recent pre-trained models incorporate both words and characters simultaneously, they usually suffer from deficient semantic interactions and fail to capture the semantic relation between words and characters. To address the above issues, we propose a simple yet effective PLM CLOWER, which adopts the Contrastive Learning Over Word and charactER representations. In particular, CLOWER implicitly encodes the coarse-grained information (i.e., words) into the fine-grained representations (i.e., characters) through contrastive learning on multi-grained information. CLOWER is of great value in realistic scenarios since it can be easily incorporated into any existing fine-grained based PLMs without modifying the production pipelines. Extensive experiments conducted on a range of downstream tasks demonstrate the superior performance of CLOWER over several state-of-the-art baselines.
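A common way to realize contrastive learning between the two granularities is an InfoNCE loss with in-batch negatives, sketched here on random tensors; this is our generic reading of the objective, not CLOWER's implementation:

```python
import torch
import torch.nn.functional as F

def info_nce(word_emb, char_emb, tau=0.05):
    w = F.normalize(word_emb, dim=-1)     # (batch, dim) word representations
    c = F.normalize(char_emb, dim=-1)     # pooled character representations, aligned rows
    logits = w @ c.t() / tau              # similarity of every word/char pair
    labels = torch.arange(w.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
```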
Pages: 3098-3108
Citations: 1
A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit
Pub Date : 2022-08-22 DOI: 10.48550/arXiv.2208.10310
Jivnesh Sandhan, Ashish Gupta, Hrishikesh Terdalkar, Tushar Sandhan, S. Samanta, L. Behera, Pawan Goyal
The phenomenon of compounding is ubiquitous in Sanskrit. It serves to achieve brevity in expressing thoughts, while simultaneously enriching the lexical and structural formation of the language. In this work, we focus on the Sanskrit Compound Type Identification (SaCTI) task, where we consider the problem of identifying semantic relations between the components of a compound word. Earlier approaches solely rely on the lexical information obtained from the components and ignore the most crucial contextual and syntactic information useful for SaCTI. However, the SaCTI task is challenging primarily due to the implicitly encoded context-sensitive semantic relation between the compound components. Thus, we propose a novel multi-task learning architecture which incorporates the contextual information and enriches the complementary syntactic information using morphological tagging and dependency parsing as two auxiliary tasks. Experiments on the benchmark datasets for SaCTI show 6.1 points (Accuracy) and 7.7 points (F1-score) absolute gain compared to the state-of-the-art system. Further, our multi-lingual experiments demonstrate the efficacy of the proposed architecture in English and Marathi.
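The multi-task architecture amounts to a shared encoding feeding a main head plus two auxiliary heads whose losses are added with a weight. The sketch below shows that shape; dimensions, label counts, and the auxiliary weight are placeholder assumptions:

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    def __init__(self, dim, n_types, n_tags, n_deps):
        super().__init__()
        self.type_head = nn.Linear(dim, n_types)   # main: compound type (SaCTI)
        self.tag_head = nn.Linear(dim, n_tags)     # auxiliary: morphological tagging
        self.dep_head = nn.Linear(dim, n_deps)     # auxiliary: dependency parsing

    def forward(self, h, y_type, y_tag, y_dep, w_aux=0.5):
        ce = nn.functional.cross_entropy
        return (ce(self.type_head(h), y_type)              # task of interest
                + w_aux * ce(self.tag_head(h), y_tag)      # auxiliary signal
                + w_aux * ce(self.dep_head(h), y_dep))     # auxiliary signal

heads = MultiTaskHeads(256, n_types=4, n_tags=10, n_deps=12)
loss = heads(torch.randn(8, 256), torch.randint(4, (8,)),
             torch.randint(10, (8,)), torch.randint(12, (8,)))
```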
Pages: 4071-4083
Citations: 2
Neuro-Symbolic Visual Dialog
Pub Date : 2022-08-22 DOI: 10.48550/arXiv.2208.10353
Adnen Abdessaied, Mihai Bâce, A. Bulling
We propose Neuro-Symbolic Visual Dialog (NSVD) —the first method to combine deep learning and symbolic program execution for multi-round visually-grounded reasoning. NSVD significantly outperforms existing purely-connectionist methods on two key challenges inherent to visual dialog: long-distance co-reference resolution as well as vanishing question-answering performance. We demonstrate the latter by proposing a more realistic and stricter evaluation scheme in which we use predicted answers for the full dialog history when calculating accuracy. We describe two variants of our model and show that using this new scheme, our best model achieves an accuracy of 99.72% on CLEVR-Dialog—a relative improvement of more than 10% over the state of the art—while only requiring a fraction of training data. Moreover, we demonstrate that our neuro-symbolic models have a higher mean first failure round, are more robust against incomplete dialog histories, and generalise better not only to dialogs that are up to three times longer than those seen during training but also to unseen question types and scenes.
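The symbolic half of such a pipeline can be pictured as a small program executor over a scene: a neural parser (not shown) maps the question to a program, which is then run deterministically. The operators and scene below are invented CLEVR-style stand-ins:

```python
scene = [{"shape": "cube", "color": "red"},
         {"shape": "sphere", "color": "red"},
         {"shape": "cube", "color": "blue"}]

OPS = {
    "filter_color": lambda objs, c: [o for o in objs if o["color"] == c],
    "filter_shape": lambda objs, s: [o for o in objs if o["shape"] == s],
    "count": lambda objs: len(objs),
}

def execute(program, objs):
    for op, arg in program:
        objs = OPS[op](objs, arg) if arg is not None else OPS[op](objs)
    return objs

# "How many red cubes are there?"
program = [("filter_color", "red"), ("filter_shape", "cube"), ("count", None)]
print(execute(program, scene))  # 1
```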
Pages: 192-217
Citations: 1
Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis
Pub Date : 2022-08-22 DOI: 10.48550/arXiv.2208.10970
Siwen Luo, Yi Ding, Siqu Long, S. Han, Josiah Poon
Recognizing the layout of unstructured digital documents is crucial when parsing the documents into the structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis usually rely on visual cues to understand documents while ignoring other information, such as contextual information or the relationships between document layout components, which are vital to boosting layout analysis performance. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We construct different graphs to capture four main feature aspects of document layout components, including syntactic, semantic, density, and appearance features. Then, we apply graph convolutional networks to enhance each aspect of features and apply the node-level pooling for integration. Finally, we concatenate features of all aspects and feed them into the 2-layer MLPs for document layout component classification. Our Doc-GCN achieves state-of-the-art results on three widely used DLA datasets: PubLayNet, FUNSD, and DocBank. The code will be released at https://github.com/adlnlp/doc_gcn
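Operationally the pipeline reads: one graph per feature aspect, a graph-convolution step per graph, per-node concatenation across aspects, then a 2-layer MLP per layout component. A compact sketch with toy shapes (our simplification, not the released code):

```python
import torch
import torch.nn as nn

def gcn_step(A, X, W):
    # one graph-convolution step: row-normalized adjacency times features
    deg = A.sum(-1, keepdim=True).clamp(min=1)
    return torch.relu((A / deg) @ X @ W)

n, d, n_classes = 12, 64, 5                            # 12 layout components
graphs = [torch.rand(n, n).round() for _ in range(4)]  # syntactic/semantic/density/appearance
X = torch.randn(n, d)                                  # initial node features
Ws = [0.05 * torch.randn(d, d) for _ in range(4)]

per_aspect = [gcn_step(A, X, W) for A, W in zip(graphs, Ws)]
fused = torch.cat(per_aspect, dim=-1)                  # (n, 4*d): fuse aspects per node
mlp = nn.Sequential(nn.Linear(4 * d, 128), nn.ReLU(), nn.Linear(128, n_classes))
logits = mlp(fused)                                    # one prediction per component
```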
Pages: 2906-2916
Citations: 5