Recent Advances in Natural Language Processing: Latest Articles

OlloBot - Towards A Text-Based Arabic Health Conversational Agent: Evaluation and Results
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_034
Ahmed Fadhil, Ahmed Ghassan Tawfiq AbuRa'ed
We introduce OlloBot, an Arabic conversational agent that assists physicians and supports patients with the care process. It does not replace physicians; instead, it provides health tracking and support and assists physicians with care delivery through a conversational medium. The current model covers healthy diet, physical activity, and mental health, in addition to food logging. Not only does OlloBot track users' daily food intake, it also offers useful tips for healthier living. We discuss the design, development, and testing of OlloBot, and highlight the findings and limitations that arose from the testing.
Citations: 25
Turkish Tweet Classification with Transformer Encoder
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_158
Atif Emre Yüksel, Yaşar Alim Türkmen, Arzucan Özgür, B. Altinel
Short-text classification is a challenging task due to the sparsity and high dimensionality of the feature space. In this study, we aim to analyze and classify Turkish tweets based on their topics. Social media jargon and the agglutinative structure of the Turkish language make this classification task even harder. As far as we know, this is the first study that uses a Transformer Encoder for short-text classification in Turkish. The model is trained in a weakly supervised manner, where the training data set has been labeled automatically. Our results on the test set, which has been manually labeled, show that performing morphological analysis improves the classification performance of the traditional machine learning algorithms Random Forest, Naive Bayes, and Support Vector Machines. Still, the proposed approach achieves an F-score of 89.3%, outperforming those algorithms by at least 5 points.
Citations: 14
Improving Named Entity Linking Corpora Quality
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_152
A. Weichselbraun, Adrian M. P. Braşoveanu, P. Kuntschik, L. Nixon
Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the corpora and of the evaluation metrics used is crucial in this process. We therefore assess the quality of three popular evaluation corpora, identifying four major issues which affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, and (iv) differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning, which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures better comparability of evaluation results.
Citations: 7
Automatic Question Answering for Medical MCQs: Can It go Further than Information Retrieval?
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_049
L. Ha, Victoria Yaneva
We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require that the training data come from the same source as the questions. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.
Citations: 11
Study on Unsupervised Statistical Machine Translation for Backtranslation
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_068
Anush Kumar, Nihal V. Nayak, Aditya Chandra, Mydhili K. Nair
Machine Translation systems have drastically improved over the years for several language pairs. Monolingual data is often used to generate synthetic sentences to augment the training data, which has been shown to improve the performance of machine translation models. In our paper, we make use of an Unsupervised Statistical Machine Translation (USMT) system to generate synthetic sentences. Our study compares the performance improvements in a Neural Machine Translation model when using synthetic sentences from supervised and unsupervised Machine Translation models. Our approach of using USMT for backtranslation shows promise in low-resource conditions and achieves an improvement of 3.2 BLEU points over the Neural Machine Translation model.
Citations: 0
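The backtranslation idea in the abstract above (turning monolingual target-language text into synthetic training pairs) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `back_translate`, `augment`, and the toy word table are hypothetical names, and the toy `toy_reverse_translate` function merely stands in for whatever target-to-source system (for example a USMT model) would be used in practice.

```python
from typing import Callable, List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[SentencePair]:
    """Pair each real target sentence with a machine-generated source sentence."""
    synthetic_pairs = []
    for target_sentence in target_monolingual:
        synthetic_source = reverse_translate(target_sentence)
        synthetic_pairs.append((synthetic_source, target_sentence))
    return synthetic_pairs


def augment(
    parallel: List[SentencePair],
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[SentencePair]:
    """Append synthetic pairs to the genuine parallel data."""
    return list(parallel) + back_translate(target_monolingual, reverse_translate)


# Hypothetical word table standing in for a real target-to-source system.
TOY_TABLE = {"hallo": "hello", "welt": "world"}


def toy_reverse_translate(sentence: str) -> str:
    """Word-by-word lookup; unknown words are copied through unchanged."""
    return " ".join(TOY_TABLE.get(word, word) for word in sentence.lower().split())


parallel = [("good morning", "guten morgen")]
augmented = augment(parallel, ["Hallo Welt"], toy_reverse_translate)
print(augmented)
```

The target side of every synthetic pair is genuine text and only the source side is machine-generated, which is the usual motivation for translating in the reverse direction.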
Investigating Terminology Translation in Statistical and Neural Machine Translation: A Case Study on English-to-Hindi and Hindi-to-English
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_052
Rejwanul Haque, Mohammed Hasanuzzaman, Andy Way
Terminology translation plays a critical role in domain-specific machine translation (MT). In this paper, we conduct a comparative qualitative evaluation on terminology translation in phrase-based statistical MT (PB-SMT) and neural MT (NMT) in two translation directions: English-to-Hindi and Hindi-to-English. For this, we select a test set from a legal domain corpus and create a gold standard for evaluating terminology translation in MT. We also propose an error typology taking the terminology translation errors into consideration. We evaluate the MT systems’ performance on terminology translation, and demonstrate our findings, unraveling strengths, weaknesses, and similarities of PB-SMT and NMT in the area of term translation.
Citations: 12
Unsupervised dialogue intent detection via hierarchical topic model
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_108
Artem Popov, V. Bulatov, Darya Polyudova, Eugenia Veselova
One of the challenges in developing a task-oriented chatbot is the scarce availability of labeled training data. The best way to obtain such data is to ask assessors to tag each dialogue according to its intent. Unfortunately, performing labeling without any provisional collection structure is difficult, since the very notion of intent is ill-defined. In this paper, we propose a hierarchical multimodal regularized topic model to obtain a first approximation of the intent set. Our rationale for using hierarchical models is their ability to take into account several degrees of dialogue relevance. We attempt to build a model that can distinguish between subject-based (e.g. medicine and transport topics) and action-based (e.g. filing of an application and tracking application status) similarities. In order to achieve this, we divide the set of all features into several groups according to part-of-speech analysis. Various feature groups are treated differently on different hierarchy levels.
Citations: 9
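The step of dividing features into groups by part of speech, mentioned in the abstract above, might look something like the sketch below. The abstract does not say which tags go into which group, so the `POS_TO_GROUP` mapping (nouns as "subject" features, verbs as "action" features) is purely an assumption for illustration, as are the function name and the pre-tagged input; the topic model itself is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Assumed mapping from coarse POS tags to feature groups (not from the paper).
POS_TO_GROUP = {"NOUN": "subject", "PROPN": "subject", "VERB": "action"}


def group_features(tagged_tokens: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Split (token, POS) pairs into per-group bag-of-words counters."""
    groups: Dict[str, Counter] = defaultdict(Counter)
    for token, pos in tagged_tokens:
        group = POS_TO_GROUP.get(pos, "other")  # everything else is pooled
        groups[group][token.lower()] += 1
    return dict(groups)


# A hypothetical utterance that has already been POS-tagged elsewhere.
dialogue = [
    ("I", "PRON"), ("want", "VERB"), ("to", "PART"), ("track", "VERB"),
    ("my", "PRON"), ("application", "NOUN"), ("status", "NOUN"),
]
features = group_features(dialogue)
print(features["subject"])
print(features["action"])
```

Keeping the groups as separate counters lets a downstream model weight each group differently at each level of a hierarchy, which is the kind of treatment the abstract describes.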
A Universal System for Automatic Text-to-Phonetics Conversion
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_042
C. Gafni
This paper describes an automatic text-to-phonetics conversion system. The system was constructed primarily to serve as a research tool. It is implemented in general-purpose linguistic software, which allows it to be incorporated into multifaceted linguistic research in essentially any language. The system currently relies on two mechanisms to generate phonetic transcriptions from texts: (i) importing ready-made phonetic word forms from external dictionaries, and (ii) automatic generation of phonetic word forms based on a set of deterministic linguistic rules. The current paper describes the proposed system and its potential application to linguistic research.
Citations: 0
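The two mechanisms named in the abstract above (dictionary import, then deterministic rules) suggest a lookup-with-fallback design, which could be sketched as below. Everything here is illustrative: the tiny `LEXICON`, the toy `RULES` list, and the function names are assumptions and are not taken from the described system, and the rules are far too crude for real English.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical ready-made phonetic forms, as if imported from an external dictionary.
LEXICON: Dict[str, str] = {"one": "wʌn", "the": "ðə"}

# Toy ordered rewrite rules (regex pattern, replacement). They are applied in
# sequence, so each rule sees the output of the rules before it.
RULES: List[Tuple[str, str]] = [
    (r"sh", "ʃ"),
    (r"ch", "tʃ"),
    (r"th", "θ"),
    (r"ee", "iː"),
    (r"oo", "uː"),
    (r"c(?=[eiy])", "s"),  # "soft" c before e, i, y
    (r"c", "k"),           # any remaining c
]


def transcribe_word(word: str) -> str:
    """Use the dictionary form if present, otherwise apply the rules in order."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key]
    phonetic = key
    for pattern, replacement in RULES:
        phonetic = re.sub(pattern, replacement, phonetic)
    return phonetic


def transcribe(text: str) -> str:
    """Transcribe every alphabetic word in the text, separated by spaces."""
    words = re.findall(r"[^\W\d_]+", text)
    return " ".join(transcribe_word(word) for word in words)


print(transcribe("The sheep"))
```

Checking the dictionary first lets irregular words bypass the rules entirely, while the rules cover words the dictionary lacks.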
Structural Approach to Enhancing WordNet with Conceptual Frame Semantics
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_074
S. Leseva, I. Stoyanova
This paper outlines procedures for enhancing WordNet with conceptual information from FrameNet. The mapping of the two resources is non-trivial. We define a number of techniques for validating the consistency of the mapping and extending its coverage, which make use of the structure of both resources and the systematic relations they encode (between synsets in WordNet, between frames in FrameNet, and between synsets and frames). We present a case study on causativity, a relation which provides enhancement complementary to the one using hierarchical relations, by systematically linking large parts of the lexicon. We show how consistency checks and denser relations may be implemented on the basis of this relation. We then propose new frames based on causative-inchoative correspondences and, in conclusion, touch on the possibilities for defining new frames based on the types of specialisation that take place from parent to child synset.
Citations: 4
Automatic Detection of Translation Direction
Pub Date : 2019-10-22 DOI: 10.26615/978-954-452-056-4_130
Ilia Sominsky, S. Wintner
Parallel corpora are crucial resources for NLP applications, most notably for machine translation. The direction of the (human) translation of parallel corpora has been shown to have significant implications for the quality of statistical machine translation systems that are trained with such corpora. We describe a method for determining the direction of the (manual) translation of parallel corpora at the sentence-pair level. Using several linguistically-motivated features, coupled with a neural network model, we obtain high accuracy on several language pairs. Furthermore, we demonstrate that the accuracy is correlated with the (typological) distance between the two languages.
Citations: 9