Extração de Relações utilizando Features Diferenciadas para Português (Relation Extraction Using Differentiated Features for Portuguese)

IF 0.1 Q4 LINGUISTICS Linguamatica Pub Date : 2014-12-26 DOI:10.21814/LM.6.2.182

Erick Nilsen Pereira de Souza, Daniela Barreio Claro

Linguamatica, pages 57-65. Journal Article.

Citations: 7

Abstract

Relation Extraction (RE) is an Information Extraction (IE) task concerned with discovering semantic relationships between concepts in unstructured text. When extraction is not limited to a predefined set of relations, the task is called Open Relation Extraction (Open RE), and its main challenge is to reduce the proportion of invalid extractions among the relationships identified. Current methods, which rely on a specific set of machine learning features, eliminate many of the invalid extractions. However, these solutions are highly language-dependent. The dependence arises from the difficulty of finding the most representative feature set for the Open RE problem given the peculiarities of each language. In this context, the present work assesses the difficulties of feature-based classification in open relation extraction for Portuguese, with the aim of providing a basis for new solutions that reduce language dependence in this task. The results indicate that many features that are representative in English cannot be mapped directly to Portuguese with satisfactory classification performance. Among the classification algorithms evaluated, J48 showed the best results with an F-measure of 84.1%, followed by SVM (83.9%), Perceptron (82.0%) and Naive Bayes (79.9%).
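
The abstract reports classifier quality as an F-measure over extractions judged valid or invalid. As a minimal sketch of how such a score can be computed, the Python below assumes the balanced F-measure (beta = 1) taken over the "valid" class; the abstract does not state which variant or averaging was used. The function name `f_measure` and the toy labels are hypothetical and are not data from the paper.

```python
from typing import Sequence


def f_measure(gold: Sequence[bool], predicted: Sequence[bool], beta: float = 1.0) -> float:
    """F-measure for the 'valid extraction' class (True = valid).

    Assumption: balanced F-measure by default (beta = 1); the abstract
    does not say which variant the authors report.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count outcomes with respect to the 'valid' class.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    if tp == 0:
        # No correct positive prediction: precision or recall is zero.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy labels for six candidate extractions (not from the paper).
gold = [True, True, True, False, False, True]
predicted = [True, True, False, True, False, True]
score = f_measure(gold, predicted)
print(f"F-measure: {score:.3f}")
```

In this toy run, precision and recall are both 3/4, so the balanced F-measure is also 0.75. A classifier that discards more invalid extractions raises precision, while one that wrongly discards valid extractions lowers recall, and the F-measure balances the two.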
