Extração de Relações utilizando Features Diferenciadas para Português

IF 0.3 Q4 LINGUISTICS Linguamatica Pub Date : 2014-12-26 DOI:10.21814/LM.6.2.182
Erick Nilsen Pereira de Souza, Daniela Barreio Claro
{"title":"Extração de Relações utilizando Features Diferenciadas para Português","authors":"Erick Nilsen Pereira de Souza, Daniela Barreio Claro","doi":"10.21814/LM.6.2.182","DOIUrl":null,"url":null,"abstract":"Relation Extraction (RE) is a task of Information Extraction (IE) responsible for the discovery of semantic relationships between concepts in unstructured text. When the extraction is not limited to a predefined set of relations, the task is called Open Relation Extraction, whose main challenge is to reduce the proportion of invalid extractions in the universe of relationships identified. Current methods based on a set of specific machine learning features eliminate much of the invalid extractions. However, these solutions have the disadvantage of being highly language-dependent. This dependence arises from the difficulty in finding the most representative set of features to the Open RE problem, considering the peculiarities of each language. In this context, the present work proposes to assess the difficulties of classification based on features in open relation extraction in Portuguese, aiming to base new solutions that can reduce language dependence in this task. The results indicate that many representative features in English can not be mapped directly to the Portuguese language with satisfactory merits of classification. Among the classification algorithms evaluated, J48 showed the best results with a F-measure value of 84.1%, followed by SVM (83.9%), Perceptron (82.0%) and Naive Bayes (79,9%).","PeriodicalId":41819,"journal":{"name":"Linguamatica","volume":"55 3 1","pages":"57-65"},"PeriodicalIF":0.3000,"publicationDate":"2014-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Linguamatica","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21814/LM.6.2.182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"LINGUISTICS","Score":null,"Total":0}
引用次数: 7

Abstract

Relation Extraction (RE) is a task of Information Extraction (IE) responsible for the discovery of semantic relationships between concepts in unstructured text. When the extraction is not limited to a predefined set of relations, the task is called Open Relation Extraction, whose main challenge is to reduce the proportion of invalid extractions in the universe of relationships identified. Current methods based on a set of specific machine learning features eliminate much of the invalid extractions. However, these solutions have the disadvantage of being highly language-dependent. This dependence arises from the difficulty in finding the most representative set of features to the Open RE problem, considering the peculiarities of each language. In this context, the present work proposes to assess the difficulties of classification based on features in open relation extraction in Portuguese, aiming to base new solutions that can reduce language dependence in this task. The results indicate that many representative features in English can not be mapped directly to the Portuguese language with satisfactory merits of classification. Among the classification algorithms evaluated, J48 showed the best results with a F-measure value of 84.1%, followed by SVM (83.9%), Perceptron (82.0%) and Naive Bayes (79,9%).
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
葡萄牙语中使用不同特征的关系提取
关系抽取(RE)是信息抽取(IE)的一项任务,负责发现非结构化文本中概念之间的语义关系。当提取不限于预定义的一组关系时,该任务称为开放关系提取(Open Relation extraction),其主要挑战是在已识别的关系中减少无效提取的比例。当前基于一组特定机器学习特征的方法消除了许多无效的提取。然而,这些解决方案的缺点是高度依赖于语言。考虑到每种语言的特殊性,这种依赖源于很难找到Open RE问题的最具代表性的特性集。在此背景下,本研究提出评估基于葡萄牙语开放关系提取特征的分类困难,旨在建立新的解决方案,减少该任务中的语言依赖性。结果表明,英语中的许多代表性特征不能直接映射到葡萄牙语中,并具有令人满意的分类优点。在评价的分类算法中,J48的F-measure值为84.1%,效果最好,其次是SVM(83.9%)、Perceptron(82.0%)和Naive Bayes(79.9%)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Linguamatica
Linguamatica LINGUISTICS-
CiteScore
1.40
自引率
0.00%
发文量
4
审稿时长
6 weeks
期刊最新文献
A compilação e a análise de métricas textuais de um corpus de redações Classificação da qualidade da argumentação em tweets no domínio da política brasileira Extracção de Relações de Apoio e Oposição em Títulos de Notícias de Política em Português Pais, filhos e outras relações familiares no DIP DIP - Desafio de Identificação de Personagens: objectivo, organização, recursos e resultados
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1