The AMR-PT corpus and the semantic annotation of challenging sentences from journalistic and opinion texts

Marcio Lima Inácio, Marco Antonio Sobrevilla Cabezudo, Renata Ramisch, Ariani Di Felippo, Thiago Alexandre Salgueiro Pardo
DOI: 10.1590/1678-460x202339355159 (https://doi.org/10.1590/1678-460x202339355159)
Published: 2023-01-01 in DELTA Documentacao de Estudos em Linguistica Teorica e Aplicada (journal article)
Citations: 2

Abstract

One of the most popular semantic representation languages in Natural Language Processing (NLP) is Abstract Meaning Representation (AMR). This formalism encodes the meaning of single sentences as rooted, directed graphs. For English, there is a large annotated corpus that provides qualitative and reusable data for building or improving existing NLP methods and applications. To build AMR corpora for non-English languages, including Brazilian Portuguese, both automatic and manual strategies have been used. The automatic annotation methods are essentially based on the cross-linguistic alignment of parallel corpora and the inheritance of the AMR annotation. The manual strategies focus on adapting the English AMR guidelines to a target language. Both annotation strategies have to deal with some challenging phenomena. This paper explores in detail some characteristics of Portuguese for which the AMR model had to be adapted and introduces two annotated corpora: AMRNews, a corpus of 870 annotated sentences from journalistic texts, and OpiSums-PT-AMR, comprising 404 opinionated sentences annotated in AMR.
Source journal
DELTA Documentacao de Estudos em Linguistica Teorica e Aplicada (Social Sciences: Linguistics and Language)
CiteScore: 0.40
Self-citation rate: 0.00%
Articles published: 39
Review time: 52 weeks
About the journal: The journal Documentação de Estudos em Lingüística Teórica e Aplicada (DELTA) is published by the Pontifícia Universidade Católica de São Paulo (PUC-SP). DELTA has been published since 1985 and became a biannual publication in 1992, with issues appearing in February and August. The journal covers all areas of study concerning language and speech, whether theoretical or applied; only unpublished contributions are considered. The short title DELTA is recommended for bibliographies, footnotes, bibliographical strips, and references.
Latest articles from the journal
A Survey on Generic Impersonal Structures in Brazilian Portuguese
O contato linguístico e a aquisição da linguagem: considerações sobre a produção variável da concordância de número no português brasileiro
Desenvolvimento da consciência morfológica no 1º ciclo
Características grafofonéticas das vogais francesas na gramática o Mestre francez ou novo methodo para aprender a lingua franceza por meio da portugueza de Francisco Clamopin Durand
A oralidade no processo de alfabetização: o debate (sempre e de novo) sobre como fazer convergir prática pedagógica e projeto formativo humanizador