Processing European Portuguese Verbal Idioms: From the Lexicon-Grammar to a Rule-based Parser
Ana Galvão, J. Baptista, N. Mamede
Proceedings of the Third International Conference, Europhras 2019, Computational and Corpus-Based Phraseology
Published: 2019-09-25
DOI: 10.26615/978-2-9701095-6-3_009 (https://doi.org/10.26615/978-2-9701095-6-3_009)
Citations: 2
Abstract
Verbal idioms are challenging for Natural Language Processing systems: they are syntactically analysable strings with a well-formed structure, identical to that of distributionally free sentences, yet their meaning is for the most part non-compositional. This paper presents recent advances in processing European Portuguese verbal idioms. Parsing rules are generated automatically from a lexicon-grammar matrix containing more than 2,500 verbal idioms and more than 100 structural, distributional and transformational properties, within the framework of a rule-based incremental parser. The rules are then integrated into STRING, a fully-fledged natural language processing system for Portuguese. The system now identifies not only the idioms’ base forms but also the sentences that result from some productive and very general transformations (passive, pronominalisation) admitted by some of these idioms. Other improvements include a newly developed lexicon-grammar validator, a new module that generates examples of the transformations, and a new, more granular evaluation module. An intrinsic evaluation achieves an overall recall of 92.5%.
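As a rough illustration of the idea of deriving parsing rules from a lexicon-grammar matrix, the sketch below turns one matrix row into a base-form rule plus one rule per transformation the row admits. Everything in it is an assumption made for illustration: the `IdiomEntry` class, the `generate_rules` function, the dictionary-based rule representation, and the property values given for the example idiom are hypothetical and are not taken from the paper, its matrix, or the STRING system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IdiomEntry:
    """One row of a hypothetical lexicon-grammar matrix."""
    verb: str                      # lemma of the idiom's verb
    fixed: List[str]               # frozen complement, word by word
    properties: Dict[str, bool] = field(default_factory=dict)  # "+"/"-" cells as booleans


# Transformations mentioned in the abstract; the property names are assumed.
TRANSFORMATIONS = ("passive", "pronominalisation")


def generate_rules(entry: IdiomEntry) -> List[dict]:
    """Emit a rule for the base form, then one per admitted transformation."""
    rules = [{"variant": "base", "verb": entry.verb, "fixed": list(entry.fixed)}]
    for name in TRANSFORMATIONS:
        # A missing cell is treated as "-" (transformation not admitted).
        if entry.properties.get(name, False):
            rules.append({"variant": name, "verb": entry.verb, "fixed": list(entry.fixed)})
    return rules


# Illustrative row only; the property values are not drawn from the actual matrix.
example = IdiomEntry(
    verb="quebrar",
    fixed=["o", "gelo"],
    properties={"passive": True, "pronominalisation": False},
)
example_rules = generate_rules(example)
```

A real generator would also have to encode how each transformation reshapes the sentence (for example, promoting the frozen object to subject in the passive); the sketch only records which variants a row licenses.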