
Linguamatica: latest articles

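Hacia una clasificación verbal automática para el español: estudio sobre la relevancia de los diferentes tipos y configuraciones de información sintáctico-semántica (Towards automatic verb classification for Spanish: a study of the relevance of different types and configurations of syntactic-semantic information)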
IF 0.6 Q4 LINGUISTICS Pub Date : 2015-07-31 DOI: 10.21814/LM.7.1.202
Lara Gil-Vallejo, I. Castellón, Marta Coll-Florit, J. Turmo
This work focuses on the acquisition of automatic verb classifications for Spanish. To that end, we carry out a series of experiments with 20 verb senses from the Sensem corpus. We use different types of features covering diverse linguistic information, together with an agglomerative hierarchical clustering method, to generate several classifications. We compare each of these automatic classifications with a gold standard created semi-automatically on the basis of linguistic constructions proposed in theoretical linguistics. This comparison shows which features are best suited to automatically creating a classification consistent with the theory of constructions, and what the similarities and differences are between the automatic verb classification and the one based on the theory of linguistic constructions.
Linguamatica, vol. 7, pp. 41-52.
Citations: 0
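As a rough illustration of the agglomerative hierarchical clustering mentioned in the abstract above, the following Python sketch repeatedly merges the two closest clusters until a target number remains. The average-linkage criterion, the Euclidean distance, the function name `agglomerative`, and the toy two-dimensional feature vectors are all assumptions made for illustration; the abstract does not state which linkage, distance, or feature encoding the authors used.

```python
from itertools import combinations
from math import dist


def agglomerative(points, k):
    """Merge the two closest clusters (average linkage) until k clusters remain.

    `points` is a list of numeric feature vectors (hypothetical stand-ins for
    verb-sense features); the result is a list of clusters of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Pick the pair of cluster positions (a < b) with the smallest linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data: two tight pairs and one outlier.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerative(toy_points, 3))
```

Stopping at a fixed `k` is only one way to cut the hierarchy; a distance threshold would serve equally well in a sketch like this.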
Geração de Linguagem Natural para Conversão de Dados em Texto - Aplicação a um Assistente de Medicação para o Português (Natural Language Generation for Data-to-Text Conversion: Application to a Medication Assistant for Portuguese)
IF 0.6 Q4 LINGUISTICS Pub Date : 2015-07-31 DOI: 10.21814/LM.7.1.206
J. C. Pereira, A. Teixeira
New devices, such as smartphones and tablets, are changing human-computer interaction. These devices present several challenges, especially because of their small screens and keyboards. To use text and voice in multimodal interaction, it is essential to deploy modules that translate an application's internal information into sentences or texts, so that it can be displayed on screen or synthesized. These modules must also generate phrases and texts in the user's native language; their development should not require considerable resources; and the generated output should show a good degree of variability. Our main objective is to propose, implement and evaluate a method for converting data into Portuguese text that can be developed with a minimum of time and knowledge, without compromising the necessary variability and quality of the output. The system, developed for a Medication Assistant, is intended to create natural-language descriptions of medication to be taken. Motivated by recent results, we opted for an approach based on machine translation, with models trained on a small parallel corpus created for this purpose. Two variants of the system were trained on it: phrase-based translation and syntax-based translation. Both variants were evaluated with automatic metrics (BLEU and Meteor) and by human evaluators. The phrase-based approach produced better results than the syntax-based one: human evaluators rated 60% of phrase-based responses as good or very good, compared to only 46% of syntax-based responses. Considering the corpus size, we judge this value (60%) to be good.
Linguamatica, vol. 7, pp. 3-21.
Citations: 1
A arquitetura de um glossário terminológico Inglês-Português na área de Eletrotécnica (The architecture of an English-Portuguese terminological glossary in the field of Electrotechnics)
IF 0.6 Q4 LINGUISTICS Pub Date : 2015-07-31 DOI: 10.21814/LM.7.1.204
S. Fadanelli, M. J. B. Finatto
This article describes some of the procedures for building a prototype online English-Portuguese glossary of Electrical Engineering / Electrotechnical terminology, aimed mainly at beginner students in technical and undergraduate courses in Electrical Engineering. The methodology is based on a corpus of datasheets, documents often used by professionals in the Electrical Engineering area, and on a comparison of the data obtained from these datasheets with data gathered from 108 students of Electrical courses. The results point to the relevance of considering the point of view of the target audience in order to build the glossary properly.
Linguamatica, pp. 67-71.
Citations: 0
Uma Comparação Sistemática de Diferentes Abordagens para a Sumarização Automática Extrativa de Textos em Português (A Systematic Comparison of Different Approaches to Automatic Extractive Summarization of Texts in Portuguese)
IF 0.6 Q4 LINGUISTICS Pub Date : 2015-07-31 DOI: 10.21814/LM.7.1.203
M. Costa, Bruno Martins
Automatic document summarization is the task of automatically generating condensed versions of source texts, and is one of the fundamental problems in Information Retrieval and Natural Language Processing. In this paper, different extractive approaches are compared on the task of summarizing individual journalistic texts written in Portuguese. Using the ROUGE package to measure the quality of the produced summaries, we report results for two experimental domains: (i) the generation of headlines for news articles written in European Portuguese, and (ii) the generation of summaries for news articles written in Brazilian Portuguese. The results show that, on several ROUGE metrics, methods based on selecting the first sentences perform best when building extractive news headlines. For summaries of more than one sentence, the LSA Squared algorithm achieved the best results across the various ROUGE metrics.
Linguamatica, vol. 7, pp. 23-40.
Citations: 3
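The ROUGE metrics named in the abstract above are built on n-gram overlap between a produced summary and a reference. The Python sketch below computes a simplified ROUGE-N recall against a single reference. The whitespace tokenization, lowercasing, function name `rouge_n_recall`, and example sentences are assumptions for illustration only; the actual ROUGE package has its own preprocessing and options, and this is not the evaluation setup used in the paper.

```python
from collections import Counter


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of reference n-grams that also appear in the candidate (clipped counts)."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # A reference n-gram is credited at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Invented example: 5 of the 7 reference unigrams are covered by the candidate.
reference = "o governo aprovou ontem a nova lei"
candidate = "o governo aprovou a lei"
print(rouge_n_recall(candidate, reference, n=1))
print(rouge_n_recall(candidate, reference, n=2))
```

Recall is shown because it is the traditional headline figure for ROUGE; precision and F-score variants follow by dividing the same overlap by the candidate's n-gram count.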
Extração de Relações utilizando Features Diferenciadas para Português (Relation Extraction Using Differentiated Features for Portuguese)
IF 0.6 Q4 LINGUISTICS Pub Date : 2014-12-26 DOI: 10.21814/LM.6.2.182
Erick Nilsen Pereira de Souza, Daniela Barreio Claro
Relation Extraction (RE) is an Information Extraction (IE) task responsible for discovering semantic relationships between concepts in unstructured text. When the extraction is not limited to a predefined set of relations, the task is called Open Relation Extraction, whose main challenge is to reduce the proportion of invalid extractions among the relationships identified. Current methods based on a set of specific machine learning features eliminate many of the invalid extractions. However, these solutions have the disadvantage of being highly language-dependent. This dependence arises from the difficulty of finding the most representative set of features for the Open RE problem, given the peculiarities of each language. In this context, the present work assesses the difficulties of feature-based classification in open relation extraction for Portuguese, aiming to provide a basis for new solutions that can reduce language dependence in this task. The results indicate that many features that are representative in English cannot be mapped directly to Portuguese with satisfactory classification quality. Among the classification algorithms evaluated, J48 showed the best results, with an F-measure of 84.1%, followed by SVM (83.9%), Perceptron (82.0%) and Naive Bayes (79.9%).
Linguamatica, pp. 57-65.
Citations: 7
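The F-measure reported in the abstract above combines precision and recall into one score. The Python sketch below shows the standard F-beta computation from counts of true positives, false positives, and false negatives. The function name `f_measure` and the example counts are hypothetical and are not taken from the paper, which does not state the underlying counts or whether a beta other than 1 was used.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from raw counts; beta=1 gives the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 6 valid extractions kept, 2 invalid ones kept, 4 valid ones missed.
# Precision is 0.75 and recall is 0.6.
print(f_measure(tp=6, fp=2, fn=4))
```

Because the harmonic mean is pulled towards the lower of the two values, a classifier cannot reach a high F-measure by maximizing precision or recall alone.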
Izen+aditz konbinazioen azterketa elebiduna, hizkuntza-aplikazio aurreratuei begira (A bilingual analysis of noun+verb combinations, with a view to advanced language applications)
IF 0.6 Q4 LINGUISTICS Pub Date : 2014-12-26 DOI: 10.21814/LM.6.2.188
Uxoa Iñurrieta Urmeneta, I. Aduriz, A. D. D. Ilarraza, Gorka Labaka, K. Sarasola
This article deals with noun+verb combinations in bilingual Basque-Spanish and Spanish-Basque dictionaries. We take a look at morphosyntactic and semantic features of word combinations in both language directions, and compare them to identify differences and similarities. Our work reveals the high complexity of those constructions and, hence, the need to address them specifically in Natural Language Processing tools, for example in Machine Translation. All of our results are publicly available online, where users can query the combinations we have analysed.
Linguamatica, vol. 6, pp. 45-55.
Citations: 1
O dicionario de sinónimos como recurso para a expansión de WordNet (The dictionary of synonyms as a resource for expanding WordNet)
IF 0.6 Q4 LINGUISTICS Pub Date : 2014-12-26 DOI: 10.21814/LM.6.2.183
Xavier Gómez Guinovart, Miguel Anxo Solla Portela
In this paper, we present the foundations for a lexical acquisition experiment designed in the framework of the SKATeR research project and aimed at expanding the Galician WordNet using the lexicographical data collected in a "traditional" Galician dictionary of synonyms.
Linguamatica, vol. 6, pp. 69-74.
Citations: 6
Projetos sobre Tradução Automática do Português no Laboratório de Sistemas de Língua Falada do INESC-ID (Projects on Machine Translation of Portuguese at the Spoken Language Systems Laboratory of INESC-ID)
IF 0.6 Q4 LINGUISTICS Pub Date : 2014-12-26 DOI: 10.21814/LM.6.2.196
Anabela Barreiro, Wang Ling, Luísa Coheur, Fernando Batista, Isabel Trancoso
Language technologies, in particular machine translation applications, have the potential to help break down linguistic and cultural barriers, making an important contribution to the globalization and internationalization of the Portuguese language by allowing content to be shared 'from' and 'to' this language. This article presents the research carried out at the Laboratory of Spoken Language Systems of INESC-ID in the field of machine translation, namely automated speech translation, the translation of microblogs, and the creation of a hybrid machine translation system. We focus on the creation of the hybrid system, which aims to combine linguistic knowledge, in particular semantico-syntactic knowledge, with statistical knowledge in order to improve translation quality.
Linguamatica, vol. 6, pp. 75-85.
Citations: 1
Euskarazko denbora-egiturak. Azterketa eta etiketatze-esperimentua (Temporal structures in Basque: analysis and tagging experiment)
IF 0.6 Q4 LINGUISTICS Pub Date : 2014-12-25 DOI: 10.21814/LM.6.2.184
Begoña Altuna, M.ª Jesús Aranzabe, A. D. D. Ilarraza
Time information extraction is very useful in natural language processing (NLP), as it can be used in text simplification, information extraction and machine translation systems. In this paper we present the first steps towards making that information accessible for the Basque language: on the one hand, Basque structures that convey time have been analysed based on grammars and, on the other hand, initial decisions have been taken on how to tag them in real texts. We also report on an annotation experiment we have carried out on a financial news corpus.
Linguamatica, vol. 6, pp. 13-24.
Citations: 3
Avaliação de métodos de desofuscação de palavrões (Evaluation of methods for deobfuscating profanity)
IF 0.6 Q4 LINGUISTICS Pub Date : 2014-12-25 DOI: 10.21814/LM.6.2.191
Gustavo Laboreiro, E. Oliveira
Cursing is a form of expression noted for its intensity. When people use this form of expression, they are voicing a spontaneous and raw opinion, one usually suppressed for the sake of "mild ways" and sensitive people. This sort of expression is also valuable for opinion mining and sentiment analysis, now a routine task across social networks. In this work we therefore evaluate methods for recovering these forms of expression when they are disguised through obfuscation, often as a way to escape automatic censorship.
Linguamatica, vol. 6, pp. 25-43.
Citations: 3