Representation of structured data of the text genre as a technique for automatic text processing

Texto Livre-Linguagem e Tecnologia | LANGUAGE & LINGUISTICS | IF 0.8 | Pub Date: 2022-01-27 | DOI: 10.35699/1983-3652.2022.35445
Claudia Aparecida Fonseca, M. V. C. Guelpeli, Rafael Santiago de Souza Netto
Citations: 1

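Abstract

This article sits at the intersection of Natural Language Processing and Language Studies and draws on a corpus compiled with computational tools. It starts from the assumption that it is useful to relate corpus generation and annotation closely to the assessment of the constitutive elements of the source text genre. Through specific studies of structured data from the genre 'scientific article', it aims to demonstrate alternatives to automatic text processing techniques. To that end, the authors created a computational model for compiling a specialized linguistic corpus representative of the scientific article genre, CorpACE. The object of study comprises the constitutive elements of scientific articles, marked up in XML, extracted and collected from the SciELO (Scientific Electronic Library On-line) database. The final product is a database of information extracted and structured in XML, which names and identifies the markup of the genre under analysis and can be used by many tools and applications. The results show how representing the constitutive elements of the genre can condense the available information through hierarchical and dynamic processes built during compilation. The study concludes that more research is needed to bring Language Science and Computer Science closer together, with emphasis on NLP, in order to represent and manipulate linguistic knowledge at its many levels (morphological, syntactic, semantic and discursive) and so improve the implementation and handling of automatic text processing.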
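As a rough illustration of what "constitutive elements marked up in XML" can look like in practice, the sketch below parses one marked-up article into a structured record. The tag names (`article`, `title`, `abstract`, `keyword`, `section`) and the helper `extract_elements` are hypothetical, chosen only for this example; they are not taken from CorpACE or from SciELO's actual markup, and this is not the authors' implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag names are illustrative assumptions and do
# not reproduce the CorpACE or SciELO schema.
SAMPLE = """
<article>
  <title>Example title</title>
  <abstract>Short summary of the study.</abstract>
  <keywords>
    <keyword>corpus</keyword>
    <keyword>XML</keyword>
  </keywords>
  <section name="Introduction">Opening paragraph.</section>
  <section name="Conclusion">Closing paragraph.</section>
</article>
"""


def extract_elements(xml_text):
    """Return the constitutive elements of one marked-up article as a dict."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title", default="").strip(),
        "abstract": root.findtext("abstract", default="").strip(),
        # Collect every <keyword> element, wherever it is nested.
        "keywords": [k.text.strip() for k in root.iter("keyword") if k.text],
        # Map each top-level section name to its text content.
        "sections": {
            s.get("name", ""): (s.text or "").strip()
            for s in root.findall("section")
        },
    }


record = extract_elements(SAMPLE)
print(record["keywords"])
print(list(record["sections"]))
```

Keeping each element under its own key is one way to reflect the hierarchical representation the abstract describes: downstream tools can query a single element, such as all abstracts, without re-parsing the full text.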
Source journal: Texto Livre-Linguagem e Tecnologia (LANGUAGE & LINGUISTICS)
CiteScore: 1.10
Self-citation rate: 16.70%
Articles published: 32
Review time: 5 weeks
About the journal: Texto Livre: Linguagem e Tecnologia is a quarterly journal, sponsored by the School of Letters of the Federal University of Minas Gerais (Brazil) since 2008. It welcomes submissions of articles, reviews, essays and translations on the relationship between languages and digital media. Its mission is to promote scientific production in the field of language studies, especially analysis of writing and practices for teaching writing through free and open new technologies, and studies on documentation and dissemination of free and open software, providing researchers from Brazil and abroad with the opportunity to share their research and contribute to the debate and scientific progress in the area. Topics of interest to this journal include: intertextuality, usability, computer use in the classroom, free culture, digital inclusion, digital literacy, dissemination of free software and other topics related to language and technology. The journal accepts manuscripts in Portuguese, Spanish, English and French, with no need for a translation into Portuguese. Texto Livre is intended for researchers and for a non-academic audience interested in critical approaches to the related topics addressed by the journal.