BrAgriNews: Um Corpus Temporal-Causal (Português-Brasileiro) para a Agricultura

Linguamatica (IF 0.3, Q4, Linguistics) | Pub Date: 2017-01-07 | DOI: 10.21814/lm.9.1.245
Brett Drury, Robson Fernandes, Alneu de Andrade Lopes
Volume 9, issue 1, pages 41-54. Publication type: Journal Article.
Citations: 4

Abstract

Interest in applying machine learning and artificial intelligence to agricultural problems has risen sharply in both academia and industry. Text mining and related natural language processing techniques, however, have rarely been used to tackle agricultural problems, and at the time of writing there was only a single such project in the Portuguese language. The failure of researchers to use text mining to analyze Portuguese texts for agricultural problems may be due to a lack of freely available corpora. To address the lack of an agriculture-centric Portuguese-language corpus, we are releasing a Brazilian Portuguese agricultural language resource, which is described in this paper. The corpus is partially non-contiguous and spans the period from 1996 to 2016. It consists of news stories scraped from Brazilian news sites and annotated with the following information types: causal, sentiment, and named entities, including temporal expressions. The corpus also provides additional resources: a treebank; lists of frequent unigrams, bigrams and trigrams; and words or phrases that journalists have identified as either "important" or domain-specific. It is hoped that the release of this corpus will stimulate the adoption of text mining in agriculture in the Lusophone research community.
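As an illustration of the frequent unigram, bigram and trigram lists mentioned above, the following is a minimal sketch of how such lists could be derived from raw news text. It is not the authors' pipeline: the function names `tokenize` and `frequent_ngrams`, the simple letter-based tokenizer, and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters
    # (accented Portuguese letters included, digits and punctuation dropped).
    return re.findall(r"[^\W\d_]+", text.lower())


def frequent_ngrams(documents, n, top_k=10):
    # Count n-grams per document so that no n-gram spans two stories.
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # zip over n shifted copies of the token list yields each n-gram as a tuple.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(top_k)


# Hypothetical sentences for illustration only; not taken from the corpus.
sample_docs = [
    "A safra de soja cresceu.",
    "A safra de milho caiu.",
]

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, frequent_ngrams(sample_docs, n, top_k=3))
```

Counting within each document rather than over one concatenated string avoids spurious n-grams that would join the end of one story to the start of the next.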