CamPros at CASE 2022 Task 1: Transformer-based Multilingual Protest News Detection

Kumari Neha, Mrinal Anand, Tushar Mohan, P. Kumaraguru, Arun Balaji Buduru
DOI: 10.18653/v1/2022.case-1.24 (https://doi.org/10.18653/v1/2022.case-1.24). Pages 169-174. Published 2022. Citation count: 1.

Abstract

Socio-political protests often lead to grave consequences, so detecting them early is important for taking precautionary measures. The main obstacle to protest event detection is the scarcity of training data for specific languages, which makes it difficult to train data-hungry deep learning models effectively. Cross-lingual and zero-shot learning models are therefore needed to detect events in low-resource languages. This paper proposes a multilingual cross-document level event detection approach using pre-trained transformer models, developed for Shared Task 1 at CASE 2022. The shared task comprised four subtasks for event detection at different levels of granularity, from document level to token level, spread over multiple languages (English, Spanish, Portuguese, Turkish, Urdu, and Mandarin). Our system achieves an average F1 score of 0.73 on the document-level event detection tasks. Our approach secured 2nd position for Hindi in subtask 1 with an F1 score of 0.80, and 4th position for Spanish with an F1 score of 0.69. Our code is available at https://github.com/nehapspathak/campros/.
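The abstract reports F1 scores per language and an average over the document-level tasks, but does not spell out how they are computed. The following is a minimal sketch of one plausible reading, assuming binary document labels (1 = protest event, 0 = no event), macro-averaged F1 per language, and an unweighted mean across languages. The function names and the toy labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Dict, List, Tuple


def f1_score(gold: List[int], pred: List[int], positive: int = 1) -> float:
    """F1 for a single class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: List[int], pred: List[int]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_score(gold, pred, label) for label in labels) / len(labels)


def average_over_languages(
    per_language: Dict[str, Tuple[List[int], List[int]]]
) -> float:
    """Assumed reading of 'average F1': mean of per-language macro F1."""
    scores = [macro_f1(gold, pred) for gold, pred in per_language.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy labels, not results from the paper.
    toy = {
        "lang_a": ([1, 0], [1, 0]),
        "lang_b": ([1, 1, 0, 0], [1, 0, 0, 0]),
    }
    for lang, (gold, pred) in toy.items():
        print(lang, round(macro_f1(gold, pred), 3))
    print("average", round(average_over_languages(toy), 3))
```

Macro averaging weights the protest and non-protest classes equally, which matters when protest documents are a small minority of the corpus. Whether the shared task scores are macro, micro, or positive-class F1 should be checked against the official task description.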