A Code-Diverse Tulu-English Dataset For NLP Based Sentiment Analysis Applications

Prashanth Kannadaguli
Published in: 2021 Advanced Communication Technologies and Signal Processing (ACTS), 2021-12-15
DOI: 10.1109/ACTS53447.2021.9708241 (https://doi.org/10.1109/ACTS53447.2021.9708241)
Citations: 3

Abstract

Due to the expanded use of social media, there is growing interest in Natural Language Processing (NLP) of textual content. Code switching is a ubiquitous phenomenon in multilingual nations: social communication often mixes a low-resourced language with a highly resourced language in the same text, mostly written in a non-native script. Such code-switched text must be processed appropriately to support NLP tasks such as Machine Translation, Automated Conversational Systems and Sentiment Analysis (SA). The primary objective of SA is to identify and analyze the attitude, opinion, emotion or sentiment expressed in the data. Although multiple systems have been trained on monolingual datasets, they break down on code-diverse data because of the added complexity of mixing at various levels of the text. Moreover, few resources exist for modelling such code-mixed data, and Machine Learning or Deep Learning algorithms that use a supervised learning approach yield better results than unsupervised learning. Such datasets are available for Hindi-English, Tamil-English, Malayalam-English, Bengali-English, German-English, Spanish-English, Japanese-English, Arabic-English, etc. Although our research concentrates on NLP for emotion and sentiment detection in Tulu, a vibrant South Indian language, we start by building the first platinum-standard corpus for NLP applications of code-diverse Tulu-English text, as no such resource exists for our native language. An inter-annotator agreement of 0.9 (Krippendorff's Alpha) indicates that the dataset can serve as a benchmark for developing Automatic Sentiment Analysis systems for Tulu.
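The abstract reports the dataset's annotation reliability as a Krippendorff's Alpha of 0.9. The sketch below is a hedged illustration of how that coefficient can be computed for nominal sentiment labels. It is not the author's code. The function name `krippendorff_alpha_nominal` and the label strings are hypothetical, and the sketch assumes each annotator's labels are given as a list that uses `None` for missing annotations. It builds the coincidence matrix of pairable labels and returns α = 1 − D_o/D_e, where D_o is the observed disagreement and D_e is the disagreement expected by chance.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Optional, Sequence


def krippendorff_alpha_nominal(
    ratings: Sequence[Sequence[Optional[Hashable]]],
) -> float:
    """Krippendorff's alpha for nominal labels (hypothetical helper).

    ratings[i][j] is the label annotator i gave to item j, or None if missing.
    """
    n_items = len(ratings[0])
    # Coincidence matrix o[(c, k)]: ordered pairs of labels from different
    # annotators on the same item, each weighted by 1 / (m_u - 1).
    coincidences: Counter = Counter()
    for j in range(n_items):
        values = [row[j] for row in ratings if row[j] is not None]
        m_u = len(values)
        if m_u < 2:
            continue  # items with fewer than two labels are not pairable
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1.0 / (m_u - 1)

    # Marginal totals n_c (row sums of the coincidence matrix) and grand total n.
    n_c: Counter = Counter()
    for (c, _k), weight in coincidences.items():
        n_c[c] += weight
    n = sum(n_c.values())

    # Nominal metric: any pair with c != k counts as one unit of disagreement.
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    if expected == 0:
        # Only one category was ever used; alpha is formally undefined here,
        # so this sketch treats it as perfect agreement by convention.
        return 1.0
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical sentiment labels from two annotators for four code-mixed comments.
    annotator_a = ["positive", "negative", "positive", "neutral"]
    annotator_b = ["positive", "negative", "negative", "neutral"]
    print(round(krippendorff_alpha_nominal([annotator_a, annotator_b]), 3))  # prints 0.667
```

In this toy example, a single disagreement among four items already lowers alpha to 2/3, because alpha discounts agreement that could arise by chance. Systematic disagreement can push it below zero. A value of 0.9 therefore reflects strong agreement beyond chance.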