Analysing The Sentiments Of Marathi-English Code-Mixed Social Media Data Using Machine Learning Techniques

Varad Patwardhan, Gauri Takawane, Nirmayi Kelkar, Omkar Gaikwad, Rutwik Saraf, S. Sonawane
2023 International Conference on Emerging Smart Computing and Informatics (ESCI), published 2023-03-01
DOI: 10.1109/ESCI56872.2023.10100304 (https://doi.org/10.1109/ESCI56872.2023.10100304)

Abstract

A vast amount of data is generated every day through social media platforms, and various techniques and methodologies are used to put its different forms to use. One such form is textual data generated on social media platforms as chats, comments, and tweets. The term “code-mixed data” describes data that combines components of different languages or linguistic subgroups, such as text written in several languages or speech that shifts between languages. With the growth of social media and worldwide communication, many people use multiple languages in their daily communication, which makes this type of data increasingly important. Machine translation, speech recognition, and text categorization are a few examples of natural language processing tasks that can be performed on code-mixed data. Research on code-mixed data can also aid the understanding of multilingual communication. In this paper, we present an empirical study of word-level language identification and text normalisation for Marathi-English code-mixed text. We have created a new dataset of 1009 sentences that exhibit code-mixing of Marathi (Romanised) and English. The data was collected from WhatsApp chats and YouTube comments.
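
To make the task of word-level language identification concrete, the following is a minimal sketch of what its input and output might look like for Romanised Marathi-English text. It is not the authors' method: the abstract does not describe their approach, and the lexicon lookup, the word lists, the label names (`MR`, `EN`, `UNK`), and the function `tag_tokens` are all assumptions made purely for illustration.

```python
# Toy word-level language identification for Romanised Marathi-English text.
# ASSUMPTION: these tiny hand-picked word lists are illustrative only; they are
# not taken from the paper's dataset, and a real system would learn from data.
MARATHI_WORDS = {"mi", "tu", "aahe", "nahi", "kay", "khup", "ani", "pan", "mala", "chan"}
ENGLISH_WORDS = {"i", "you", "is", "not", "movie", "very", "and", "but", "good", "the"}


def tag_tokens(sentence):
    """Return (token, label) pairs, with label one of "MR", "EN" or "UNK"."""
    tagged = []
    for token in sentence.lower().split():
        # Drop punctuation attached to the start or end of the token.
        word = token.strip(".,!?")
        if word in MARATHI_WORDS:
            label = "MR"
        elif word in ENGLISH_WORDS:
            label = "EN"
        else:
            # Words outside both lists stay unlabelled rather than guessed.
            label = "UNK"
        tagged.append((word, label))
    return tagged


print(tag_tokens("Movie khup chan aahe, but ending nahi good"))
```

Each token receives its own label, so a single sentence can contain both `MR` and `EN` tags. A fixed lexicon cannot handle spelling variation in Romanised text (for example, several spellings of the same Marathi word), which is one reason text normalisation is studied alongside language identification.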