Resource Construction and Ensemble Learning based Sentiment Analysis for the Low-resource Language Uyghur

網際網路技術學刊 (Journal of Internet Technology), Pub Date: 2023-07-01, DOI: 10.53106/160792642023072404018

Azragul Yusup, Degang Chen, Yifei Ge, Hongliang Mao, Nujian Wang

Citations: 1

Abstract

To address the current scarcity of sentiment analysis corpora for low-resource languages, this paper proposes HTL, a sentence-level sentiment analysis resource conversion method that draws on the syntactic and semantic knowledge of the low-resource language Uyghur to convert a high-resource corpus into a low-resource one. During conversion, a k-fold cross-filtering method is proposed to reduce the distortion of data samples by selecting high-quality samples for conversion. The Uyghur sentiment analysis dataset USD is then constructed. A baseline on this dataset is established with an LSTM model, which reaches an accuracy of 81.07% and an F1 value of 81.13%; this can serve as a reference for building corpora for low-resource languages. This paper also proposes SA-LREL, a sentiment analysis model based on logistic regression ensemble learning. SA-LREL combines the strengths of several lightweight network models, including TextCNN, RNN, and RCNN, as base models, and builds a meta-model from a logistic regression function to ensemble them. It reaches an accuracy of 82.17% and an F1 value of 81.86% on the test set, and the experimental results show that the method can effectively improve performance on the Uyghur sentiment analysis task.
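
The abstract does not spell out how k-fold cross-filtering works. One common reading of such a filter, offered here purely as an assumption, is cross-validation noise filtering applied to labeled samples before they are converted: split the samples into k folds, train a classifier on the other k-1 folds, and keep a held-out sample only when that classifier reproduces its label. The sketch below assumes binary labels (0 = negative, 1 = positive) and uses a tiny multinomial Naive Bayes as a stand-in classifier; the names `train_nb` and `k_fold_cross_filter` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical sample format: (tokens, label) with label 0 = negative, 1 = positive.
Sample = Tuple[List[str], int]
Classifier = Callable[[List[str]], int]


def train_nb(samples: Sequence[Sample]) -> Classifier:
    """Fit a tiny binary multinomial Naive Bayes (stand-in classifier, not the paper's)."""
    label_counts: Counter = Counter()
    word_counts = {0: Counter(), 1: Counter()}
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
    vocab_size = len(set(word_counts[0]) | set(word_counts[1]))
    totals = {c: sum(word_counts[c].values()) for c in (0, 1)}
    n = len(samples)

    def predict(tokens: List[str]) -> int:
        best_label, best_score = 0, -math.inf
        for c in (0, 1):
            # Add-one smoothed log prior.
            score = math.log((label_counts[c] + 1) / (n + 2))
            for t in tokens:
                # Add-one smoothed log likelihood; the extra +1 reserves mass for
                # unseen tokens and keeps the denominator positive.
                score += math.log((word_counts[c][t] + 1) / (totals[c] + vocab_size + 1))
            if score > best_score:
                best_label, best_score = c, score
        return best_label

    return predict


def k_fold_cross_filter(samples: Sequence[Sample], k: int = 5,
                        train_fn: Callable[[Sequence[Sample]], Classifier] = train_nb) -> List[Sample]:
    """Keep a sample only if a model trained on the other k-1 folds predicts its label."""
    if k < 2:
        raise ValueError("k must be at least 2")
    keep = set()
    for fold in range(k):
        # Round-robin fold assignment: sample i belongs to fold i % k.
        held_out = [i for i in range(len(samples)) if i % k == fold]
        train = [samples[i] for i in range(len(samples)) if i % k != fold]
        predict = train_fn(train)
        keep.update(i for i in held_out if predict(samples[i][0]) == samples[i][1])
    # Return the surviving samples in their original order.
    return [samples[i] for i in sorted(keep)]
```

Any classifier with the same train-then-predict shape can be passed as `train_fn`; the filter itself only relies on out-of-fold predictions, so a sample is never judged by a model that saw it during training.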

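SA-LREL is described only at the level of stacking: TextCNN, RNN, and RCNN base models whose outputs are combined by a logistic regression meta-model. A minimal sketch of the meta-model stage follows, assuming binary labels and assuming each base model has already produced a positive-class probability for every sample; the base networks themselves are not implemented. The names `fit_meta_logreg`, `predict_meta`, and `accuracy_and_f1` are hypothetical, the plain gradient-descent fit is an illustrative choice rather than the paper's training procedure, and because the abstract does not say whether its F1 values are binary or macro-averaged, binary F1 for the positive class is an assumption here.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_logreg(features: Sequence[Sequence[float]], labels: Sequence[int],
                    lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit p(y=1 | x) = sigmoid(w . x + b) by full-batch gradient descent on mean log loss.

    Assumption: each row of `features` holds the positive-class probabilities that
    the base models (e.g. TextCNN, RNN, RCNN) assigned to one sample.
    """
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # Gradient of the log loss w.r.t. the logit is (predicted probability - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Predict label 1 when the meta-model's probability reaches 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and binary F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1
```

In a standard stacking setup the meta-model is fit on out-of-fold or held-out validation predictions from the base models, not on their predictions for their own training data, since the latter would reward base models that overfit; the abstract does not state how SA-LREL handles this.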