Faiz Ali Shah, Ahmed Sabir, Rajesh Sharma
arXiv - CS - Software Engineering, 2024-09-11 (arXiv:2409.07162)
A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study
Analyzing user reviews for sentiment towards app features can provide
valuable insights into users' perceptions of app functionality and their
evolving needs. Given the volume of user reviews received daily, an automated
mechanism to generate feature-level sentiment summaries of user reviews is
needed. Recent advances in Large Language Models (LLMs) such as ChatGPT have
shown impressive performance on several new tasks without updating the model's
parameters, i.e., using only zero or a few labeled examples. Despite these
advancements, LLMs' capabilities to perform feature-specific sentiment analysis
of user reviews remain unexplored. This study compares the performance of
state-of-the-art LLMs, including GPT-4, ChatGPT, and LLama-2-chat variants, for
extracting app features and associated sentiments under 0-shot, 1-shot, and
5-shot scenarios. Results indicate that the best-performing GPT-4 model
outperforms rule-based approaches by 23.6% in F1-score with zero-shot feature
extraction; 5-shot prompting further improves it by 6%. GPT-4 achieves a 74%
F1-score for predicting positive sentiment towards correctly predicted app
features, with 5-shot prompting enhancing it by 7%. Our study suggests that
LLMs are promising for generating feature-specific sentiment summaries of user
reviews.
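The k-shot setup the abstract describes can be sketched as a prompt builder plus an output parser. This is a minimal illustrative sketch, not the paper's method: the prompt wording, the "feature -> sentiment" output format, and the function names are all assumptions.

```python
def build_prompt(review, examples=()):
    """Assemble a k-shot prompt (k = 0, 1, or 5 in the study) asking an LLM
    to extract app features and the sentiment expressed toward each."""
    lines = [
        "Extract the app features mentioned in the review and the sentiment "
        "(positive/negative/neutral) toward each, one 'feature -> sentiment' "
        "pair per line."
    ]
    for ex_review, ex_pairs in examples:  # labeled demonstrations
        lines.append(f"Review: {ex_review}")
        lines.extend(f"{feat} -> {sent}" for feat, sent in ex_pairs)
    lines.append(f"Review: {review}")  # the review to be analyzed
    return "\n".join(lines)


def parse_response(text):
    """Parse the model's 'feature -> sentiment' lines into (feature, sentiment)
    tuples, ignoring lines that do not match the expected format."""
    pairs = []
    for line in text.splitlines():
        if "->" in line:
            feat, _, sent = line.partition("->")
            pairs.append((feat.strip(), sent.strip().lower()))
    return pairs
```

With zero examples this degenerates to the zero-shot case; passing five labeled reviews yields the 5-shot variant. The parsed pairs could then be aggregated across reviews into the feature-level sentiment summaries the study targets.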