SentiCR:用于代码审查交互的定制情感分析工具

2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE) Pub Date : 2017-10-30 DOI:10.1109/ASE.2017.8115623

Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, S. Rahimi

{"title":"SentiCR:用于代码审查交互的定制情感分析工具","authors":"Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, S. Rahimi","doi":"10.1109/ASE.2017.8115623","DOIUrl":null,"url":null,"abstract":"Sentiment Analysis tools, developed for analyzing social media text or product reviews, work poorly on a Software Engineering (SE) dataset. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. On this goal, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performances of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool especially designed for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found a model, trained using the Gradient Boosting Tree (GBT) algorithm, providing the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"110","resultStr":"{\"title\":\"SentiCR: A customized sentiment analysis tool for code review interactions\",\"authors\":\"Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, S. Rahimi\",\"doi\":\"10.1109/ASE.2017.8115623\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sentiment Analysis tools, developed for analyzing social media text or product reviews, work poorly on a Software Engineering (SE) dataset. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. On this goal, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performances of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool especially designed for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found a model, trained using the Gradient Boosting Tree (GBT) algorithm, providing the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.\",\"PeriodicalId\":382876,\"journal\":{\"name\":\"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"110\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASE.2017.8115623\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASE.2017.8115623","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 110

摘要

为分析社交媒体文本或产品评论而开发的情感分析工具在软件工程(SE)数据集上表现不佳。由于先前的研究发现开发人员在各种SE活动中表达情感，因此需要为SE领域定制情感分析工具。为了实现这一目标，我们手动标记了2000条评论来构建一个训练数据集，并使用我们的数据集来评估七种流行的情感分析工具。现有情感分析工具的糟糕表现促使我们构建SentiCR，这是一个专门为代码审查评论设计的情感分析工具。我们使用8种监督学习算法的100次10倍交叉验证来评估SentiCR。我们发现了一个使用梯度增强树(GBT)算法训练的模型，在识别负面评论方面提供了最高的平均准确率(83%)、最高的平均精度(67.8%)和最高的平均召回率(58.4%)。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SentiCR: A customized sentiment analysis tool for code review interactions

Sentiment Analysis tools, developed for analyzing social media text or product reviews, work poorly on a Software Engineering (SE) dataset. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. On this goal, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performances of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool especially designed for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found a model, trained using the Gradient Boosting Tree (GBT) algorithm, providing the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)

自引率

0.00%

发文量

期刊最新文献

TiQi: A natural language interface for querying software project data A comprehensive study on real world concurrency bugs in Node.js Managing software evolution through semantic history slicing Software performance self-adaptation through efficient model predictive control Privacy-aware data-intensive applications