SentiCR: A customized sentiment analysis tool for code review interactions

2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)
Pub Date: 2017-10-30
DOI: 10.1109/ASE.2017.8115623

Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, S. Rahimi

Citations: 110

Abstract

Sentiment analysis tools developed for social media text or product reviews perform poorly on Software Engineering (SE) datasets. Since prior studies have found that developers express sentiments during various SE activities, there is a need for a sentiment analysis tool customized for the SE domain. Toward this goal, we manually labeled 2000 review comments to build a training dataset and used it to evaluate seven popular sentiment analysis tools. The poor performance of these existing tools motivated us to build SentiCR, a sentiment analysis tool designed specifically for code review comments. We evaluated SentiCR using one hundred runs of 10-fold cross-validation with each of eight supervised learning algorithms. A model trained with the Gradient Boosting Tree (GBT) algorithm provided the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.
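The evaluation protocol summarized above (repeated 10-fold cross-validation, reporting mean accuracy plus precision and recall for negative comments) can be sketched as follows. This is a minimal, standard-library-only Python sketch under stated assumptions, not SentiCR's implementation: the label encoding (1 = negative), the stratified round-robin fold assignment, averaging scores per fold, and the `keyword_learner` toy classifier (a hypothetical stand-in for the GBT model and its text preprocessing) are all illustrative choices.

```python
import random
from collections import Counter
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Assumed label encoding for this sketch: 1 = negative review comment, 0 = not negative.
Predictor = Callable[[str], int]
Learner = Callable[[List[str], List[int]], Predictor]


def negative_class_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Accuracy over all comments, plus precision and recall for the negative class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Report 0.0 when a ratio is undefined (no predicted or no actual negatives in the fold).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def stratified_folds(labels: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Split indices into k folds, dealing each class round-robin so class ratios stay similar."""
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        offset += len(idx)  # start the next class where this one stopped, keeping fold sizes balanced
    return folds


def repeated_cv(texts: Sequence[str], labels: Sequence[int], learner: Learner,
                k: int = 10, repeats: int = 100, seed: int = 0) -> Tuple[float, float, float]:
    """Mean (accuracy, precision, recall) over repeats x k train/test splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for fold in stratified_folds(labels, k, rng):
            held_out = set(fold)
            train_x = [t for i, t in enumerate(texts) if i not in held_out]
            train_y = [y for i, y in enumerate(labels) if i not in held_out]
            predict = learner(train_x, train_y)  # train only on the other k-1 folds
            y_pred = [predict(texts[i]) for i in fold]
            scores.append(negative_class_metrics([labels[i] for i in fold], y_pred))
    return (mean(s[0] for s in scores), mean(s[1] for s in scores), mean(s[2] for s in scores))


def keyword_learner(train_x: List[str], train_y: List[int]) -> Predictor:
    """Hypothetical toy learner standing in for GBT: scores tokens by negative-vs-other frequency."""
    neg, other = Counter(), Counter()
    for text, y in zip(train_x, train_y):
        (neg if y == 1 else other).update(text.lower().split())

    def predict(text: str) -> int:
        score = sum(neg[tok] - other[tok] for tok in text.lower().split())
        return 1 if score > 0 else 0

    return predict


if __name__ == "__main__":
    # Tiny made-up comments, for illustration only (not the paper's labeled dataset).
    texts = ["this is bad code", "why would you ever do this", "bad naming again",
             "this is wrong and bad", "nice fix", "looks good to me",
             "thanks, nice work", "good catch"]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    acc, prec, rec = repeated_cv(texts, labels, keyword_learner, k=4, repeats=10)
    print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Any learner with the same `(train_x, train_y) -> predict` shape can be passed to `repeated_cv`; a closer replication would substitute a gradient-boosting model over text features. Whether to average per fold (as here) or pool each repetition's out-of-fold predictions before scoring is a design choice the abstract does not specify.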