Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, S. Rahimi
{"title":"SentiCR: A customized sentiment analysis tool for code review interactions","authors":"Toufique Ahmed, Amiangshu Bosu, Anindya Iqbal, S. Rahimi","doi":"10.1109/ASE.2017.8115623","DOIUrl":null,"url":null,"abstract":"Sentiment Analysis tools, developed for analyzing social media text or product reviews, work poorly on a Software Engineering (SE) dataset. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. On this goal, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performances of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool especially designed for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found a model, trained using the Gradient Boosting Tree (GBT) algorithm, providing the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.","PeriodicalId":382876,"journal":{"name":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"110","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASE.2017.8115623","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 110
Abstract
Sentiment Analysis tools, developed for analyzing social media text or product reviews, work poorly on a Software Engineering (SE) dataset. Since prior studies have found developers expressing sentiments during various SE activities, there is a need for a customized sentiment analysis tool for the SE domain. On this goal, we manually labeled 2000 review comments to build a training dataset and used our dataset to evaluate seven popular sentiment analysis tools. The poor performances of the existing sentiment analysis tools motivated us to build SentiCR, a sentiment analysis tool especially designed for code review comments. We evaluated SentiCR using one hundred 10-fold cross-validations of eight supervised learning algorithms. We found a model, trained using the Gradient Boosting Tree (GBT) algorithm, providing the highest mean accuracy (83%), the highest mean precision (67.8%), and the highest mean recall (58.4%) in identifying negative review comments.