Application of back-translation: a transfer learning approach to identify ambiguous software requirements

Proceedings of the 2021 ACM Southeast Conference; Pub Date: 2021-04-15; DOI: 10.1145/3409334.3452068

Isha Subedi, Maninder Singh, Vijayalakshmi Ramasamy, G. Walia


Citations: 6

Abstract

Ambiguous requirements are a problem in requirements engineering because stakeholders can interpret the same requirement differently, which leads to a variety of issues in later development stages. Because requirement specifications are usually written in natural language, identifying ambiguous requirements is still a manual process that has not been automated to industry standards. In this paper, we applied transfer learning with ULMFiT: we pre-trained our model on a general-domain corpus and then fine-tuned it to classify requirements as ambiguous or unambiguous (the target task). We then compared its accuracy with that of machine learning classifiers: SVM (Support Vector Machines), Logistic Regression, and Multinomial Naive Bayes. We also used back translation (BT) as a text augmentation technique to see whether it improved classification accuracy. Our results showed that ULMFiT achieved higher accuracy than SVM, Logistic Regression, and Multinomial Naive Bayes on our initial data set. Further, after augmenting the requirements with BT, ULMFiT again achieved higher accuracy than the SVM, Logistic Regression, and Multinomial Naive Bayes classifiers, improving the initial performance by 5.371%. Our research provides promising insights into how transfer learning and text augmentation can be applied to small data sets in requirements engineering.
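As an illustration of the back-translation step described in the abstract, the following minimal Python sketch translates each requirement into a pivot language and back, then keeps the result as an extra training example with the same label. The abstract does not name the translation system or the pivot language, so the word-level dictionaries, the function names (`back_translate`, `augment`), and the sample requirements here are all hypothetical stand-ins, not the authors' pipeline.

```python
from typing import Callable, List, Tuple

# A translator is any function mapping text to text. In practice this would
# wrap a real machine translation system; which one the paper used is not
# stated in the abstract, so the ones below are toy assumptions.
Translator = Callable[[str], str]
Example = Tuple[str, str]  # (requirement text, label)


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate text into a pivot language and back into the source language."""
    return from_pivot(to_pivot(text))


def augment(dataset: List[Example], to_pivot: Translator,
            from_pivot: Translator) -> List[Example]:
    """Return the original examples plus label-preserving paraphrases.

    A paraphrase is kept only if it differs from every text seen so far,
    so round trips that change nothing do not duplicate examples.
    """
    augmented = list(dataset)
    seen = {text for text, _ in dataset}
    for text, label in dataset:
        paraphrase = back_translate(text, to_pivot, from_pivot)
        if paraphrase not in seen:
            seen.add(paraphrase)
            augmented.append((paraphrase, label))
    return augmented


# Hypothetical word-level "translators" that stand in for a real MT system.
EN_TO_PIVOT = {"shall": "doit", "must": "doit", "fast": "rapidement"}
PIVOT_TO_EN = {"doit": "must", "rapidement": "quickly"}


def toy_to_pivot(text: str) -> str:
    return " ".join(EN_TO_PIVOT.get(word, word) for word in text.split())


def toy_from_pivot(text: str) -> str:
    return " ".join(PIVOT_TO_EN.get(word, word) for word in text.split())


# Made-up requirements for illustration only.
requirements = [
    ("The system shall respond fast", "ambiguous"),
    ("The system must log every login attempt", "unambiguous"),
]

augmented = augment(requirements, toy_to_pivot, toy_from_pivot)
for text, label in augmented:
    print(f"{label}: {text}")
```

In this toy run the first requirement comes back reworded and is added under its original label, while the second survives the round trip unchanged and is skipped. The augmented list would then be used as training data for the classifiers being compared.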


