Ali Majidzadeh, Mehrdad Ashtiani, Morteza Zakeri-Nasrabadi
{"title":"通过代码数据增强和微调进行多类型需求可追溯性预测 MS-CodeBERT","authors":"Ali Majidzadeh, Mehrdad Ashtiani, Morteza Zakeri-Nasrabadi","doi":"10.1016/j.csi.2024.103850","DOIUrl":null,"url":null,"abstract":"<div><p>Requirement traceability is a crucial quality factor that highly impacts the software evolution process and maintenance costs. Automated traceability links recovery techniques are required for a reliable and low-cost software development life cycle. Pre-trained language models have shown promising results on many natural language tasks. However, using such pre-trained models for requirement traceability needs large and quality traceability datasets and accurate fine-tuning mechanisms. This paper proposes code augmentation and fine-tuning techniques to prepare the MS-CodeBERT pre-trained language model for various types of requirements traceability prediction including documentation-to-method, issue-to-commit, and issue-to-method links. Three program transformation operations, namely, Rename Variable, Swap Operands, and Swap Statements are designed to generate new quality samples increasing the sample diversity of the traceability datasets. A 2-stage and 3-stage fine-tuning mechanism is proposed to fine-tune the language model for the three types of requirement traceability prediction on provided datasets. Experiments on 14 Java projects demonstrate a 6.2% to 8.5% improvement in the precision, 2.5% to 5.2% improvement in the recall, and 3.8% to 7.3% improvement in the F1 score of the traceability prediction models compared to the best results from the state-of-the-art methods.</p></div>","PeriodicalId":50635,"journal":{"name":"Computer Standards & Interfaces","volume":"90 ","pages":"Article 103850"},"PeriodicalIF":4.1000,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-type requirements traceability prediction by code data augmentation and fine-tuning MS-CodeBERT\",\"authors\":\"Ali Majidzadeh, Mehrdad Ashtiani, Morteza Zakeri-Nasrabadi\",\"doi\":\"10.1016/j.csi.2024.103850\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Requirement traceability is a crucial quality factor that highly impacts the software evolution process and maintenance costs. Automated traceability links recovery techniques are required for a reliable and low-cost software development life cycle. Pre-trained language models have shown promising results on many natural language tasks. However, using such pre-trained models for requirement traceability needs large and quality traceability datasets and accurate fine-tuning mechanisms. This paper proposes code augmentation and fine-tuning techniques to prepare the MS-CodeBERT pre-trained language model for various types of requirements traceability prediction including documentation-to-method, issue-to-commit, and issue-to-method links. Three program transformation operations, namely, Rename Variable, Swap Operands, and Swap Statements are designed to generate new quality samples increasing the sample diversity of the traceability datasets. A 2-stage and 3-stage fine-tuning mechanism is proposed to fine-tune the language model for the three types of requirement traceability prediction on provided datasets. Experiments on 14 Java projects demonstrate a 6.2% to 8.5% improvement in the precision, 2.5% to 5.2% improvement in the recall, and 3.8% to 7.3% improvement in the F1 score of the traceability prediction models compared to the best results from the state-of-the-art methods.</p></div>\",\"PeriodicalId\":50635,\"journal\":{\"name\":\"Computer Standards & Interfaces\",\"volume\":\"90 \",\"pages\":\"Article 103850\"},\"PeriodicalIF\":4.1000,\"publicationDate\":\"2024-03-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Standards & Interfaces\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0920548924000199\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Standards & Interfaces","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0920548924000199","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Multi-type requirements traceability prediction by code data augmentation and fine-tuning MS-CodeBERT
Requirement traceability is a crucial quality factor that highly impacts the software evolution process and maintenance costs. Automated traceability links recovery techniques are required for a reliable and low-cost software development life cycle. Pre-trained language models have shown promising results on many natural language tasks. However, using such pre-trained models for requirement traceability needs large and quality traceability datasets and accurate fine-tuning mechanisms. This paper proposes code augmentation and fine-tuning techniques to prepare the MS-CodeBERT pre-trained language model for various types of requirements traceability prediction including documentation-to-method, issue-to-commit, and issue-to-method links. Three program transformation operations, namely, Rename Variable, Swap Operands, and Swap Statements are designed to generate new quality samples increasing the sample diversity of the traceability datasets. A 2-stage and 3-stage fine-tuning mechanism is proposed to fine-tune the language model for the three types of requirement traceability prediction on provided datasets. Experiments on 14 Java projects demonstrate a 6.2% to 8.5% improvement in the precision, 2.5% to 5.2% improvement in the recall, and 3.8% to 7.3% improvement in the F1 score of the traceability prediction models compared to the best results from the state-of-the-art methods.
期刊介绍:
The quality of software, well-defined interfaces (hardware and software), the process of digitalisation, and accepted standards in these fields are essential for building and exploiting complex computing, communication, multimedia and measuring systems. Standards can simplify the design and construction of individual hardware and software components and help to ensure satisfactory interworking.
Computer Standards & Interfaces is an international journal dealing specifically with these topics.
The journal
• Provides information about activities and progress on the definition of computer standards, software quality, interfaces and methods, at national, European and international levels
• Publishes critical comments on standards and standards activities
• Disseminates user''s experiences and case studies in the application and exploitation of established or emerging standards, interfaces and methods
• Offers a forum for discussion on actual projects, standards, interfaces and methods by recognised experts
• Stimulates relevant research by providing a specialised refereed medium.