Authors: Qing Mi, Luo Wang, Lisha Hu, Liwei Ou, Yang Yu
Journal: International Journal of Software Engineering and Knowledge Engineering, pages 1709-1731
Publication date: 2022-11-18
Publication type: Journal Article
DOI: 10.1142/s0218194022500656 (https://doi.org/10.1142/s0218194022500656)
Impact factor: 0.6 (JCR Q4, Computer Science, Artificial Intelligence)
Improving Multi-Class Code Readability Classification with An Enhanced Data Augmentation Approach (130)
Code readability is a critical factor in the maintainability and reusability of software and is increasingly important in modern software development, where a metric for classifying code readability levels is both applicable and desired. However, most prior research has treated code readability classification as a binary classification task because labeled data is scarce. To support the training of multi-class code readability classification models, we propose an enhanced data augmentation approach that generates enough readability data to train a multi-class code readability model well. The approach combines domain-specific data transformation with GAN-based data augmentation. We conduct a series of experiments to verify our augmentation approach and achieve state-of-the-art multi-class code readability classification performance with 69.5% Micro-F1, 54.0% Macro-F1 and 67.7% Macro-AUC. Compared to the results where no augmented data is used, the improvements in Micro-F1, Macro-F1 and Macro-AUC are significant, at 6.9%, 11.3% and 11.2%, respectively. As an innovative work proposing multi-class code readability classification and an enhanced code readability data augmentation approach, our method proves to be effective.
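The abstract reports both Micro-F1 and Macro-F1, and the two can differ noticeably when the readability classes are imbalanced. The sketch below shows how the two averages are commonly computed for a multi-class task. The class names ("low", "neutral", "high"), the toy labels, and the function names are assumptions made for illustration only; they are not taken from the paper, and this is not the authors' evaluation code.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy labels (not data from the paper).
y_true = ["low", "low", "neutral", "neutral", "neutral", "neutral", "high", "high"]
y_pred = ["low", "neutral", "neutral", "neutral", "neutral", "high", "high", "neutral"]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.625 macro=0.611
```

Micro-F1 weights every sample equally, so frequent classes dominate it, while Macro-F1 weights every class equally, so weak performance on a rare class pulls it down. This is one plausible reason a model can score higher on Micro-F1 than on Macro-F1.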
About the journal:
The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering. Three types of papers will be published:
Research papers reporting original research results
Technology trend surveys reviewing an area of research in software engineering and knowledge engineering
Survey articles surveying a broad area in software engineering and knowledge engineering
In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome.
A central theme of this journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa. The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation & processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.