CodeLabeller: A Web-Based Code Annotation Tool for Java Design Patterns and Summaries

IF 0.6 | Zone 4 (Computer Science) | JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | International Journal of Software Engineering and Knowledge Engineering | Pub Date: 2023-06-16 | DOI: 10.1142/s0218194023500213

Najam Nazar, Norman Chen, Chun Yong Chong


Citations: 0

Abstract

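While constructing supervised learning models, we require labeled examples to build a corpus and train a machine learning model. However, most studies have built the labeled dataset manually, which is often a daunting task. To mitigate this problem, we have built an online tool called CodeLabeller. CodeLabeller is a web-based tool that aims to make labeling source code files for supervised learning efficient at scale by improving the entire data collection process. CodeLabeller was tested by constructing a corpus of over a thousand source files drawn from a large collection of open-source Java projects and labeling each Java source file with its design patterns and summaries. Twenty-five experts in software engineering evaluated the tool's usability through an online survey based on the standard User Experience Questionnaire. The survey results show that the tool reaches the "Good" level on both hedonic and pragmatic quality, is easy to use, and meets the needs of annotating a corpus for supervised classifiers. Apart from helping researchers crowdsource a labeled dataset, the tool has practical applications in software engineering education and helps build expert ratings for software artefacts.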




Source journal: International Journal of Software Engineering and Knowledge Engineering (category: Engineering & Technology / Engineering, Electrical & Electronic)

CiteScore: 1.90

Self-citation rate: 11.10%

Review time: 16 months

About the journal: The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering. Three types of papers are published: research papers reporting original research results; technology trend surveys reviewing an area of research in software engineering and knowledge engineering; and survey articles covering a broad area of software engineering and knowledge engineering. In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome. A central theme of the journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa. The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on the software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation and processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.