CodeLabeller: A Web-Based Code Annotation Tool for Java Design Patterns and Summaries

International Journal of Software Engineering and Knowledge Engineering (IF 0.6; CAS Zone 4, Computer Science; JCR Q4, Computer Science, Artificial Intelligence). Pub Date: 2023-06-16. DOI: 10.1142/s0218194023500213
Najam Nazar, Norman Chen, Chun Yong Chong
Citation count: 0

Abstract

Constructing a supervised learning model requires labeled examples to build a corpus and train the model. However, most studies have built their labeled datasets manually, which is often a daunting task. To mitigate this problem, we have built an online tool called CodeLabeller. CodeLabeller is a web-based tool that aims to provide an efficient approach to labeling source code files for supervised learning methods at scale by improving the data collection process throughout. CodeLabeller is tested by constructing a corpus of over a thousand source files obtained from a large collection of open source Java projects and labeling each Java source file with its respective design patterns and summary. Twenty-five experts in the field of software engineering participated in a usability evaluation of the tool using the standard User Experience Questionnaire online survey. The survey results show that the tool achieves a "Good" rating on both hedonic and pragmatic quality, is easy to use, and meets the needs of annotating a corpus for supervised classifiers. Apart from assisting researchers in crowdsourcing a labeled dataset, the tool has practical applicability in software engineering education and assists in building expert ratings for software artefacts.
Source journal
CiteScore: 1.90
Self-citation rate: 11.10%
Articles published: 71
Review time: 16 months
About the journal: The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering. Three types of papers will be published: research papers reporting original research results; technology trend surveys reviewing an area of research in software engineering and knowledge engineering; and survey articles surveying a broad area in software engineering and knowledge engineering. In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome. A central theme of this journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa. The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation & processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.
Latest articles from this journal
An Empirical Study of Fault Localization on Novice Programs and Addressing the Tie Problem
An Empirical Study of the Impact of Class Overlap on the Performance and Interpretability of Cross-Version Defect Prediction
Quantum Software: The Brain of Quantum
Quantum Software Encompasses Classical Software: Density Matrix from the Laplacian
A noise validation for Quantum Circuit Scheduling through a Service-Oriented Architecture