Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering: Latest Publications

Arabic Cyberbullying Detection Using Machine Learning: State of the Art Survey

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3593968

Norah Alsunaidi, Sarah Aljbali, Y. Yasin, Hamoud Aljamaan

Cyberbullying (CB) is a global problem that is growing rapidly and affecting more and more individuals, including minors. The devastating consequences of CB indicate a pressing need to regulate unethical or illegal online behavior. A remarkable number of researchers have attempted to harness machine learning to detect and prevent such harmful behavior; however, studies targeting Arabic-language content are still emerging. Therefore, this paper provides a comprehensive review of published empirical studies on CB detection in Arabic-language content, with an emphasis on the adopted methodologies, gaps, and challenges. We hope this work will support researchers in the area of CB detection in fostering a safe online environment and protecting users against the harmful consequences of CB.

Citations: 0

Do Developers Benefit from Recommendations when Repairing Inconsistent Design Models? a Controlled Experiment

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3593482

Luciano Marchezan, W. K. Assunção, G. Michelon, Alexander Egyed

Repairing design models is a laborious task that requires a considerable amount of time and effort from developers. Repair recommendation (RR) approaches focus on reducing the effort and improving the quality of the repairs performed. Such approaches have been evaluated in terms of scalability, correctness, and minimalism. These evaluations, however, have not investigated how developers can benefit from using RRs or how they perceive the difficulty of applying RRs. Investigating and discussing the use of RRs from the developers' perspective is important to demonstrate the benefits of applying such approaches in practice. We explore this opportunity through a controlled experiment with 24 developers, who repaired UML design models in eight different tasks, with and without RRs. The findings indicate that developers can benefit from RRs in complex tasks through improved effectiveness and efficiency. The results also show that the use of RRs does not affect the developers' perceived difficulty or confidence when repairing models. Furthermore, our findings show that not all developers choose the same RR; rather, they have varied preferences. Thus, the provision of RRs leads developers to consider additional alternatives for repairing an inconsistency.

Citations: 0

Functional Size Measurement in Agile Development: Velocity in Agile Sprints

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3593486

Thomas Fehlmann, Andrea Gelli

Agile teams measure their velocity, based on Story Points, as a performance indicator. However, such velocity does not allow predicting when the product will be finished. Story points measure effort only; they do not discriminate between creating functionality and other tasks. Non-functional requirements (NFR), such as enhancing product quality, increasing test coverage, or removing technical debt, as well as process-related NFR such as agreeing with stakeholders, getting requirements right, or documenting, consume effort but do not add functionality. Thus, it remains unclear whether the product is making any progress or the team is just looping around. Euro Project Office has therefore developed a method for complementing a product backlog with functional size, indicating progress and completeness in unambiguous terms. The method is based on the international standards ISO/IEC 14143 and ISO/IEC 19761. NFR are understood as in . Tools are available as open source and can be used by development teams with minimal investment in training.
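
ISO/IEC 19761 is the COSMIC functional size measurement method, in which each data movement (Entry, Exit, Read, Write) of a data group contributes one COSMIC Function Point (CFP). The minimal Python sketch below illustrates the abstract's argument under that simplification; it is not the authors' method or tool, and the `BacklogItem` type, the `sprint_report` function, and the sample backlog are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Movement(Enum):
    """The four COSMIC data-movement types."""
    ENTRY = "E"
    EXIT = "X"
    READ = "R"
    WRITE = "W"


@dataclass
class BacklogItem:
    """Hypothetical backlog item: story points measure effort, movements measure function."""
    name: str
    story_points: int
    movements: list = field(default_factory=list)
    done: bool = False

    def cfp(self) -> int:
        # Simplification: each data movement counts as 1 CFP.
        return len(self.movements)


def sprint_report(items):
    """Return (story-point velocity, functional velocity in CFP, completeness ratio)."""
    done = [item for item in items if item.done]
    sp_velocity = sum(item.story_points for item in done)
    cfp_velocity = sum(item.cfp() for item in done)
    total_cfp = sum(item.cfp() for item in items)
    completeness = cfp_velocity / total_cfp if total_cfp else 0.0
    return sp_velocity, cfp_velocity, completeness


backlog = [
    BacklogItem("login", 5, [Movement.ENTRY, Movement.READ, Movement.EXIT], done=True),
    # An NFR item: it consumes effort (story points) but adds no functional size.
    BacklogItem("remove technical debt", 8, [], done=True),
    BacklogItem("export report", 3, [Movement.ENTRY, Movement.READ, Movement.READ, Movement.EXIT]),
]
sp_velocity, cfp_velocity, completeness = sprint_report(backlog)
print(sp_velocity, cfp_velocity, round(completeness, 2))
```

Here the technical-debt item inflates story-point velocity without moving the functional size, so only the CFP-based figures say how close the backlog is to completion.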

Citations: 0

Training Bachelor Students to Design Better Quality Web Apps: Preliminary Results from a Prospective Empirical Investigation

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3593957

Sabato Nocera, R. Francese, G. Scanniello

Background: The Bachelor Program in Computer Science (CS) includes a number of courses on the design of Web apps. Often, the internal and external quality of the developed Web apps is not adequately taken into account. Aim: We aimed to (i) estimate the quality of Web apps developed by bachelor CS students in a Software Technologies for the Web (STW) course (academic year 2021-22), (ii) define a training plan, based on the results of the first step, for the students enrolled in this course in the academic year 2022-23 so that they design and implement better Web apps, and (iii) evaluate the training plan by comparing the quality of the Web apps developed in the academic years 2021-22 and 2022-23. Method: We designed a prospective empirical investigation to study how training bachelor students in STW affects the internal and external quality of the Web apps they develop. Results: We observed that quality concerns are widespread in the code of the Web apps the STW students developed in the academic year 2021-22. Therefore, we plan to ask the students of the academic year 2022-23 to use a Static Analysis Tool (SAT) in their development pipeline to detect and deal with quality concerns in the Web apps they develop. This second step represents an ongoing stage of our research. Conclusions: Our preliminary outcomes suggest that students must be aware that quality is of primary relevance in Web app development and must be prepared to use SAT in the development pipeline.

Citations: 0

Impact of Architectural Smells on Software Performance: an Exploratory Study

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3593442

Francesca Arcelli Fontana, Mateo Camilli, Davide Rendina, Andrei Gabriel Taraboi, Catia Trubiani

Architectural smells have been studied in the literature from several perspectives, such as their impact on maintainability as a source of architectural debt, their correlations with code smells, and their evolution over the history of complex projects. The goal of this paper is to extend the study of architectural smells from a different perspective. We focus on software performance and aim to quantify the impact of architectural smells to help explain the root causes of system performance hindrances. Our method consists of a study design that matches the occurrence of architectural smells with performance metrics. We exploit state-of-the-art tools for architectural smell detection, software performance profiling, and testing of the systems under analysis. Removing architectural smells generates new versions of the systems, from which we derive observations on which design changes improve or worsen performance metrics. Our experimentation considers two complex open-source projects, and the results show that detecting and removing two common types of architectural smells yields lower response time (up to ) with a large effect size, i.e., for - of the hotspot methods. The median memory consumption is also lower (up to ) with a large effect size for all the services.
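
The abstract reports "large effect sizes" without naming the measure. The Vargha-Delaney A12 statistic is a common choice in empirical software engineering, so the hedged Python sketch below assumes it, together with the conventional thresholds of roughly 0.56, 0.64, and 0.71 for small, medium, and large effects. The response times are made-up values for illustration only, not results from the paper.

```python
def vargha_delaney_a12(xs, ys):
    """Probability that a value drawn from xs exceeds one drawn from ys (ties count half)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


def magnitude(a12):
    """Conventional Vargha-Delaney thresholds, applied symmetrically around 0.5."""
    distance = abs(a12 - 0.5)
    if distance >= 0.21:
        return "large"
    if distance >= 0.14:
        return "medium"
    if distance >= 0.06:
        return "small"
    return "negligible"


# Invented response times (ms) of one hotspot method before and after a smell removal.
before = [120, 130, 125, 140]
after = [90, 95, 100, 98]
a12 = vargha_delaney_a12(before, after)
print(a12, magnitude(a12))
```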

Citations: 0

Analysis of Bug Report Qualities with Fixing Time using a Bayesian Network

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3593484

Sien Reeve Ordonez Peralta, H. Washizaki, Y. Fukazawa, Yuki Noyori, Shuhei Nojiri, Hideyuki Kanuka

Most client software employs a bug-tracking system, which relies on user-submitted reports (bug reports) containing the information software developers need to fix bugs. The quality of bug reports varies drastically. Bug reports can include the severity, the priority, and associated issues determined by investigating the reported bug. Herein, we investigate the influence of bug report qualities on successfully fixing a bug and on estimating the fixing time. We also examine the claim from previous studies that bias and differences in the treatment of bug reports exist because the reporters' expertise varies widely. Our approach examines the relationships among the qualities within the bug-fixing cycle and models their causal dependencies graphically through a Bayesian Network. Bug reports with attachments, dependencies on another bug, and frequent discussions are more likely to be fixed. In addition, bug reports with a high severity tend to be fixed faster. Moreover, the difficulty of the bug itself may influence the fixing rate, such that a straightforward bug will be fixed more easily and quickly regardless of the bug report quality.
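
A Bayesian Network stores, for each node, a conditional probability table given its parents. As a minimal, hypothetical illustration of a single such table (not the authors' network or data), the Python sketch below estimates P(fixed | has_attachment) by maximum likelihood from a toy list of bug reports; the field names and records are invented.

```python
from collections import Counter


def conditional_probability(reports, parent, child):
    """Maximum-likelihood estimate of P(child is True | parent = value) for each value."""
    totals = Counter()
    positives = Counter()
    for report in reports:
        totals[report[parent]] += 1
        if report[child]:
            positives[report[parent]] += 1
    return {value: positives[value] / totals[value] for value in totals}


# Toy, invented records; a real study would use an issue-tracker export.
reports = [
    {"has_attachment": True, "fixed": True},
    {"has_attachment": True, "fixed": True},
    {"has_attachment": True, "fixed": False},
    {"has_attachment": False, "fixed": True},
    {"has_attachment": False, "fixed": False},
    {"has_attachment": False, "fixed": False},
]
estimate = conditional_probability(reports, "has_attachment", "fixed")
print(estimate)
```

A full network would estimate such a table for every node given all of its parents (and often learn the graph structure too); this sketch stays dependency-free and covers only one parent-child pair.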

Citations: 0

NxtUnit: Automated Unit Test Generation for Go

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3593443

Siwei Wang, Xue Mao, Ziguang Cao, Yujun Gao, Qucheng Shen, Chao Peng

Automated test generation has been extensively studied for dynamically compiled or typed programming languages such as Java and Python. However, Go, a popular statically compiled and typed programming language for server application development, has received limited support from existing tools. To address this gap, we present NxtUnit, an automatic unit test generation tool for Go that uses random testing and is well suited to microservice architectures. NxtUnit employs a random approach to generate unit tests quickly, making it ideal for smoke testing and for providing quick quality feedback. It comes with three types of interfaces: an integrated development environment (IDE) plugin, a command-line interface (CLI), and a browser-based platform. The plugin and CLI tool allow engineers to write unit tests more efficiently, while the platform provides unit test visualization and asynchronous unit test generation. We evaluated NxtUnit by generating unit tests for 13 open-source repositories and 500 ByteDance in-house repositories, achieving a code coverage of 20.74% for the in-house repositories. A survey among ByteDance engineers found that NxtUnit can save them 48% of the time spent writing unit tests. We have made the CLI tool available at https://github.com/bytedance/nxt_unit.
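
NxtUnit itself targets Go; to keep all examples in one language, the sketch below is written in Python and only illustrates the general idea of random unit test generation with regression oracles: sample random inputs, record the observed outputs, and emit them as a table-driven Go test. It is not NxtUnit's implementation; `generate_cases`, `render_go_test`, and the Go function `Abs` are hypothetical, and the emitted text omits the package clause and the `testing` import.

```python
import random


def generate_cases(func, gen_input, n=5, seed=42):
    """Random testing: sample n inputs and record func's observed output as the oracle."""
    rng = random.Random(seed)  # a fixed seed keeps the generated tests reproducible
    return [(x, func(x)) for x in (gen_input(rng) for _ in range(n))]


def render_go_test(go_func, cases):
    """Render the cases as a table-driven Go test for a hypothetical func(int) int."""
    out = [f"func Test{go_func}(t *testing.T) {{",
           "\tcases := []struct{ in, want int }{"]
    out += [f"\t\t{{in: {x}, want: {y}}}," for x, y in cases]
    out += ["\t}",
            "\tfor _, c := range cases {",
            f"\t\tif got := {go_func}(c.in); got != c.want {{",
            f'\t\t\tt.Errorf("{go_func}(%d) = %d, want %d", c.in, got, c.want)',
            "\t\t}",
            "\t}",
            "}"]
    return "\n".join(out)


# Python's abs() stands in for the behavior of a hypothetical Go function Abs(x int) int.
cases = generate_cases(abs, lambda rng: rng.randint(-100, 100))
go_src = render_go_test("Abs", cases)
print(go_src)
```

Because the expected values are simply what the function returned, such tests catch regressions rather than prove correctness, which fits the smoke-testing role described in the abstract.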

Citations: 0

Outside the Sandbox: A Study of Input/Output Methods in Java

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3593501

Matúš Sulír, Sergej Chodarev, Milan Nosáľ

Programming languages often demarcate the internal sandbox, consisting of entities such as objects and variables, from the outside world, e.g., files or network. Although communication with the external world poses fundamental challenges for live programming, reversible debugging, testing, and program analysis in general, studies about this phenomenon are rare. In this paper, we present a preliminary empirical study about the prevalence of input/output (I/O) method usage in Java. We manually categorized 1435 native methods in a Java Standard Edition distribution into non-I/O and I/O-related methods, which were further classified into areas such as desktop or file-related ones. According to the static analysis of a call graph for 798 projects, about 57% of methods potentially call I/O natives. The results of dynamic analysis on 16 benchmarks showed that 21% of the executed methods directly or indirectly called an I/O native. We conclude that neglecting I/O is not a viable option for tool designers and suggest the integration of I/O-related metadata with source code to facilitate their querying.
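
The static-analysis figure (the share of methods that may call I/O natives) amounts to a reachability query on a call graph. The Python sketch below assumes a simple adjacency-map call graph with hypothetical method names and walks reversed call edges breadth-first from a set of I/O native methods; it is an illustration, not the authors' analysis pipeline.

```python
from collections import deque


def methods_reaching_io(call_graph, io_natives):
    """Return the methods that directly or indirectly call any of the given I/O natives."""
    # Invert the edges (callee -> callers) so we can walk back from each native.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    reached = set()
    queue = deque(io_natives)
    while queue:
        method = queue.popleft()
        for caller in callers.get(method, ()):
            if caller not in reached:  # the visited check also handles recursive cycles
                reached.add(caller)
                queue.append(caller)
    return reached


# Hypothetical project call graph: method -> methods it may call.
call_graph = {
    "Main.main": ["Config.load", "Util.add"],
    "Config.load": ["java.io.FileInputStream.readBytes"],
    "Util.add": [],
    "Report.format": ["Util.add"],
}
io_natives = {"java.io.FileInputStream.readBytes"}
io_callers = methods_reaching_io(call_graph, io_natives)
share = len(io_callers & set(call_graph)) / len(call_graph)
print(sorted(io_callers), share)
```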

Citations: 0

Optimized Tokenization Process for Open-Vocabulary Code Completion: An Empirical Study

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3594236

Yasir Hussain, Zhiqiu Huang, Yu Zhou, I. A. Khan, Nasrullah Khan, Muhammad Zahid Abbas

Studies have substantiated the efficacy of deep learning-based models in various source code modeling tasks. These models are usually trained on large datasets that are divided into smaller units, known as tokens, using either an open or a closed vocabulary system. The choice of tokenization method can have a profound impact on the number of tokens generated, which in turn can significantly influence the performance of the model. This study investigates the effect of different tokenization methods on source code modeling and proposes an optimized tokenizer to enhance tokenization performance. The proposed tokenizer employs a hybrid approach that initializes a global vocabulary from the most frequent unigrams and incrementally builds an open-vocabulary system. The proposed tokenizer is evaluated against popular tokenization methods such as Closed, Unigram, WordPiece, and BPE tokenizers, as well as tokenizers provided by large pre-trained models such as PolyCoder and CodeGen. The results indicate that the choice of tokenization method can significantly affect the number of sub-tokens generated, which can ultimately influence the modeling performance of a model. Furthermore, our empirical evaluation demonstrates that the proposed tokenizer outperforms the other baselines, achieving better tokenization performance in terms of both fewer sub-tokens and lower time cost. In conclusion, this study highlights the significance of the choice of tokenization method in source code modeling and the potential for improvement through optimized tokenization techniques.
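
The abstract does not detail the tokenizer's algorithm. The Python sketch below is one hypothetical reading of "initialize a global vocabulary from the most frequent unigrams and incrementally build an open vocabulary": unknown identifiers are split on camelCase and snake_case boundaries, unknown parts fall back to single characters, and parts seen `min_freq` times are promoted into the vocabulary. All names, the promotion rule, and the toy corpus are assumptions, not the authors' design.

```python
import re
from collections import Counter


def build_global_vocab(corpus_tokens, k):
    """Seed the vocabulary with the k most frequent unigrams of the corpus."""
    return {tok for tok, _ in Counter(corpus_tokens).most_common(k)}


def split_identifier(token):
    """Split on underscores and on lower-to-upper camelCase boundaries."""
    return [p for p in re.split(r"_|(?<=[a-z0-9])(?=[A-Z])", token) if p]


class HybridTokenizer:
    def __init__(self, global_vocab, min_freq=2):
        self.vocab = set(global_vocab)
        self.min_freq = min_freq
        self.seen = Counter()  # how often each out-of-vocabulary part has occurred

    def tokenize(self, token):
        if token in self.vocab:
            return [token]
        pieces = []
        for part in split_identifier(token):
            if part in self.vocab:
                pieces.append(part)
                continue
            pieces.extend(part)  # character-level fallback keeps the vocabulary open
            self.seen[part] += 1
            if self.seen[part] >= self.min_freq:
                self.vocab.add(part)  # incremental growth: frequent parts become tokens
        return pieces


corpus = ["get", "for", "get", "i", "for", "get"]
tokenizer = HybridTokenizer(build_global_vocab(corpus, k=2), min_freq=2)
first = tokenizer.tokenize("getName")  # "Name" is unknown, so it is spelled out by characters
tokenizer.tokenize("getName")          # the second sighting promotes "Name" into the vocabulary
third = tokenizer.tokenize("getName")
print(first, third)
```

The shrinking sub-token count between the first and third call is the effect the abstract measures at scale: a vocabulary that adapts to the corpus yields fewer sub-tokens per identifier.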

Citations: 1
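A toy Python sketch can show how the choice of vocabulary changes sub-token counts. It is not the authors' optimized hybrid tokenizer, whose construction the abstract describes only at a high level. It simply contrasts a closed vocabulary (every training identifier kept whole, anything unseen mapped to `<unk>`) with the classic byte-pair-encoding (BPE) merge-learning loop. The training identifiers, the merge budget, and the helper names (`learn_bpe`, `bpe_encode`, `closed_encode`) are hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` BPE merges, starting from single characters."""
    words = Counter(tuple(token) for token in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each identifier occurs.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every identifier is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        updated = Counter()
        for symbols, freq in words.items():
            updated[merge_pair(symbols, best)] += freq
        words = updated
    return merges


def bpe_encode(token, merges):
    """Open vocabulary: split into characters, then replay merges in learned order."""
    symbols = tuple(token)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def closed_encode(token, vocab):
    """Closed vocabulary: keep known identifiers whole, map anything else to <unk>."""
    return [token] if token in vocab else ["<unk>"]


# Hypothetical training identifiers, not data from the paper.
train = ["getName", "getValue", "setName", "setValue", "getName", "getValue"]
merges = learn_bpe(train, num_merges=20)
closed_vocab = set(train)

for ident in ["getName", "getId"]:
    print(ident, closed_encode(ident, closed_vocab), bpe_encode(ident, merges))
```

On this toy corpus the first two learned merges are `('e', 't')` and `('g', 'et')`. As a result, the unseen identifier `getId` becomes a single, information-free `<unk>` under the closed vocabulary but `['get', 'I', 'd']` under BPE. BPE produces more sub-tokens here, but the identifier stays recoverable. This is the kind of difference in token counts that the study examines on real code.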

Analyzing the Resource Usage Overhead of Mobile App Development Frameworks

Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering

Pub Date : 2023-06-14 DOI: 10.1145/3593434.3593487

Wellington Oliveira, Bernardo Moraes, Fernando Castor, J. Fernandes

Mobile app development frameworks lower the effort to write and deploy apps across different execution platforms. At the same time, their use may limit native optimizations and impose overhead, increasing resource usage. In this paper, we analyze the resource usage of Android benchmarks and apps based on three mobile app development frameworks, Flutter, React Native, and Ionic, comparing them to functionally equivalent native variants written in Java. Besides being in widespread use, these frameworks represent three different approaches to developing multiplatform apps: Flutter supports the deployment of apps that are compiled and run fully natively, React Native runs interpreted JavaScript code combined with native views for different platforms, and Ionic is based on web apps, which means that it does not depend on platform-specific details. We measure the energy consumption, execution time, and memory usage of ten optimized, CPU-intensive benchmarks, to gauge overhead in a controlled manner, and of two applications, to measure their impact when running common mobile app functionalities. Our results show that cross-platform and hybrid frameworks can be competitive in CPU-intensive applications. In five of the ten benchmarks, at least one framework-based version exhibits lower energy consumption and execution time than its native counterpart, with reductions of up to 81% in energy and 83% in execution time. Furthermore, in three other benchmarks, the framework-based and native versions achieved similar results. Overall, Flutter usually imposes the least overhead in execution time and energy, while React Native imposes the most in all the benchmarks. However, in an app that continuously animates multiple images on the screen without interaction, the React Native version uses the least CPU and energy, with an energy reduction of up to 96% compared to the second-best framework-based version. These findings highlight the importance of analyzing expected application behavior before committing to a specific framework.

Citations: 2
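The overhead comparisons in the mobile-framework study rest on two steps. First, measure each variant under repeated, controlled runs. Second, express the framework-based result relative to a baseline. The Python sketch below illustrates only the timing/memory part and that arithmetic, under stated assumptions. `fib` is a hypothetical CPU-bound stand-in rather than one of the paper's benchmarks, and `tracemalloc` sees only Python-level allocations. Energy measurement requires platform-specific tooling and is not attempted here. The percentage formula assumes reductions are computed relative to the baseline measurement. None of this reproduces the authors' actual measurement setup.

```python
import statistics
import time
import tracemalloc


def fib(n):
    """Naive recursive Fibonacci: a hypothetical CPU-bound stand-in benchmark."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def measure(benchmark, *args, repeats=5):
    """Run `benchmark(*args)` `repeats` times.

    Returns the median wall-clock time in seconds and the largest peak of
    Python-level memory traced by tracemalloc, in bytes. Note that tracing
    adds overhead to the timed region; a stricter harness would time and
    trace in separate runs.
    """
    times, peaks = [], []
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        benchmark(*args)
        times.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        peaks.append(peak)
    return statistics.median(times), max(peaks)


def relative_reduction(baseline, candidate):
    """Percent by which `candidate` is below `baseline`; negative values mean overhead."""
    return 100.0 * (baseline - candidate) / baseline


median_s, peak_bytes = measure(fib, 20)
print(f"fib(20): median {median_s:.4f} s, peak {peak_bytes} B traced")

# Hypothetical energy readings in joules: native 200 J, framework-based 38 J.
print(f"{relative_reduction(200.0, 38.0):.0f}% less energy")  # prints "81% less energy"
```

Taking the median over repeats damps one-off scheduling noise. Reporting a signed percentage makes overhead (negative values) and savings (positive values) directly comparable across variants.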