ACM Transactions on Software Engineering and Methodology: Latest Articles

Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning

IF 4.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-21 DOI: 10.1145/3674728

Shengcheng Yu, Chunrong Fang, Xin Li, Yuchen Ling, Zhenyu Chen, Zhendong Su

Software applications (apps) have been playing an increasingly important role in various aspects of society. In particular, mobile apps and web apps are the most prevalent among all applications and are widely used in various industries as well as in people’s daily lives. To help ensure mobile and web app quality, many approaches have been introduced to improve app GUI testing via automated exploration, including random testing, model-based testing, learning-based testing, etc. Despite the extensive effort, existing approaches are still limited in reaching high code coverage, constructing high-quality models, and being generally applicable. Reinforcement learning-based approaches, as a group of representative and advanced approaches for automated GUI exploration testing, are faced with difficult challenges, including effective app state abstraction, reward function design, etc. Moreover, they heavily depend on the specific execution platforms (i.e., Android or Web), thus leading to poor generalizability and being unable to adapt to different platforms.

This work specifically tackles these challenges based on the high-level observation that apps from distinct platforms share commonalities in GUI design. We propose PIRLTEST, an effective platform-independent approach for app testing. Specifically, PIRLTEST utilizes computer vision and reinforcement learning techniques in a novel, synergistic manner for automated testing. It extracts the GUI widgets from GUI pages and characterizes the corresponding GUI layouts, embedding the GUI pages as states. The app GUI state combines the macroscopic perspective (app GUI layout) and the microscopic perspective (app GUI widget), and attaches the critical semantic information from GUI images. This enables PIRLTEST to be platform-independent and makes the testing approach generally applicable on different platforms. PIRLTEST explores apps with the guidance of a curiosity-driven strategy, which uses a Q-network to estimate the values of specific state-action pairs to encourage more exploration in uncovered pages without platform dependency. Every action taken during exploration is assigned a reward, designed considering both the app GUI states and the concrete widgets, to help the framework explore more uncovered pages. We conduct an empirical study on 20 mobile apps and 5 web apps, and the results show that PIRLTEST is zero-cost when being adapted to different platforms, and can perform better than the baselines, covering 6.3–41.4% more code on mobile apps and 1.5–51.1% more code on web apps. PIRLTEST is capable of detecting 128 unique bugs on mobile and web apps, including 100 bugs that cannot be detected by the baselines.
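The curiosity-driven idea described above can be illustrated with a small sketch. This is not the authors' implementation: a lookup table with optimistic initial values stands in for the Q-network, page names stand in for image-embedded GUI states, and the count-based reward (1 divided by the visit count of the reached page) is an assumption chosen only to show how rarely seen pages can be favoured. All class, function, and page names are hypothetical.

```python
from collections import defaultdict


class CuriosityExplorer:
    """Tabular stand-in for a Q-network; every name here is hypothetical."""

    def __init__(self, alpha=0.5, gamma=0.9, q_init=10.0):
        # Optimistic start: q_init is at least the largest possible return
        # (rewards are <= 1, so 1 / (1 - gamma) = 10), which makes untried
        # actions look attractive until they have been tried.
        self.q = defaultdict(lambda: q_init)   # (state, action) -> value
        self.visits = defaultdict(int)         # state -> times reached
        self.alpha, self.gamma = alpha, gamma

    def reward(self, next_state):
        # Assumed curiosity bonus: the first visit pays 1.0, later visits less.
        self.visits[next_state] += 1
        return 1.0 / self.visits[next_state]

    def choose(self, state, actions):
        # Greedy choice over current value estimates (ties: first action wins).
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, next_state, next_actions):
        # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
        r = self.reward(next_state)
        best_next = max((self.q[(next_state, a)] for a in next_actions), default=0.0)
        target = r + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        return r


# Toy app: page -> {action: page reached}. Purely illustrative.
TOY_APP = {
    "home": {"tap_settings": "settings", "tap_list": "list"},
    "settings": {"back": "home"},
    "list": {"tap_item": "detail", "back": "home"},
    "detail": {"back": "list"},
}


def explore(app, start="home", steps=20):
    """Run the explorer on the toy app and return the set of pages reached."""
    agent = CuriosityExplorer()
    state = start
    agent.visits[state] = 1
    for _ in range(steps):
        action = agent.choose(state, sorted(app[state]))
        next_state = app[state][action]
        agent.update(state, action, next_state, sorted(app[next_state]))
        state = next_state
    return set(agent.visits)
```

Calling `explore(TOY_APP)` walks the toy graph deterministically. Because the table never inspects platform-specific widget trees, the same loop would apply to any source of states and actions, which is the property the abstract attributes to image-based state embedding.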



Citations: 0

Bitmap-Based Security Monitoring for Deeply Embedded Systems

IF 4.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-18 DOI: 10.1145/3672460

Anni Peng, Dongliang Fang, Le Guan, Erik van der Kouwe, Yin Li, Wenwen Wang, Limin Sun, Yuqing Zhang

Deeply embedded systems powered by microcontrollers are becoming popular with the emergence of Internet of Things (IoT) technology. However, these devices primarily run C/C++ code and are susceptible to memory bugs, which can potentially lead to both control data attacks and non-control data attacks. Existing defense mechanisms, such as control flow integrity (CFI), data flow integrity (DFI), and write integrity testing (WIT), consume a massive amount of resources, making them less practical in real products. To keep the defense lightweight, we design a bitmap-based allowlist mechanism to unify the storage of the runtime data for protecting both control data and non-control data. The memory requirements are constant and small, regardless of the number of deployed defense mechanisms. We store the allowlist in the TrustZone to ensure its integrity and confidentiality. Meanwhile, we perform an offline analysis to detect potential collisions and make corresponding adjustments when one occurs. We have implemented our idea on an ARM Cortex-M based development board. Our evaluation results show a substantial reduction in memory consumption when deploying the proposed CFI and DFI mechanisms, without compromising runtime performance. Specifically, our prototype enforces CFI and DFI at a cost of just 2.09% performance overhead and 32.56% memory overhead on average.
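To make the bitmap idea concrete, here is a minimal sketch of an address allowlist stored as one bit per fixed-size slot. It is written in Python for readability rather than as firmware, and it is only an illustration under stated assumptions: the class name, the 4-byte slot granularity, and the example addresses are hypothetical, and the sketch models neither the paper's collision analysis nor TrustZone storage.

```python
class BitmapAllowlist:
    """One bit per `granularity`-byte slot of a fixed address range (assumed layout)."""

    def __init__(self, base, size, granularity=4):
        self.base, self.size, self.granularity = base, size, granularity
        slots = (size + granularity - 1) // granularity
        # The footprint depends only on the covered range, not on how many
        # addresses are later marked as allowed.
        self.bits = bytearray((slots + 7) // 8)

    def _slot(self, addr):
        # Return the slot index, or None for out-of-range or misaligned addresses.
        offset = addr - self.base
        if offset < 0 or offset >= self.size or offset % self.granularity:
            return None
        return offset // self.granularity

    def allow(self, addr):
        slot = self._slot(addr)
        if slot is None:
            raise ValueError(f"address {addr:#x} cannot be represented")
        self.bits[slot >> 3] |= 1 << (slot & 7)

    def is_allowed(self, addr):
        slot = self._slot(addr)
        return slot is not None and bool(self.bits[slot >> 3] & (1 << (slot & 7)))


# Example: mark one (hypothetical) legitimate branch target in a 1 KiB region.
targets = BitmapAllowlist(base=0x08000000, size=1024)
targets.allow(0x08000100)
```

A runtime check then reduces to a shift, a mask, and one byte load (`targets.is_allowed(addr)`). The same table shape could in principle serve both a branch-target check and a write-target check, which is one way to read the abstract's claim of unified storage.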



Citations: 0

Harmonising Contributions: Exploring Diversity in Software Engineering through CQA Mining on Stack Overflow

IF 4.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-18 DOI: 10.1145/3672453

Elijah Zolduoarrati, Sherlock A. Licorish, Nigel Stanger

The need for collective intelligence in technology means that online Q&A platforms, such as Stack Overflow and Reddit, have become invaluable in building the global knowledge ecosystem. Despite literature demonstrating a prevalence of inclusion and contribution disparities in online communities, studies investigating the underlying reasons behind such fluctuations remain scarce. The current study examines Stack Overflow users’ contribution profiles, both in isolation and relative to various diversity metrics, including GDP and access to electricity. This study also examines whether such profiles propagate to the city and state levels, supplemented by granular data such as per capita income and education, before validating quantitative findings using content analysis. We selected 143 countries and compared the profiles of their respective users to assess implicit diversity-related complications that impact how users contribute. Results show that countries with high GDP, prominent R&D presence, less wealth inequality, and sufficient access to infrastructure tend to have more users, regardless of their development status. Similarly, cities and states where technology is more prevalent (e.g., San Francisco and New York) have more users who tend to contribute more often. Qualitative analysis reveals distinct communication styles based on users’ locations. Urban users exhibited assertive, solution-oriented behaviour, actively sharing information. Conversely, rural users engaged through inquiries and discussions, incorporating personal anecdotes, gratitude, and conciliatory language. Findings from this study may benefit scholars and practitioners, allowing them to develop sustainable mechanisms to bridge the inclusion and diversity gaps.



Citations: 0

GIST: Generated Inputs Sets Transferability in Deep Learning

IF 4.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-13 DOI: 10.1145/3672457

Florian Tambon, Foutse Khomh, Giuliano Antoniol

To foster the verifiability and testability of Deep Neural Networks (DNN), an increasing number of test case generation techniques are being developed.

When confronted with testing DNN models, the user can apply any existing test generation technique. However, they need to do so for each technique and each DNN model under test, which can be expensive. Therefore, a paradigm shift could benefit this testing process: rather than regenerating the test set independently for each DNN model under test, we could transfer test sets from existing DNN models.

This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets. Given a property selected by a user (e.g., neurons covered, faults), GIST enables the selection of good test sets from the point of view of this property among available test sets. This allows the user to recover properties on the transferred test sets similar to those they would have obtained by generating the test set from scratch with a test case generation technique. Experimental results show that GIST can select effective test sets for the given property to transfer. Moreover, GIST scales better than reapplying test case generation techniques from scratch on DNN models under test.



Citations: 0

An Empirical Study on the Characteristics of Database Access Bugs in Java Applications

IF 4.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-13 DOI: 10.1145/3672449

Wei Liu, Shouvick Mondal, Tse-Hsun (Peter) Chen

Database-backed applications rely on the database access code to interact with the underlying database management systems (DBMSs). Although many prior studies aim at database access issues like SQL anti-patterns or SQL code smells, few studies examine database access bugs during the maintenance of database-backed applications. In this paper, we empirically investigate 423 database access bugs collected from seven large-scale Java open source applications that use relational database management systems (e.g., MySQL or PostgreSQL). We study the characteristics (e.g., occurrence and root causes) of the bugs by manually examining the bug reports and commit histories. We find that the numbers of reported database access bugs and non-database access bugs follow a similar trend, but the files modified in their bug-fixing commits differ. Additionally, we generalize categories of the root causes of database access bugs, comprising five main categories (SQL queries, Schema, API, Configuration, SQL query result) and 25 unique root causes. We find that the bugs pertaining to SQL queries, Schema, and API cover 84.2% of database access bugs across all studied applications. In particular, SQL query bugs (54%) and API bugs (38.7%) are the most frequent issues when using JDBC and Hibernate, respectively. Finally, we provide a discussion on the implications of our findings for developers and researchers.



Citations: 0

Self-planning Code Generation with Large Language Models

IF 4.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-13 DOI: 10.1145/3672456

Xue Jiang, Yihong Dong, Lecheng Wang, Fang Zheng, Qiwei Shang, Ge Li, Zhi Jin, Wenpin Jiao

Although large language models (LLMs) have demonstrated impressive ability in code generation, they still struggle to address the complicated intent provided by humans. It is widely acknowledged that humans typically employ planning to decompose complex problems and schedule solution steps prior to implementation. To this end, we introduce planning into code generation to help the model understand complex intent and reduce the difficulty of problem-solving. This paper proposes a self-planning code generation approach with large language models, which consists of two phases: a planning phase and an implementation phase. Specifically, in the planning phase, the LLM plans out concise solution steps from the intent combined with few-shot prompting. Subsequently, in the implementation phase, the model generates code step by step, guided by the preceding solution steps. We conduct extensive experiments on various code-generation benchmarks across multiple programming languages. Experimental results show that self-planning code generation achieves a relative improvement of up to 25.4% in Pass@1 compared to direct code generation, and up to 11.9% compared to Chain-of-Thought code generation. Moreover, our self-planning approach also enhances the quality of the generated code with respect to correctness, readability, and robustness, as assessed by humans.
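The two-phase flow described above can be sketched as two chained model calls. This is an illustrative outline, not the paper's prompts: the prompt wording, the function names, and the `CannedLLM` stub (which replays fixed replies so the sketch runs without any model or network access) are all assumptions.

```python
def build_plan_prompt(intent, examples):
    """Few-shot planning prompt: each example pairs an intent with a numbered plan."""
    shots = [f"Intent: {ex_intent}\nPlan:\n{ex_plan}" for ex_intent, ex_plan in examples]
    shots.append(f"Intent: {intent}\nPlan:\n")
    return "\n\n".join(shots)


def self_planning_generate(intent, llm, examples):
    """Phase 1 asks for a plan; phase 2 asks for code guided by that plan."""
    plan = llm(build_plan_prompt(intent, examples)).strip()
    code_prompt = (
        f"Intent: {intent}\n"
        f"Plan:\n{plan}\n"
        "Implement the plan step by step.\n"
    )
    return plan, llm(code_prompt)


class CannedLLM:
    """Hypothetical stand-in for a model: replays fixed replies and records prompts."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.prompts = []

    def __call__(self, prompt):
        self.prompts.append(prompt)
        return self.replies.pop(0)


llm = CannedLLM([
    "1. Sort the list.\n2. Return the middle element.\n",
    "def median(xs):\n    return sorted(xs)[len(xs) // 2]\n",
])
plan, code = self_planning_generate(
    "Return the median of a list",
    llm,
    examples=[("Add two numbers", "1. Return a + b.")],
)
```

Any callable that maps a prompt string to a reply string can replace the stub. The point of the structure is that the second prompt always carries the plan produced by the first call.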



Citations: 0

Neuron Sensitivity Guided Test Case Selection

IF 4.4 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-12 DOI: 10.1145/3672454

Dong Huang, Qingwen Bu, Yichao Fu, Yuhao Qing, Xiaofei Xie, Junjie Chen, Heming Cui

Deep Neural Networks (DNNs) have been widely deployed in software to address various tasks (e.g., autonomous driving, medical diagnosis). However, they can also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal and repair incorrect behaviors in DNNs, developers often collect rich, unlabeled datasets from the natural world and label them to test DNN models. However, properly labeling a large number of datasets is a highly expensive and time-consuming task.

To address the above-mentioned problem, we propose NSS, Neuron Sensitivity Guided Test Case Selection, which can reduce the labeling time by selecting valuable test cases from unlabeled datasets. NSS leverages the information of the internal neuron induced by the test cases to select valuable test cases, which have high confidence in causing the model to behave incorrectly. We evaluated NSS with four widely used datasets and four well-designed DNN models compared to the state-of-the-art (SOTA) baseline methods. The results show that NSS performs well in assessing the probability of failure triggering in test cases and in the improvement capabilities of the model. Specifically, compared to the baseline approaches, NSS achieves a higher fault detection rate (e.g., when selecting 5% of the test cases from the unlabeled dataset in the MNIST&LeNet1 experiment, NSS can obtain an 81.8% fault detection rate, which is a 20% increase compared with SOTA baseline strategies).
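The abstract does not define how neuron sensitivity is computed, so the following sketch rests on one plausible reading, assumed here purely for illustration: score each unlabeled input by how much the hidden-layer activations change when the input is slightly perturbed, then send only the highest-scoring inputs for labeling. The one-layer ReLU "model", the perturbation, and all function names are hypothetical.

```python
def hidden_activations(weights, x):
    """ReLU activations of a single hidden layer (toy stand-in for a DNN)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def sensitivity_score(weights, x, perturb):
    """Assumed score: total absolute change in activations under a perturbation."""
    base = hidden_activations(weights, x)
    moved = hidden_activations(weights, perturb(x))
    return sum(abs(a - b) for a, b in zip(base, moved))


def select_test_cases(weights, inputs, perturb, budget):
    """Return the indices of the `budget` inputs with the highest scores."""
    ranked = sorted(
        range(len(inputs)),
        key=lambda i: sensitivity_score(weights, inputs[i], perturb),
        reverse=True,
    )
    return ranked[:budget]


# Toy data: two hidden neurons, three unlabeled inputs, and a uniform shift.
weights = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[-2.0, -2.0], [1.0, 1.0], [-0.25, 1.0]]
shift = lambda x: [xi + 0.5 for xi in x]
chosen = select_test_cases(weights, inputs, shift, budget=2)
```

In this toy setup the first input leaves both neurons inactive before and after the shift, so it scores zero and is not selected. Only the selected indices would need labels, which is where the labeling-time saving described in the abstract would come from.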



Citations: 0

Self-collaboration Code Generation via ChatGPT

IF 4.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-12 DOI: 10.1145/3672459

Yihong Dong, Xue Jiang, Zhi Jin, Ge Li

Although Large Language Models (LLMs) have demonstrated remarkable code-generation ability, they still struggle with complex tasks. In real-world software development, humans usually tackle complex tasks through collaborative teamwork, a strategy that significantly controls development complexity and enhances software quality. Inspired by this, we present a self-collaboration framework for code generation employing LLMs, exemplified by ChatGPT. Specifically, through role instructions, 1) multiple LLM agents act as distinct ‘experts’, each responsible for a specific subtask within a complex task; and 2) the way the roles collaborate and interact is specified, so that they form a virtual team and facilitate each other’s work. Ultimately, the virtual team addresses code-generation tasks collaboratively without the need for human intervention. To effectively organize and manage this virtual team, we incorporate software-development methodology into the framework. Thus, we assemble an elementary team consisting of three LLM roles (i.e., analyst, coder, and tester) responsible for software development’s analysis, coding, and testing stages. We conduct comprehensive experiments on various code-generation benchmarks. Experimental results indicate that self-collaboration code generation achieves a relative Pass@1 improvement of 29.9%-47.1% over the base LLM agent. Moreover, we showcase that self-collaboration could potentially enable LLMs to efficiently handle complex repository-level tasks that are not readily solved by a single LLM agent.
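The three-role team described in this abstract can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' framework: `llm` is a hypothetical callable that maps a prompt to a completion, the prompt wording is invented, and the convention that the tester answers with a report starting with "PASS" is an assumption made so that the loop has a stopping condition.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: maps a prompt to a completion


def self_collaborate(task: str, llm: LLM, max_rounds: int = 3) -> str:
    """Run an analyst -> coder -> tester loop and return the final code."""
    # Analysis stage: the analyst role decomposes the task into a plan.
    plan = llm(f"You are the analyst. Break this task into steps:\n{task}")
    # Coding stage: the coder role implements the plan.
    code = llm(f"You are the coder. Implement this plan:\n{plan}")
    for _ in range(max_rounds):
        # Testing stage: the tester role reviews the current code.
        report = llm(f"You are the tester. Review this code:\n{code}")
        if report.strip().upper().startswith("PASS"):  # assumed convention
            break
        # Feed the tester's report back to the coder role for a revision.
        code = llm(
            "You are the coder. Revise the code using the tester's report.\n"
            f"Report:\n{report}\nCode:\n{code}"
        )
    return code


# Usage with a stub in place of a real model, so the sketch runs offline.
calls: List[str] = []


def stub_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("You are the analyst"):
        return "1. add two numbers"
    if prompt.startswith("You are the tester"):
        return "PASS" if "fixed" in prompt else "FAIL: missing return"
    if "Revise" in prompt:
        return "def add(a, b): return a + b  # fixed"
    return "def add(a, b): a + b"


final_code = self_collaborate("add two numbers", stub_llm)
```

Passing the model in as a plain callable keeps the role logic independent of any particular API, and `max_rounds` bounds the coder-tester exchange so the loop terminates even if the tester never reports success.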



Citations: 0

Code to Qed, the Project Manager's Guide to Proof Engineering

IF 4.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-04 DOI: 10.1145/3664807

Nicolas Dejon, Chrystel Gaber, Gilles Grimaud, Narjes Jomaa

Despite growing efforts and encouraging successes in the last decades, fully formally-verified projects are still rare in the industrial landscape. The industry often lacks the tools and methodologies to efficiently scale the proof development process. In this work, we give a comprehensible overview of the proof development process for proof developers and project managers. The goal is to support proof developers by rationalizing the proof development process, which currently relies heavily on their intuition and expertise, and by facilitating communication with the management line. To this end, we concentrate on the aspect of proof manufacturing and highlight the most significant sources of proof effort. We propose means to mitigate the latter through proof practices (proof structuring, proof strategies, and proof planning), proof metrics, and tools. Our approach is project-agnostic, independent of specific proof expertise, and computed estimations do not assume prior similar developments. We evaluate our guidelines using a separation kernel undergoing formal verification, driving the proof process in an optimised way. Feedback from a project manager unfamiliar with proof development confirms the benefits of detailed planning of the proof development steps, clear progress communication to the hierarchy line, and alignment with established practices in the software industry.



Citations: 0

Technical Debt Monitoring Decision Making with Skin in the Game

IF 4.4 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology

Pub Date : 2024-06-01 DOI: 10.1145/3664805

Suwichak Fungprasertkul, Rami Bahsoon, Rick Kazman

Technical Debt Management (TDM) can suffer from unpredictability, communication gaps and the inaccessibility of relevant information, which hamper the effectiveness of its decision making. These issues can stem from division among decision-makers which takes root in unfair consequences of decisions among different decision-makers. One mitigation route is Skin in the Game thinking, which enforces transparency, fairness and shared responsibility during collective decision-making under uncertainty. This paper illustrates characteristics which require Skin in the Game thinking in Technical Debt (TD) identification, measurement, prioritisation and monitoring. We point out crucial problems in TD monitoring rooted in asymmetric information and asymmetric payoff between different factions of decision-makers. A systematic TD monitoring method is presented to mitigate the said problems. The method leverages Replicator Dynamics and Behavioural Learning. The method supports decision-makers with automated TD monitoring decisions; it informs decision-makers when human interventions are required. Two publicly available industrial projects with a non-trivial number of TD and timestamps are utilised to evaluate the application of our method. Mann-Whitney U hypothesis tests are conducted on samples of decisions from our method and the baseline. The statistical evidence indicates that our method can produce cost-effective and contextual TD monitoring decisions.
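For readers unfamiliar with Replicator Dynamics, which the method above leverages, the following sketch shows the standard textbook update in discrete (Euler) form: the share of a strategy grows when its payoff exceeds the population average, x_i ← x_i + dt · x_i · (f_i − f̄), with f = A·x and f̄ = x·f. The two strategies and the payoff matrix are invented for illustration and are not taken from the paper, whose actual model the abstract does not specify.

```python
from typing import List, Sequence


def replicator_step(
    shares: Sequence[float],
    payoff: Sequence[Sequence[float]],
    dt: float = 0.1,
) -> List[float]:
    """One Euler step of the replicator equation for population shares."""
    # Expected payoff of each strategy against the current population mix.
    fitness = [sum(a * x for a, x in zip(row, shares)) for row in payoff]
    # Population-average payoff.
    mean_fitness = sum(x * f for x, f in zip(shares, fitness))
    # Strategies with above-average payoff gain share; the others lose it.
    return [x + dt * x * (f - mean_fitness) for x, f in zip(shares, fitness)]


# Hypothetical strategies: index 0 = "repay debt now", index 1 = "defer".
# Illustrative payoffs in which repaying always earns more than deferring.
payoff = [[2.0, 2.0], [1.0, 1.0]]

shares = [0.5, 0.5]
first = replicator_step(shares, payoff)
for _ in range(200):
    shares = replicator_step(shares, payoff)
```

With these payoffs the first strategy dominates, so its share rises at every step while the shares continue to sum to one; a monitoring method could read such a trajectory as a signal of which decision the population of decision-makers is converging on.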



Citations: 0