Pub Date : 2025-10-20 DOI: 10.1109/tse.2025.3623625
Yanjie Jiang, Chenxu Li, Zixiao Zhao, Fu Fan, Lu Zhang, Hui Liu
{"title":"Evaluating and Improving GPT-Based Expansion of Abbreviations","authors":"Yanjie Jiang, Chenxu Li, Zixiao Zhao, Fu Fan, Lu Zhang, Hui Liu","doi":"10.1109/tse.2025.3623625","DOIUrl":"https://doi.org/10.1109/tse.2025.3623625","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"6 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145397753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-17 DOI: 10.1109/TSE.2025.3607625
Simone Corbo;Luca Bancale;Valeria De Gennaro;Livia Lestingi;Vincenzo Scotti;Matteo Camilli
Language is a deep-rooted means of perpetrating stereotypes and discrimination. Large Language Models, now a pervasive technology in our everyday lives, can cause extensive harm when prone to generating toxic responses. The standard way to address this issue is to align the LLM, which, however, dampens the problem without constituting a definitive solution. Therefore, testing LLMs even after alignment efforts remains crucial for detecting any residual deviations from ethical standards. We present EvoTox, an automated testing framework for LLMs’ inclination to toxicity, which quantitatively assesses how far LLMs can be pushed towards toxic responses even in the presence of alignment. The framework adopts an iterative evolution strategy that exploits the interplay between two LLMs: the System Under Test (SUT) and the Prompt Generator, which steers SUT responses toward higher toxicity. The toxicity level is assessed by an automated oracle based on an existing toxicity classifier. We conduct a quantitative and qualitative empirical evaluation using five state-of-the-art LLMs of increasing complexity (7–671B parameters) as evaluation subjects. Our quantitative evaluation assesses the cost-effectiveness of four alternative versions of EvoTox against existing baseline methods based on random search, curated datasets of toxic prompts, and adversarial attacks. Our qualitative assessment engages human evaluators to rate the fluency of the generated prompts and the perceived toxicity of the responses collected during the testing sessions. Results indicate that the effectiveness, in terms of detected toxicity level, is significantly higher than that of the selected baseline methods (effect size up to 1.0 against random search and up to 0.99 against adversarial attacks). Furthermore, EvoTox incurs a limited cost overhead (from 22% to 35% on average). This work includes examples of toxic degeneration by LLMs, which some readers may consider profane or offensive. Reader discretion is advised.
{"title":"How Toxic Can You Get? Search-Based Toxicity Testing for Large Language Models","authors":"Simone Corbo;Luca Bancale;Valeria De Gennaro;Livia Lestingi;Vincenzo Scotti;Matteo Camilli","doi":"10.1109/TSE.2025.3607625","DOIUrl":"10.1109/TSE.2025.3607625","url":null,"abstract":"Language is a deep-rooted means of perpetration of stereotypes and discrimination. Large Language Models, now a pervasive technology in our everyday lives, can cause extensive harm when prone to generating toxic responses. The standard way to address this issue is to align the LLM, which, however, dampens the issue without constituting a definitive solution. Therefore, testing LLM even after alignment efforts remains crucial for detecting any residual deviations with respect to ethical standards. We present EvoTox, an automated testing framework for LLMs’ inclination to toxicity, providing a way to quantitatively assess how much LLMs can be pushed towards toxic responses even in the presence of alignment. The framework adopts an iterative evolution strategy that exploits the interplay between two LLMs, the System Under Test (SUT) and the Prompt Generator steering SUT responses toward higher toxicity. The toxicity level is assessed by an automated oracle based on an existing toxicity classifier. We conduct a quantitative and qualitative empirical evaluation using five state-of-the-art LLMs as evaluation subjects having increasing complexity (7–671B parameters). Our quantitative evaluation assesses the cost-effectiveness of four alternative versions of EvoTox against existing baseline methods, based on random search, curated datasets of toxic prompts, and adversarial attacks. Our qualitative assessment engages human evaluators to rate the fluency of the generated prompts and the perceived toxicity of the responses collected during the testing sessions. Results indicate that the effectiveness, in terms of detected toxicity level, is significantly higher than the selected baseline methods (effect size up to 1.0 against random search and up to 0.99 against adversarial attacks). Furthermore, EvoTox yields a limited cost overhead (from 22% to 35% on average). This work includes examples of toxic degeneration by LLMs, which may be considered profane or offensive to some readers. Reader discretion is advised.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 11","pages":"3056-3071"},"PeriodicalIF":5.6,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145310828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-17 DOI: 10.1109/TSE.2025.3605145
Shuo Yang;Jiachi Chen;Lei Xiao;Jinyuan Hu;Dan Lin;Jiajing Wu;Tao Zhang;Zibin Zheng
Recently, the increasing complexity of smart contracts and their interactions has led to more sophisticated strategies for executing attacks. Hackers often need to deploy attacker contracts as delegators to automate these attacks on their behalf. Existing identification methods for attacker contracts either rely on simple patterns (e.g., recursive callback control flow) that suffer from high false-positive rates and limited extraction of interaction and call information, or lack fully automated detection capabilities. Consequently, these limitations reduce the effectiveness of current solutions in identifying modern, intricate attacks. To overcome these challenges, we introduce the concept of state manipulation attacks, which abstracts the exploitation of problematic state dependencies arising from contract interactions. During these attacks, hackers first alter the storage state of one contract (the manipulated contract), which determines the profit they can gain. They then call another contract (the victim contract) to exploit its dependency on the altered state and maximize their profits. We present SMAsher, a tool designed to automatically identify state manipulation attacker contracts. SMAsher leverages fine-grained state-aware dataflow analysis to detect exploitation traces and exploited state dependencies among contracts, focusing on recovering the call path and interaction semantics. Our extensive experiments on 1.38 million real-world contracts demonstrate that SMAsher successfully identifies 311 state manipulation attacker contracts with 100% precision; the attacks they carried out caused $6.95 million in losses. Our findings also reveal some notable malicious characteristics of hackers’ accounts through their deployed attacker contracts. Additionally, we have provided 10 PoCs (Proof-of-Concepts) for previously unidentified attacks, all of which have been confirmed and released to the community.
Who Is Pulling the Strings: Unveiling Smart Contract State Manipulation Attacks Through State-Aware Dataflow Analysis. IEEE Transactions on Software Engineering, vol. 51, no. 10, pp. 2942-2956.
Pub Date : 2025-10-16 DOI: 10.1109/tse.2025.3622251
Xinyue Zuo, Yan Xiao, Xiaochun Cao, Wenya Wang, Jin Song Dong
{"title":"DT4LM: Differential Testing for Reliable Language Model Updates in Classification Tasks","authors":"Xinyue Zuo, Yan Xiao, Xiaochun Cao, Wenya Wang, Jin Song Dong","doi":"10.1109/tse.2025.3622251","DOIUrl":"https://doi.org/10.1109/tse.2025.3622251","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"91 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145310829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-14 DOI: 10.1109/tse.2025.3621462
Xiaoxue Ren, Chaoqun Dai, Qiao Huang, Ye Wang, Chao Liu, Bo Jiang
{"title":"Hydra-Reviewer: A holistic multi-agent system for automatic code review comment generation","authors":"Xiaoxue Ren, Chaoqun Dai, Qiao Huang, Ye Wang, Chao Liu, Bo Jiang","doi":"10.1109/tse.2025.3621462","DOIUrl":"https://doi.org/10.1109/tse.2025.3621462","url":null,"abstract":"","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"102 1","pages":""},"PeriodicalIF":7.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145289299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-14 DOI: 10.1109/TSE.2025.3620670
Wenting Zhao;Wuxia Jin;Yiran Zhang;Ming Fan;Haijun Wang;Li Li;Yang Liu;Ting Liu
The architecture of software systems evolves along with their upgrades and maintenance, inevitably creating a gap between the de facto architecture and the designed one. To perceive and fix this discrepancy, clustering-based architecture recovery methods have been developed to re-engineer the as-implemented system architecture from the code. However, existing solutions still face several limitations: they underutilize both code-level and architecture-level semantics underlying the source code, and they overlook implicit structural dependencies that complement explicit ones in reflecting code interactions. To address these challenges, we propose SemArc, an architecture recovery method that utilizes large language models to comprehend both implementation-level and architecture-level semantics, supported by well-established canonical architectural patterns as a knowledge base. SemArc also incorporates both implicit and explicit dependencies to complete the system behavior representations. Additionally, SemArc introduces a component-as-anchor guided clustering algorithm to improve the clustering process. We evaluated SemArc on 15 software systems written in C/C++, Java, and Python, using five different metrics. The results demonstrate that SemArc outperforms seven baseline methods by an average of 32 percentage points. We also examined how three factors (code semantics, architectural semantics, and implicit dependencies) as well as different levels of architectural semantic descriptions influence recovery accuracy. A case study on the Bash project indicates that SemArc has the potential to yield even more precise recovery results than those labeled by humans.
{"title":"Software Architecture Recovery Augmented With Semantics","authors":"Wenting Zhao;Wuxia Jin;Yiran Zhang;Ming Fan;Haijun Wang;Li Li;Yang Liu;Ting Liu","doi":"10.1109/TSE.2025.3620670","DOIUrl":"10.1109/TSE.2025.3620670","url":null,"abstract":"The architecture of software systems evolves along with their upgrades and maintenance, inevitably creating a gap between the defact architecture and the designed one. To perceive and fix the discrepancy, clustering-based architecture recovery methods have been developed to re-engineer the real-time system architecture from the code implementation. However, existing solutions still face several limitations. They underutilize both code-level and architecture-level semantics underlying the source code. Moreover, they overlook implicit structural dependencies that complement explicit ones to reflect code interactions. To address these challenges, we propose SemArc, an architecture recovery method that utilizes large language models to comprehend both implementation-level and architecture-level semantics, supported by well-established canonical architectural patterns as a knowledge base. SemArc also incorporates both implicit and explicit dependencies to complete the system behavior representations. Additionally, SemArc introduces a component-as-anchor guided clustering algorithm to improve the clustering process. We evaluated SemArc on 15 software systems written in C/C++, Java, and Python, using five different metrics. The results demonstrate that SemArc outperforms seven baseline methods by an average of 32 percentage points. We also examined how three factors—code semantics, architectural semantics, and implicit dependencies—as well as different levels of architectural semantic descriptions, influence recovery accuracy. A case study on the Bash project indicates that SemArc has the potential to yield even more precise recovery results than those labeled by humans.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"52 1","pages":"338-359"},"PeriodicalIF":5.6,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145289302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-10 DOI: 10.1109/TSE.2025.3619966
Shiwen Ou;Yuwei Li;Lu Yu;Chengkun Wei;Tingke Wen;Qiangpu Chen;Yu Chen;Haizhi Tang;Zulie Pan
Deep learning (DL) frameworks serve as the backbone for a wide range of artificial intelligence applications. However, bugs within DL frameworks can cascade into critical issues in higher-level applications, jeopardizing reliability and security. While numerous techniques have been proposed to detect bugs in DL frameworks, research exploring common API patterns across frameworks and the potential risks they entail remains limited. Notably, many DL frameworks expose similar APIs with overlapping input parameters and functionalities, rendering them vulnerable to shared bugs, where a flaw in one API may extend to analogous APIs in other frameworks. To address this challenge, we propose MirrorFuzz, an automated API fuzzing solution to discover shared bugs in DL frameworks. MirrorFuzz operates in three stages: First, MirrorFuzz collects historical bug data for each API within a DL framework to identify potentially buggy APIs. Second, it matches each buggy API in a specific framework with similar APIs within and across other DL frameworks. Third, it employs large language models (LLMs) to synthesize code for the API under test, leveraging the historical bug data of similar APIs to trigger analogous bugs across APIs. We implement MirrorFuzz and evaluate it on four popular DL frameworks (TensorFlow, PyTorch, OneFlow, and Jittor). Extensive evaluation demonstrates that MirrorFuzz improves code coverage by 39.92% and 98.20% compared to state-of-the-art methods on TensorFlow and PyTorch, respectively. Moreover, MirrorFuzz discovers 315 bugs, 262 of which are newly found, and 80 bugs are fixed, with 52 of these bugs assigned CNVD IDs.
{"title":"MirrorFuzz: Leveraging LLM and Shared Bugs for Deep Learning Framework APIs Fuzzing","authors":"Shiwen Ou;Yuwei Li;Lu Yu;Chengkun Wei;Tingke Wen;Qiangpu Chen;Yu Chen;Haizhi Tang;Zulie Pan","doi":"10.1109/TSE.2025.3619966","DOIUrl":"10.1109/TSE.2025.3619966","url":null,"abstract":"Deep learning (DL) frameworks serve as the backbone for a wide range of artificial intelligence applications. However, bugs within DL frameworks can cascade into critical issues in higher-level applications, jeopardizing reliability and security. While numerous techniques have been proposed to detect bugs in DL frameworks, research exploring common API patterns across frameworks and the potential risks they entail remains limited. Notably, many DL frameworks expose similar APIs with overlapping input parameters and functionalities, rendering them vulnerable to shared bugs, where a flaw in one API may extend to analogous APIs in other frameworks. To address this challenge, we propose MirrorFuzz, an automated API fuzzing solution to discover shared bugs in DL frameworks. MirrorFuzz operates in three stages: First, MirrorFuzz collects historical bug data for each API within a DL framework to identify potentially buggy APIs. Second, it matches each buggy API in a specific framework with similar APIs within and across other DL frameworks. Third, it employs large language models (LLMs) to synthesize code for the API under test, leveraging the historical bug data of similar APIs to trigger analogous bugs across APIs. We implement MirrorFuzz and evaluate it on four popular DL frameworks (TensorFlow, PyTorch, OneFlow, and Jittor). Extensive evaluation demonstrates that MirrorFuzz improves code coverage by 39.92% and 98.20% compared to state-of-the-art methods on TensorFlow and PyTorch, respectively. Moreover, MirrorFuzz discovers 315 bugs, 262 of which are newly found, and 80 bugs are fixed, with 52 of these bugs assigned CNVD IDs.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"52 1","pages":"360-375"},"PeriodicalIF":5.6,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11201027","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145260849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-10-09 DOI: 10.1109/TSE.2025.3618976
Yu Gao;Dong Wang;Wensheng Dou;Wenhan Feng;Yu Liang;Jun Wei
The last five years have seen a rise in model checking guided testing (MCGT) approaches for systematically testing distributed systems. MCGT approaches generate test cases for distributed systems by traversing their verified abstract state spaces, simultaneously solving the three key problems faced in testing distributed systems, i.e., test input generation, test oracle construction, and execution space enumeration. However, existing MCGT approaches struggle with traversing the huge state space of distributed systems, which can contain billions of system states. This makes the process of finding bugs time-consuming and expensive, often taking several weeks. In this paper, we propose Mosso to speed up model checking guided testing for distributed systems. We observe that the abstract state space of distributed systems contains many redundant test scenarios. Considering the characteristics of these redundant test scenarios, we propose three strategies (action independence, node symmetry, and scenario equivalence) to identify and prioritize unique test scenarios when traversing the state space. We have applied Mosso on three real-world distributed systems. By employing the three strategies, our approach achieves an average speedup of 56X (up to 208X) compared to the state-of-the-art MCGT approach. Additionally, our approach has uncovered two previously unknown bugs.
Efficiently Testing Distributed Systems via Abstract State Space Prioritization. IEEE Transactions on Software Engineering, vol. 52, no. 2, pp. 395-410.