The artifact accompanying the paper “Understanding Developers Well-Being and Productivity: a 2-year Longitudinal Analysis during the COVID-19 Pandemic” provides a comprehensive set of tools, data, and scripts that were used in the longitudinal study. Spanning 24 months, from April 2020 to April 2022, the study examines shifts in the well-being, productivity, social contacts, needs, and several other variables of software engineers during the COVID-19 pandemic. The artifact facilitates the reproduction of the study’s findings, offering deeper insight into the systematic changes observed in variables such as well-being, quality of social contacts, and emotional loneliness. By providing access to the evidence-generating mechanisms and the generated data, the artifact ensures transparency and reproducibility and allows researchers to use our rich dataset to test their own research questions. This Replicated Computational Results report details the contents of the artifact, its relevance to the main paper, and guidelines for its effective use.
{"title":"Understanding Developers Well-Being and Productivity: a 2-year Longitudinal Analysis during the COVID-19 Pandemic - RCR Report","authors":"Daniel Russo, Paul H. P. Hanel, Niels van Berkel","doi":"10.1145/3640338","DOIUrl":"https://doi.org/10.1145/3640338","url":null,"abstract":"<p>The artifact accompanying the paper “Understanding Developers Well-Being and Productivity: a 2-year Longitudinal Analysis during the COVID-19 Pandemic” provides a comprehensive set of tools, data, and scripts that were utilized in the longitudinal study. Spanning 24 months, from April 2020 to April 2022, the study delves into the shifts in well-being, productivity, social contacts, needs, and several other variables of software engineers during the COVID-19 pandemic. The artifact facilitates the reproduction of the study’s findings, offering a deeper insight into the systematic changes observed in various variables, such as well-being, quality of social contacts, and emotional loneliness. By providing access to the evidence-generating mechanisms and the generated data, the artifact ensures transparency, reproducibility, and allow researchers to use our rich dataset to test their own research question. This Replicated Computational Results report aims to detail the contents of the artifact, its relevance to the main paper, and guidelines for its effective utilization.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"82 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139464781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ensuring the safety of autonomous vehicles (AVs) is of utmost importance, and testing them in simulated environments is a safer option than conducting in-field operational tests. However, generating an exhaustive test suite to identify critical test scenarios is computationally expensive as the representation of each test is complex and contains various dynamic and static features, such as the AV under test, road participants (vehicles, pedestrians, and static obstacles), environmental factors (weather and light), and the road’s structural features (lanes, turns, road speed, etc.). In this paper, we present a systematic technique that uses Instance Space Analysis (ISA) to identify the significant features of test scenarios that affect their ability to reveal the unsafe behaviour of AVs. ISA identifies the features that best differentiate safety-critical scenarios from normal driving and visualises the impact of these features on test scenario outcomes (safe/unsafe) in 2D. This visualisation helps to identify untested regions of the instance space and provides an indicator of the quality of the test suite in terms of the percentage of feature space covered by testing. To test the predictive ability of the identified features, we train five Machine Learning classifiers to classify test scenarios as safe or unsafe. The high precision, recall, and F1 scores indicate that our proposed approach is effective in predicting the outcome of a test scenario without executing it and can be used for test generation, selection, and prioritisation.
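The classification step lends itself to a brief illustration. The sketch below trains a single classifier to predict scenario outcomes from pre-extracted features and reports precision, recall, and F1; the file name, feature columns, and choice of a random forest are illustrative assumptions (the paper evaluates five classifiers over ISA-selected features), not the authors’ exact setup.

```python
# A minimal sketch, assuming a pre-extracted table of test-scenario features
# with a binary "unsafe" label. "scenarios.csv" and the feature columns are
# hypothetical; the paper trains five ML classifiers on ISA-selected features.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

scenarios = pd.read_csv("scenarios.csv")
X = scenarios.drop(columns=["unsafe"])   # e.g., road curvature, n_pedestrians, weather
y = scenarios["unsafe"]                  # 1 = unsafe outcome, 0 = safe

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, clf.predict(X_test), average="binary")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```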
{"title":"Identifying and Explaining Safety-critical Scenarios for Autonomous Vehicles via Key Features","authors":"Neelofar Neelofar, Aldeida Aleti","doi":"10.1145/3640335","DOIUrl":"https://doi.org/10.1145/3640335","url":null,"abstract":"<p>Ensuring the safety of autonomous vehicles (AVs) is of utmost importance, and testing them in simulated environments is a safer option than conducting in-field operational tests. However, generating an exhaustive test suite to identify critical test scenarios is computationally expensive as the representation of each test is complex and contains various dynamic and static features, such as the AV under test, road participants (vehicles, pedestrians, and static obstacles), environmental factors (weather and light), and the road’s structural features (lanes, turns, road speed, etc.). In this paper, we present a systematic technique that uses <i>Instance Space Analysis (ISA)</i> to identify the significant features of test scenarios that affect their ability to reveal the unsafe behaviour of AVs. ISA identifies the features that best differentiate safety-critical scenarios from normal driving and visualises the impact of these features on test scenario outcomes (safe/unsafe) in 2<i>D</i>. This visualisation helps to identify untested regions of the instance space and provides an indicator of the quality of the test suite in terms of the percentage of feature space covered by testing. To test the predictive ability of the identified features, we train five Machine Learning classifiers to classify test scenarios as safe or unsafe. The high precision, recall, and F1 scores indicate that our proposed approach is effective in predicting the outcome of a test scenario without executing it and can be used for test generation, selection, and prioritisation.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"45 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139464741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anshunkang Zhou, Yikun Hu, Xiangzhe Xu, Charles Zhang
Binary code similarity analysis is extremely useful since it provides rich information about an unknown binary, such as revealing its functionality and identifying reused libraries. Robust binary similarity analysis is challenging as heavy compiler optimizations can make semantically similar binaries have gigantic syntactic differences. Unfortunately, existing semantic-based methods still suffer from either incomplete coverage or low accuracy.
In this paper, we propose ARCTURUS, a new technique that can achieve high code coverage and high accuracy simultaneously by manipulating program execution under the guidance of code reachability. Our key insight is that the compiler must preserve program semantics (e.g., dependences between code fragments) during compilation; therefore, the code reachability, which implies the interdependence between code, is invariant across code transformations. Based on the above insight, our key idea is to leverage the stability of code reachability to manipulate the program execution such that deep code logic can also be covered in a consistent way. Experimental results show that ARCTURUS achieves an average precision of 87.8% with 100% block coverage, outperforming compared methods by 38.4% on average. ARCTURUS takes only 0.15 seconds to process one function on average, indicating that it is efficient for practical use.
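To make the comparison idea tangible, here is a toy sketch of the final similarity step only, under heavy assumptions: each function’s emulated behaviour is reduced to a multiset of observed output values, and similarity is a multiset Jaccard score. The trace format and the measure are illustrative stand-ins, not ARCTURUS’s reachability-guided emulation itself.

```python
# A toy sketch: compare two binary functions by the value traces collected
# while emulating them to full block coverage. The trace representation and
# the Jaccard measure are assumptions for illustration, not ARCTURUS itself.
from collections import Counter

def trace_similarity(trace_a: list[int], trace_b: list[int]) -> float:
    """Multiset Jaccard similarity between two emulated value traces."""
    ca, cb = Counter(trace_a), Counter(trace_b)
    inter = sum((ca & cb).values())   # values observed in both traces
    union = sum((ca | cb).values())
    return inter / union if union else 1.0

# The same source function compiled at -O0 vs. -O3 should keep similar traces
# despite large syntactic differences between the binaries:
print(trace_similarity([1, 2, 2, 7], [1, 2, 7, 7]))  # 0.6
```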
{"title":"ARCTURUS: Full Coverage Binary Similarity Analysis with Reachability-Guided Emulation","authors":"Anshunkang Zhou, Yikun Hu, Xiangzhe Xu, Charles Zhang","doi":"10.1145/3640337","DOIUrl":"https://doi.org/10.1145/3640337","url":null,"abstract":"<p>Binary code similarity analysis is extremely useful since it provides rich information about an unknown binary, such as revealing its functionality and identifying reused libraries. Robust binary similarity analysis is challenging as heavy compiler optimizations can make semantically similar binaries have gigantic syntactic differences. Unfortunately, existing semantic-based methods still suffer from either incomplete coverage or low accuracy. </p><p>In this paper, we propose <span>ARCTURUS</span>, a new technique that can achieve high code coverage and high accuracy simultaneously by manipulating program execution under the guidance of code reachability. Our key insight is that the compiler must preserve program semantics (e.g., dependences between code fragments) during compilation; therefore, the code reachability, which implies the interdependence between code, is invariant across code transformations. Based on the above insight, our key idea is to leverage the stability of code reachability to manipulate the program execution such that deep code logic can also be covered in a consistent way. Experimental results show that <span>ARCTURUS</span> achieves an average precision of 87.8% with 100% block coverage, outperforming compared methods by 38.4% on average. <span>ARCTURUS</span> takes only 0.15 seconds to process one function on average, indicating that it is efficient for practical use.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"20 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139464736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Widely used compilers like GCC and LLVM usually have hundreds of optimizations controlled by optimization flags, which are enabled or disabled during compilation to improve the runtime performance (e.g., shorter execution time) of the compiled program. Due to the large number of optimization flags and their combinations, it is difficult for compiler users to manually tune compiler optimization flags. In the literature, a number of autotuning techniques have been proposed, which tune optimization flags for a compiled program by comparing its actual runtime performance under different optimization flag combinations. Due to the huge search space and the heavy cost of actual runs, these techniques suffer from a widely recognized efficiency problem. To reduce the heavy runtime cost, in this paper we propose a lightweight learning approach that uses a small amount of actual runtime performance data to predict the runtime performance of a compiled program under various optimization flag combinations. Furthermore, to reduce the search space, we design a novel particle swarm algorithm that tunes compiler optimization flags with the prediction model. To evaluate the performance of the proposed approach, CompTuner, we conduct an extensive experimental study on two popular C compilers, GCC and LLVM, with two widely used benchmarks, cBench and PolyBench. The experimental results show that CompTuner significantly outperforms the six compared techniques, including the state-of-the-art technique BOCA.
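The two ingredients (a learned runtime predictor and a swarm-style flag search) can be sketched briefly. The code below is a simplified, velocity-free stand-in under stated assumptions: the flag count, swarm parameters, and measure_runtime() are hypothetical, and the search mutates particles toward the best-known flag vector rather than implementing CompTuner’s actual algorithm.

```python
# A minimal sketch, assuming: (1) a model learned from a small sample of real
# runs predicts runtime from a 0/1 flag vector, and (2) a simplified binary
# swarm search queries the model instead of doing costly real compilations.
import random
from sklearn.ensemble import RandomForestRegressor

N_FLAGS = 20

def measure_runtime(flags):               # hypothetical: compile and run once
    return sum(flags) + random.random()   # stand-in for a real measurement

# Phase 1: lightweight prediction model from a few actual measurements.
sample = [[random.randint(0, 1) for _ in range(N_FLAGS)] for _ in range(50)]
model = RandomForestRegressor(random_state=0).fit(
    sample, [measure_runtime(f) for f in sample])

# Phase 2: swarm-style search guided by predicted (not measured) runtime.
swarm = [[random.randint(0, 1) for _ in range(N_FLAGS)] for _ in range(20)]
best = min(swarm, key=lambda f: model.predict([f])[0]).copy()
for _ in range(30):
    for particle in swarm:
        for i in range(N_FLAGS):          # flip bits, biased toward the best
            if random.random() < 0.1:
                particle[i] = best[i] if random.random() < 0.7 else 1 - particle[i]
    candidate = min(swarm, key=lambda f: model.predict([f])[0])
    if model.predict([candidate])[0] < model.predict([best])[0]:
        best = candidate.copy()
print("predicted-best flag vector:", best)
```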
{"title":"Compiler Autotuning through Multiple Phase Learning","authors":"Mingxuan Zhu, Dan Hao, Junjie Chen","doi":"10.1145/3640330","DOIUrl":"https://doi.org/10.1145/3640330","url":null,"abstract":"<p>Widely used compilers like GCC and LLVM usually have hundreds of optimizations controlled by optimization flags, which are enabled or disabled during compilation to improve runtime performance (e.g., small execution time) of the compiler program. Due to the large number of optimization flags and their combination, it is difficult for compiler users to manually tune compiler optimization flags. In the literature, a number of autotuning techniques have been proposed, which tune optimization flags for a compiled program by comparing its actual runtime performance with different optimization flag combination. Due to the huge search space and heavy actual runtime cost, these techniques suffer from the widely-recognized efficiency problem. To reduce the heavy runtime cost, in this paper we propose a lightweight learning approach which uses a small number of actual runtime performance data to predict the runtime performance of a compiled program with various optimization flag combinations. Furthermore, to reduce the search space, we design a novel particle swarm algorithm which tunes compiler optimization flags with the prediction model. To evaluate the performance of the proposed approach CompTuner, we conduct an extensive experimental study on two popular C compilers GCC and LLVM with two widely used benchmarks cBench and PolyBench. The experimental results show that CompTuner significantly outperforms the six compared techniques, including the state-of-art technique BOCA.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"1 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139464740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing autonomous driving systems for safety and reliability is essential, yet complex. A primary challenge is identifying relevant test scenarios, especially the critical ones that may expose hazards or harm to autonomous vehicles and other road users. Although numerous approaches and tools for critical scenario identification have been proposed, industry practices for selecting and implementing these approaches, and their limitations, are not well understood. Therefore, we aim to explore practical aspects of how autonomous driving systems are tested, particularly the identification and use of critical scenarios. We interviewed 13 practitioners from 7 companies working on autonomous driving in Sweden. We used thematic modeling to analyse and synthesize the interview data. As a result, we present 9 themes of practices and 4 themes of challenges related to critical scenarios. Our analysis indicates that there is little joint effort in the industry, even though every approach has its own limitations and tools and platforms are lacking. To that end, we recommend that industry and academia combine different approaches, collaborate among different stakeholders, and continuously learn the field. The contributions of our study are an exploration and synthesis of industry practices and related challenges for critical scenario identification and testing, and a potential increase in industry relevance for future studies.
{"title":"Industry Practices for Challenging Autonomous Driving Systems with Critical Scenarios","authors":"Qunying Song, Emelie Engström, Per Runeson","doi":"10.1145/3640334","DOIUrl":"https://doi.org/10.1145/3640334","url":null,"abstract":"<p>Testing autonomous driving systems for safety and reliability is essential, yet complex. A primary challenge is identifying relevant test scenarios, especially the critical ones that may expose hazards or harm to autonomous vehicles and other road users. Although numerous approaches and tools for critical scenario identification are proposed, the industry practices for selection, implementation, and limitations of approaches, are not well understood. Therefore, we aim to explore practical aspects of how autonomous driving systems are tested, particularly the identification and use of critical scenarios. We interviewed 13 practitioners from 7 companies in autonomous driving in Sweden. We used thematic modeling to analyse and synthesize the interview data. As a result, we present 9 themes of practices and 4 themes of challenges related to critical scenarios. Our analysis indicates there is little joint effort in the industry, despite every approach has its own limitations, and tools and platforms are lacking. To that end, we recommend the industry and academia combine different approaches, collaborate among different stakeholders, and continuously learn the field. The contributions of our study are exploration and synthesis of industry practices and related challenges for critical scenario identification and testing, and potential increase of industry relevance for future studies.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"30 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139464957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning (DL) frameworks have become the cornerstone of the rapidly developing DL field. Through installation dependencies specified in the distribution metadata, numerous packages directly or transitively depend on DL frameworks, layer after layer, forming DL package supply chains (SCs), which are critical for DL frameworks to remain competitive. However, vital knowledge on how to nurture and sustain DL package SCs is still lacking. Acquiring this knowledge may help DL frameworks formulate effective measures to strengthen their SCs to remain competitive and shed light on dependency issues and practices in the DL SC for researchers and practitioners. In this paper, we explore the domains, clusters, and disengagement of packages in two representative PyPI DL package SCs to bridge this knowledge gap. We analyze the metadata of nearly six million PyPI package distributions and construct version-sensitive SCs for two popular DL frameworks: TensorFlow and PyTorch. We find that popular packages (measured by the number of monthly downloads) in the two SCs cover 34 domains belonging to eight categories. The Applications, Infrastructure, and Sciences categories account for over 85% of popular packages in either SC, and the TensorFlow and PyTorch SCs have developed specializations in Infrastructure and Applications packages, respectively. We employ the Leiden community detection algorithm and detect 131 and 100 clusters in the two SCs. The clusters mainly exhibit four shapes: Arrow, Star, Tree, and Forest, with increasing dependency complexity. Most clusters are Arrow or Star, while Tree and Forest clusters account for most packages (TensorFlow SC: 70.7%, PyTorch SC: 92.9%). We identify three groups of reasons why packages disengage from the SC (i.e., remove the DL framework and its dependents from their installation dependencies): dependency issues, functional improvements, and ease of installation. The most common reason in the TensorFlow SC is dependency incompatibility, and in the PyTorch SC it is to simplify functionalities and reduce installation size. Our study provides rich implications for DL framework vendors, researchers, and practitioners on the maintenance and dependency management practices of PyPI DL SCs.
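The cluster-detection step can be illustrated compactly. The sketch below runs Leiden on a toy dependency edge list using the python-igraph and leidenalg packages; the edges are invented examples, and edge direction is ignored for clustering, which simplifies the paper’s version-sensitive SC construction.

```python
# A minimal sketch of the clustering step, assuming a dependency edge list has
# already been extracted from PyPI metadata. The edges below are illustrative.
import igraph as ig
import leidenalg  # pip install python-igraph leidenalg

edges = [("keras", "tensorflow"), ("tfx", "tensorflow"),
         ("torchvision", "torch"), ("pytorch-lightning", "torch")]

g = ig.Graph.TupleList(edges, directed=False)  # package <-> its dependency
partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)
for cluster_id, members in enumerate(partition):
    print(cluster_id, [g.vs[v]["name"] for v in members])
```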
{"title":"Characterizing Deep Learning Package Supply Chains in PyPI: Domains, Clusters, and Disengagement","authors":"Kai Gao, Runzhi He, Bing Xie, Minghui Zhou","doi":"10.1145/3640336","DOIUrl":"https://doi.org/10.1145/3640336","url":null,"abstract":"<p>Deep learning (DL) frameworks have become the cornerstone of the rapidly developing DL field. Through installation dependencies specified in the distribution metadata, numerous packages directly or transitively depend on DL frameworks, layer after layer, forming DL package supply chains (SCs), which are critical for DL frameworks to remain competitive. However, vital knowledge on how to nurture and sustain DL package SCs is still lacking. Achieving this knowledge may help DL frameworks formulate effective measures to strengthen their SCs to remain competitive and shed light on dependency issues and practices in the DL SC for researchers and practitioners. In this paper, we explore the domains, clusters, and disengagement of packages in two representative PyPI DL package SCs to bridge this knowledge gap. We analyze the metadata of nearly six million PyPI package distributions and construct version-sensitive SCs for two popular DL frameworks: TensorFlow and PyTorch. We find that popular packages (measured by the number of monthly downloads) in the two SCs cover 34 domains belonging to eight categories. <i>Applications</i>, <i>Infrastructure</i>, and <i>Sciences</i> categories account for over 85% of popular packages in either SC and TensorFlow and PyTorch SC have developed specializations on <i>Infrastructure</i> and <i>Applications</i> packages respectively. We employ the Leiden community detection algorithm and detect 131 and 100 clusters in the two SCs. The clusters mainly exhibit four shapes: Arrow, Star, Tree, and Forest with increasing dependency complexity. Most clusters are Arrow or Star, while Tree and Forest clusters account for most packages (Tensorflow SC: 70.7%, PyTorch SC: 92.9%). We identify three groups of reasons why packages disengage from the SC (i.e., remove the DL framework and its dependents from their installation dependencies): dependency issues, functional improvements, and ease of installation. The most common reason in TensorFlow SC is dependency incompatibility and in PyTorch SC is to simplify functionalities and reduce installation size. Our study provides rich implications for DL framework vendors, researchers, and practitioners on the maintenance and dependency management practices of PyPI DL SCs.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"264 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139413140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quanjun Zhang, Juan Zhai, Chunrong Fang, Jiawei Liu, Weisong Sun, Haichuan Hu, Qingyu Wang
Machine translation systems have been widely adopted in our daily life, making life easier and more convenient. Unfortunately, erroneous translations may result in severe consequences, such as financial losses. This makes it necessary to improve the accuracy and reliability of machine translation systems. However, it is challenging to test machine translation systems because of the complexity and intractability of the underlying neural models. To tackle these challenges, we propose a novel metamorphic testing approach based on syntactic tree pruning (STP) to validate machine translation systems. Our key insight is that a pruned sentence should have similar crucial semantics compared with the original sentence. Specifically, STP (1) proposes a core semantics-preserving pruning strategy based on basic sentence structures and dependency relations at the level of the syntactic tree representation; (2) generates source sentence pairs based on the metamorphic relation; and (3) reports suspicious issues whose translations break the consistency property, using a bag-of-words model. We further evaluate STP on two state-of-the-art machine translation systems (i.e., Google Translate and Bing Microsoft Translator) with 1,200 source sentences as inputs. The results show that STP accurately finds 5,073 unique erroneous translations in Google Translate and 5,100 unique erroneous translations in Bing Microsoft Translator (400% more than state-of-the-art techniques), with 64.5% and 65.4% precision, respectively. The reported erroneous translations vary in type, and more than 90% of them cannot be found by state-of-the-art techniques. There are 9,393 erroneous translations unique to STP, which is 711.9% more than state-of-the-art techniques. Moreover, STP is quite effective in detecting translation errors for the original sentences, with a recall reaching 74.0%, improving on state-of-the-art techniques by 55.1% on average.
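The consistency check at the heart of the approach can be sketched in a few lines. Below, translate() is a hypothetical stand-in for a call to Google Translate or Bing Microsoft Translator, the containment threshold is an invented parameter, and the sentence pair is assumed to have been produced by the pruning strategy; this is not STP’s exact pipeline.

```python
# A toy sketch of the metamorphic check: the pruned sentence's translation
# should be largely contained, word for word, in the original's translation.
from collections import Counter

def translate(sentence: str) -> str:
    raise NotImplementedError("stand-in for a real translation API call")

def inconsistent(original: str, pruned: str, threshold: float = 0.8) -> bool:
    """Flag a suspicious issue when bag-of-words containment drops too low."""
    t_orig = Counter(translate(original).lower().split())
    t_pruned = Counter(translate(pruned).lower().split())
    overlap = sum((t_orig & t_pruned).values())
    return overlap / max(sum(t_pruned.values()), 1) < threshold

# e.g., inconsistent("if it rains, the match is cancelled",
#                    "the match is cancelled")  # pruned conditional clause
```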
{"title":"Machine Translation Testing via Syntactic Tree Pruning","authors":"Quanjun Zhang, Juan Zhai, Chunrong Fang, Jiawei Liu, Weisong Sun, Haichuan Hu, Qingyu Wang","doi":"10.1145/3640329","DOIUrl":"https://doi.org/10.1145/3640329","url":null,"abstract":"Machine translation systems have been widely adopted in our daily life, making life easier and more convenient. Unfortunately, erroneous translations may result in severe consequences, such as financial losses. This requires to improve the accuracy and the reliability of machine translation systems. However, it is challenging to test machine translation systems because of the complexity and intractability of the underlying neural models. To tackle these challenges, we propose a novel metamorphic testing approach by syntactic tree pruning (STP) to validate machine translation systems. Our key insight is that a pruned sentence should have similar crucial semantics compared with the original sentence. Specifically, STP (1) proposes a core semantics-preserving pruning strategy by basic sentence structures and dependency relations on the level of syntactic tree representation; (2) generates source sentence pairs based on the metamorphic relation; (3) reports suspicious issues whose translations break the consistency property by a bag-of-words model. We further evaluate STP on two state-of-the-art machine translation systems (i.e., Google Translate and Bing Microsoft Translator) with 1,200 source sentences as inputs. The results show that STP accurately finds 5,073 unique erroneous translations in Google Translate and 5,100 unique erroneous translations in Bing Microsoft Translator (400% more than state-of-the-art techniques), with 64.5% and 65.4% precision, respectively. The reported erroneous translations vary in types and more than 90% of them found by state-of-the-art techniques. There are 9,393 erroneous translations unique to STP, which is 711.9% more than state-of-the-art techniques. Moreover, STP is quite effective in detecting translation errors for the original sentences with a recall reaching 74.0%, improving state-of-the-art techniques by 55.1% on average.","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"1 10","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139457522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The COVID-19 pandemic has brought significant and enduring shifts in various aspects of life, including increased flexibility in work arrangements. In a longitudinal study spanning 24 months, with six measurement points from April 2020 to April 2022, we explore changes in the well-being, productivity, social contacts, and needs of software engineers during this time. Our findings indicate systematic changes in various variables. For example, well-being and quality of social contacts increased while emotional loneliness decreased as lockdown measures were relaxed. Conversely, people’s boredom and productivity remained stable. Furthermore, a preliminary investigation into the future of work at the end of the pandemic revealed a consensus among developers in favor of hybrid work arrangements. We also discovered that prior job changes and low job satisfaction were consistently linked to intentions to change jobs if current work conditions do not meet developers’ needs. This highlights the need for software organizations to adapt to various work arrangements to remain competitive employers. Building upon our findings and the existing literature, we introduce the Integrated Job Demands-Resources and Self-Determination (IJARS) Model as a comprehensive framework to explain the well-being and productivity of software engineers during the COVID-19 pandemic.
{"title":"Understanding Developers Well-Being and Productivity: a 2-year Longitudinal Analysis during the COVID-19 Pandemic","authors":"Daniel Russo, Paul H. P. Hanel, Niels van Berkel","doi":"10.1145/3638244","DOIUrl":"https://doi.org/10.1145/3638244","url":null,"abstract":"<p>The COVID-19 pandemic has brought significant and enduring shifts in various aspects of life, including increased flexibility in work arrangements. In a longitudinal study, spanning 24 months with six measurement points from April 2020 to April 2022, we explore changes in well-being, productivity, social contacts, and needs of software engineers during this time. Our findings indicate systematic changes in various variables. For example, well-being and quality of social contacts increased while emotional loneliness decreased as lockdown measures were relaxed. Conversely, people’s boredom and productivity, remained stable. Furthermore, a preliminary investigation into the future of work at the end of the pandemic revealed a consensus among developers for a preference of hybrid work arrangements. We also discovered that prior job changes and low job satisfaction were consistently linked to intentions to change jobs if current work conditions do not meet developers’ needs. This highlights the need for software organizations to adapt to various work arrangements to remain competitive employers. Building upon our findings and the existing literature, we introduce the Integrated Job Demands-Resources and Self-Determination (IJARS) Model as a comprehensive framework to explain the well-being and productivity of software engineers during the COVID-19 pandemic.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"22 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139028006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The emergence of foundation models, such as the large language model (LLM) GPT-4 and the text-to-image model DALL-E, has opened up numerous possibilities across various domains. People can now use natural language (i.e., prompts) to communicate with AI to perform tasks. While people can use foundation models through chatbots (e.g., ChatGPT), chat, regardless of the capabilities of the underlying models, is not a production tool for building reusable AI services. APIs like LangChain allow for LLM-based application development but require substantial programming knowledge, thus posing a barrier. To mitigate this, we systematically review, summarise, refine, and extend the concept of the AI chain by incorporating the best principles and practices that have been accumulated in software engineering for decades into AI chain engineering, to systematize the AI chain engineering methodology. We also develop a no-code integrated development environment, Prompt Sapper, which embodies these AI chain engineering principles and patterns naturally in the process of building AI chains, thereby improving the performance and quality of AI chains. With Prompt Sapper, AI chain engineers can compose prompt-based AI services on top of foundation models through chat-based requirement analysis and visual programming. Our user study evaluated and demonstrated the efficiency and correctness of Prompt Sapper.
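As a concept check, an “AI chain” is simply prompt-based steps composed into a reusable service. The toy sketch below wires two such steps together; llm() is a hypothetical stand-in for any foundation-model API call, and Prompt Sapper composes such chains through chat and visual programming rather than handwritten code.

```python
# A toy sketch of a two-step AI chain; llm() is a hypothetical stand-in for a
# foundation-model call (e.g., GPT-4), not Prompt Sapper's actual API.
def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real LLM API call")

def summarise(text: str) -> str:
    return llm(f"Summarise the following text in one sentence:\n{text}")

def translate_to_french(text: str) -> str:
    return llm(f"Translate the following text into French:\n{text}")

def french_summary(text: str) -> str:
    # the chain: output of one prompt-based step feeds the next
    return translate_to_french(summarise(text))
```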
{"title":"Prompt Sapper: A LLM-Empowered Production Tool for Building AI Chains","authors":"Yu Cheng, Jieshan Chen, Qing Huang, Zhenchang Xing, Xiwei Xu, Qinghua Lu","doi":"10.1145/3638247","DOIUrl":"https://doi.org/10.1145/3638247","url":null,"abstract":"<p>The emergence of foundation models, such as large language models (LLMs) GPT-4 and text-to-image models DALL-E, has opened up numerous possibilities across various domains. People can now use natural language (i.e. prompts) to communicate with AI to perform tasks. While people can use foundation models through chatbots (e.g., ChatGPT), chat, regardless of the capabilities of the underlying models, is not a production tool for building reusable AI services. APIs like LangChain allow for LLM-based application development but require substantial programming knowledge, thus posing a barrier. To mitigate this, we systematically review, summarise, refine and extend the concept of AI chain by incorporating the best principles and practices that have been accumulated in software engineering for decades into AI chain engineering, to systematize AI chain engineering methodology. We also develop a no-code integrated development environment, Prompt Sapper\u0000, which embodies these AI chain engineering principles and patterns naturally in the process of building AI chains, thereby improving the performance and quality of AI chains. With Prompt Sapper, AI chain engineers can compose prompt-based AI services on top of foundation models through chat-based requirement analysis and visual programming. Our user study evaluated and demonstrated the efficiency and correctness of Prompt Sapper.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"63 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138823904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bentley James Oakes, Michalis Famelis, Houari Sahraoui
Domain experts are increasingly employing machine learning to solve their domain-specific problems. This article presents to software engineering researchers the six key challenges that a domain expert faces in addressing their problem with a computational workflow, and the underlying executable implementation. These challenges arise out of our conceptual framework, which presents the “route” of transformations that a domain expert may choose to take while developing their solution.
To ground our conceptual framework in the state-of-the-practice, this article discusses a selection of available textual and graphical workflow systems and their support for the transformations described in our framework. Example studies from the literature in various domains are also examined to highlight the tools used by the domain experts as well as a classification of the domain-specificity and machine learning usage of their problem, workflow, and implementation.
The state-of-the-practice informs our discussion of the six key challenges, where we identify which challenges and transformations are not sufficiently addressed by available tools. We also suggest possible research directions for software engineering researchers to increase the automation of these tools and disseminate best-practice techniques between software engineering and various scientific domains.
{"title":"Building Domain-Specific Machine Learning Workflows: A Conceptual Framework for the State-of-the-Practice","authors":"Bentley James Oakes, Michalis Famelis, Houari Sahraoui","doi":"10.1145/3638243","DOIUrl":"https://doi.org/10.1145/3638243","url":null,"abstract":"<p>Domain experts are increasingly employing machine learning to solve their domain-specific problems. This article presents to software engineering researchers the six key challenges that a domain expert faces in addressing their problem with a computational workflow, and the underlying executable implementation. These challenges arise out of our conceptual framework which presents the “route” of transformations that a domain expert may choose to take while developing their solution. </p><p>To ground our conceptual framework in the state-of-the-practice, this article discusses a selection of available textual and graphical workflow systems and their support for the transformations described in our framework. Example studies from the literature in various domains are also examined to highlight the tools used by the domain experts as well as a classification of the domain-specificity and machine learning usage of their problem, workflow, and implementation. </p><p>The state-of-the-practice informs our discussion of the six key challenges, where we identify which challenges and transformations are not sufficiently addressed by available tools. We also suggest possible research directions for software engineering researchers to increase the automation of these tools and disseminate best-practice techniques between software engineering and various scientific domains.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"2017 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138824279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}