Siri, Write the Next Method
Fengcai Wen, Emad Aghajani, Csaba Nagy, Michele Lanza, G. Bavota
Pub Date: 2021-03-08 | DOI: 10.1109/ICSE43902.2021.00025
Code completion is one of the killer features of Integrated Development Environments (IDEs), and researchers have proposed different methods to improve its accuracy. While these techniques are valuable to speed up code writing, they are limited to recommendations related to the next few tokens a developer is likely to type given the current context. In the best case, they can recommend a few APIs that a developer is likely to use next. We present FeaRS, a novel retrieval-based approach that, given the current code a developer is writing in the IDE, can recommend the next complete method (i.e., signature and method body) that the developer is likely to implement. To do this, FeaRS exploits "implementation patterns" (i.e., groups of methods usually implemented within the same task) learned by mining thousands of open source projects. We instantiated our approach to the specific context of Android apps. A large-scale empirical evaluation we performed across more than 20k apps shows encouraging preliminary results, but also highlights future challenges to overcome.
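To make the mining-based idea concrete, here is a minimal Python sketch of how a recommender might match already-implemented methods against mined implementation patterns. The `Pattern` structure, the confidence-ranked lookup, and the example rule are illustrative assumptions, not FeaRS's actual design.

```python
# Illustrative sketch (not FeaRS itself): an "implementation pattern" is
# modeled as an association rule mined from open source projects, mapping a
# set of already-implemented methods to a likely next method.
from dataclasses import dataclass

@dataclass
class Pattern:
    antecedent: frozenset   # signatures of methods already in the class
    consequent: str         # signature of the recommended next method
    body: str               # mined method body shown to the developer
    confidence: float       # rule confidence in the mined corpus

def recommend_next_method(implemented, patterns, top_k=3):
    """Return the top-k patterns whose antecedent is already covered by the
    methods the developer has written so far."""
    matches = [p for p in patterns
               if p.antecedent <= implemented and p.consequent not in implemented]
    return sorted(matches, key=lambda p: p.confidence, reverse=True)[:top_k]

# Hypothetical mined rule: classes implementing onCreate() and initViews()
# often implement a click handler next.
patterns = [Pattern(frozenset({"onCreate(Bundle)", "initViews()"}),
                    "onClick(View)",
                    "public void onClick(View v) { /* ... */ }",
                    0.82)]
print(recommend_next_method({"onCreate(Bundle)", "initViews()"}, patterns))
```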
{"title":"Siri, Write the Next Method","authors":"Fengcai Wen, Emad Aghajani, Csaba Nagy, Michele Lanza, G. Bavota","doi":"10.1109/ICSE43902.2021.00025","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00025","url":null,"abstract":"Code completion is one of the killer features of Integrated Development Environments (IDEs), and researchers have proposed different methods to improve its accuracy. While these techniques are valuable to speed up code writing, they are limited to recommendations related to the next few tokens a developer is likely to type given the current context. In the best case, they can recommend a few APIs that a developer is likely to use next. We present FeaRS, a novel retrieval-based approach that, given the current code a developer is writing in the IDE, can recommend the next complete method (i.e., signature and method body) that the developer is likely to implement. To do this, FeaRS exploits \"implementation patterns\" (i.e., groups of methods usually implemented within the same task) learned by mining thousands of open source projects. We instantiated our approach to the specific context of Android apps. A large-scale empirical evaluation we performed across more than 20k apps shows encouraging preliminary results, but also highlights future challenges to overcome.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122862638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Containing Malicious Package Updates in npm with a Lightweight Permission System
G. Ferreira, Limin Jia, Joshua Sunshine, Christian Kästner
Pub Date: 2021-03-08 | DOI: 10.1109/ICSE43902.2021.00121
The large number of third-party packages available in fast-moving software ecosystems, such as Node.js/npm, enables attackers to compromise applications by pushing malicious updates to their package dependencies. Studying the npm repository, we observed that many of its packages used in Node.js applications perform only simple computations and do not need access to filesystem or network APIs. This offers the opportunity to enforce least-privilege design per package, protecting applications and package dependencies from malicious updates. We propose a lightweight permission system that protects Node.js applications by enforcing package permissions at runtime. We discuss the design space of solutions and show that our system makes a large number of packages much harder to exploit, almost for free.
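As an illustration of deny-by-default, per-package permission enforcement, consider the sketch below. The real system targets Node.js/npm; this sketch uses Python for consistency with the other examples, and the manifest, capability names, and `guarded_open` wrapper are hypothetical.

```python
# Illustrative Python sketch of per-package, deny-by-default permission
# enforcement; the real system targets Node.js/npm, and the manifest and
# wrapper below are hypothetical.
PERMISSIONS = {
    "left-pad":   set(),            # pure computation: no capabilities needed
    "node-fetch": {"network"},
    "fs-extra":   {"filesystem"},
}

class CapabilityError(Exception):
    pass

def check(package, capability):
    """Deny by default: a package may use a capability only if its manifest
    explicitly grants it, so a malicious update cannot silently gain access."""
    if capability not in PERMISSIONS.get(package, set()):
        raise CapabilityError(f"{package} lacks the '{capability}' permission")

def guarded_open(package, path, mode="r"):
    """A guarded API the runtime would expose in place of the raw filesystem API."""
    check(package, "filesystem")
    return open(path, mode)

check("node-fetch", "network")            # allowed by the manifest
# guarded_open("left-pad", "/etc/passwd") # would raise CapabilityError
```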
{"title":"Containing Malicious Package Updates in npm with a Lightweight Permission System","authors":"G. Ferreira, Limin Jia, Joshua Sunshine, Christian Kästner","doi":"10.1109/ICSE43902.2021.00121","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00121","url":null,"abstract":"The large amount of third-party packages available in fast-moving software ecosystems, such as Node.js/npm, enables attackers to compromise applications by pushing malicious updates to their package dependencies. Studying the npm repository, we observed that many packages in the npm repository that are used in Node.js applications perform only simple computations and do not need access to filesystem or network APIs. This offers the opportunity to enforce least-privilege design per package, protecting applications and package dependencies from malicious updates. We propose a lightweight permission system that protects Node.js applications by enforcing package permissions at runtime. We discuss the design space of solutions and show that our system makes a large number of packages much harder to be exploited, almost for free.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121933402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Case Study of Onboarding in Software Teams: Tasks and Strategies
An Ju, Hitesh Sajnani, Scot Kelly, Kim Herzig
Pub Date: 2021-03-08 | DOI: 10.1109/ICSE43902.2021.00063
Developers frequently move into new teams or environments across software companies. Their onboarding experience is correlated with productivity, job satisfaction, and other short-term and long-term outcomes. The majority of the onboarding process comprises engineering tasks such as fixing bugs or implementing small features. Nevertheless, we lack a systematic view of how tasks influence onboarding. In this paper, we present a case study of Microsoft, where we interviewed 32 developers moving into a new team and 15 engineering managers onboarding a new developer into their team, to understand and characterize developers' onboarding experience and expectations in relation to the tasks they perform while onboarding. We present how tasks interact with new developers through three representative themes: learning, confidence building, and socialization. We also discuss three onboarding strategies, inferred from the interviews, that managers commonly use without being aware of them, along with their pros and cons and situational recommendations. Furthermore, we triangulate our interview findings with a developer survey (N = 189) and a manager survey (N = 37); the survey results suggest that our findings are representative and our recommendations are actionable. Practitioners can use our findings to improve their onboarding processes, while researchers can find new research directions in this study to advance the understanding of developer onboarding. Our research instruments and anonymized data are available at https://zenodo.org/record/4455937.
{"title":"A Case Study of Onboarding in Software Teams: Tasks and Strategies","authors":"An Ju, Hitesh Sajnani, Scot Kelly, Kim Herzig","doi":"10.1109/ICSE43902.2021.00063","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00063","url":null,"abstract":"Developers frequently move into new teams or environments across software companies. Their onboarding experience is correlated with productivity, job satisfaction, and other short-term and long-term outcomes. The majority of the onboarding process comprises engineering tasks such as fixing bugs or implementing small features. Nevertheless, we do not have a systematic view of how tasks influence onboarding. In this paper, we present a case study of Microsoft, where we interviewed 32 developers moving into a new team and 15 engineering managers onboarding a new developer into their team – to understand and characterize developers' onboarding experience and expectations in relation to the tasks performed by them while onboarding. We present how tasks interact with new developers through three representative themes: learning, confidence building, and socialization. We also discuss three onboarding strategies as inferred from the interviews that managers commonly use unknowingly, and discuss their pros and cons and offer situational recommendations. Furthermore, we triangulate our interview findings with a developer survey (N = 189) and a manager survey (N = 37) and find that survey results suggest that our findings are representative and our recommendations are actionable. Practitioners could use our findings to improve their onboarding processes, while researchers could find new research directions from this study to advance the understanding of developer onboarding. Our research instruments and anonymous data are available at https://zenodo.org/record/4455937#.YCOQCs 0lFd.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127171338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models
Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu
Pub Date: 2021-03-07 | DOI: 10.1109/ICSE43902.2021.00045
The boom of deep learning (DL) technology has led to massive numbers of DL models being built and shared, which facilitates their acquisition and reuse. For a given task, we may encounter multiple available DL models with the same functionality, all candidates for achieving that task. Testers are expected to compare these models and select the more suitable ones w.r.t. the whole testing context. Since labeling effort is limited, testers aim to select an efficient subset of samples that yields as precise a rank estimation as possible for these models. To tackle this problem, we propose Sample Discrimination based Selection (SDS), which selects efficient samples that can discriminate between multiple models, i.e., samples whose prediction outcomes (right/wrong) help indicate the trend of model performance. To evaluate SDS, we conduct an extensive empirical study with three widely used image datasets and 80 real-world DL models. The results show that, compared with state-of-the-art baseline methods, SDS is an effective and efficient sample selection method for ranking multiple DL models.
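A minimal sketch of the underlying idea follows, assuming disagreement among model predictions as the discrimination signal; the paper's actual SDS scoring may differ.

```python
# Hedged sketch of discrimination-based sample selection: pick the unlabeled
# samples on which the candidate models disagree most, since their
# right/wrong outcomes are the most informative for ranking the models.
import numpy as np

def select_discriminative(predictions: np.ndarray, budget: int) -> np.ndarray:
    """predictions: (n_models, n_samples) array of predicted labels.
    Returns indices of `budget` samples with the highest disagreement."""
    n_models, n_samples = predictions.shape
    scores = np.empty(n_samples)
    for j in range(n_samples):
        _, counts = np.unique(predictions[:, j], return_counts=True)
        # Disagreement: fraction of models outside the majority vote.
        scores[j] = 1.0 - counts.max() / n_models
    return np.argsort(scores)[::-1][:budget]

preds = np.array([[0, 1, 1, 2],
                  [0, 2, 1, 2],
                  [0, 1, 0, 2]])   # 3 models, 4 samples
print(select_discriminative(preds, budget=2))  # samples 1 and 2 disagree most
```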
{"title":"Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models","authors":"Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu","doi":"10.1109/ICSE43902.2021.00045","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00045","url":null,"abstract":"The boom of DL technology leads to massive DL models built and shared, which facilitates the acquisition and reuse of DL models. For a given task, we encounter multiple DL models available with the same functionality, which are considered as candidates to achieve this task. Testers are expected to compare multiple DL models and select the more suitable ones w.r.t. the whole testing context. Due to the limitation of labeling effort, testers aim to select an efficient subset of samples to make an as precise rank estimation as possible for these models. To tackle this problem, we propose Sample Discrimination based Selection (SDS) to select efficient samples that could discriminate multiple models, i.e., the prediction behaviors (right/wrong) of these samples would be helpful to indicate the trend of model performance. To evaluate SDS, we conduct an extensive empirical study with three widely-used image datasets and 80 real world DL models. The experiment results show that, compared with state-of-the-art baseline methods, SDS is an effective and efficient sample selection method to rank multiple DL models.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115399293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Do you Really Code? Designing and Evaluating Screening Questions for Online Surveys with Programmers
A. Danilova, Alena Naiakshina, S. Horstmann, Matthew Smith
Pub Date: 2021-03-07 | DOI: 10.1109/ICSE43902.2021.00057
Recruiting professional programmers in sufficient numbers for research studies can be challenging: they often cannot spare the time, are geographically dispersed, and can be costly to recruit. Online platforms such as Clickworker or Qualtrics do provide options to recruit participants with programming skill; however, misunderstandings and fraud can be an issue, resulting in participants without programming skill taking part in studies and surveys. If these participants are not detected, they can add detrimental noise to the survey data. In this paper, we develop screener questions that are easy and quick to answer for people with programming skill but difficult to answer correctly for those without. To evaluate the questionnaire's efficacy and efficiency, we recruited several batches of participants with and without programming skill and tested the questions. In our batch, 42% of Clickworkers who stated that they have programming skill did not meet our criteria, and we recommend filtering such participants from studies. We also evaluated the questions in an adversarial setting. We conclude with a set of recommended questions that researchers can use to recruit participants with programming skill from online platforms.
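As a sketch of how such screeners might be applied in practice, the snippet below filters respondents by their score on a set of knowledge questions; the questions, expected answers, and pass threshold are hypothetical placeholders, not the validated set the paper recommends.

```python
# Hypothetical screener filtering: the questions, expected answers, and the
# threshold are placeholders, not the paper's published question set.
SCREENERS = {
    "Which of these is NOT a programming language?": "html",
    "What does `len('abc')` evaluate to in Python?": "3",
    "What does a compiler do with source code?": "translates it",
}

def passes_screener(answers, threshold=1.0):
    """Keep a respondent only if the fraction of correctly answered
    screener questions reaches the threshold (here: all of them)."""
    correct = sum(answers.get(q, "").strip().lower() == a
                  for q, a in SCREENERS.items())
    return correct / len(SCREENERS) >= threshold

respondent = {"Which of these is NOT a programming language?": "HTML",
              "What does `len('abc')` evaluate to in Python?": "3",
              "What does a compiler do with source code?": "translates it"}
print(passes_screener(respondent))  # True -> include in the study
```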
{"title":"Do you Really Code? Designing and Evaluating Screening Questions for Online Surveys with Programmers","authors":"A. Danilova, Alena Naiakshina, S. Horstmann, Matthew Smith","doi":"10.1109/ICSE43902.2021.00057","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00057","url":null,"abstract":"Recruiting professional programmers in sufficient numbers for research studies can be challenging because they often cannot spare the time, or due to their geographical distribution and potentially the cost involved. Online platforms such as Clickworker or Qualtrics do provide options to recruit participants with programming skill; however, misunderstandings and fraud can be an issue. This can result in participants without programming skill taking part in studies and surveys. If these participants are not detected, they can cause detrimental noise in the survey data. In this paper, we develop screener questions that are easy and quick to answer for people with programming skill but difficult to answer correctly for those without. In order to evaluate our questionnaire for efficacy and efficiency, we recruited several batches of participants with and without programming skill and tested the questions. In our batch 42% of Clickworkers stating that they have programming skill did not meet our criteria and we would recommend filtering these from studies. We also evaluated the questions in an adversarial setting. We conclude with a set of recommended questions which researchers can use to recruit participants with programming skill from online platforms.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124857602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Growing a Test Corpus with Bonsai Fuzzing
Vasudev Vikram, Rohan Padhye, Koushik Sen
Pub Date: 2021-03-07 | DOI: 10.1109/ICSE43902.2021.00072
This paper presents a coverage-guided, grammar-based fuzzing technique for automatically synthesizing a corpus of concise test inputs. We walk through a case study of a compiler designed for education and the corresponding problem of generating meaningful test cases to provide to students. The prior state-of-the-art solution combines fuzzing with test-case reduction techniques such as variants of delta debugging. Our key insight is that, instead of attempting to minimize convoluted fuzzer-generated test inputs, we can grow concise test inputs by construction, using a form of iterative deepening. We call this approach bonsai fuzzing. Experimental results show that bonsai fuzzing can generate test corpora whose inputs are, on average, 16–45% smaller than those of a fuzz-then-reduce approach, while achieving approximately the same code coverage and fault-detection capability.
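A minimal sketch of the grow-by-construction idea, assuming a depth-bounded grammar generator and a coverage oracle as black boxes; both are placeholders, not the paper's tool.

```python
# Illustrative sketch of the bonsai-fuzzing idea: instead of generating large
# inputs and shrinking them, grow inputs under an iteratively deepened size
# bound, keeping only inputs that add coverage.
import random

def bonsai_fuzz(generate, coverage_of, max_depth: int, trials: int = 100):
    """generate(depth) -> an input bounded by `depth`;
    coverage_of(inp) -> set of covered branches."""
    corpus, covered = [], set()
    for depth in range(1, max_depth + 1):     # iterative deepening
        for _ in range(trials):
            inp = generate(depth)
            new = coverage_of(inp) - covered
            if new:                           # concise by construction: the
                corpus.append(inp)            # first input reaching a branch
                covered |= new                # is a small one
    return corpus

# Toy grammar: arithmetic expressions of bounded nesting depth.
def gen_expr(depth: int) -> str:
    if depth <= 1:
        return random.choice("xy1")
    return f"({gen_expr(depth - 1)}{random.choice('+*')}{gen_expr(depth - 1)})"
```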
{"title":"Growing a Test Corpus with Bonsai Fuzzing","authors":"Vasudev Vikram, Rohan Padhye, Koushik Sen","doi":"10.1109/ICSE43902.2021.00072","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00072","url":null,"abstract":"This paper presents a coverage-guided grammar-based fuzzing technique for automatically synthesizing a corpus of concise test inputs. We walk-through a case study of a compiler designed for education and the corresponding problem of generating meaningful test cases to provide to students. The prior state-of-the-art solution is a combination of fuzzing and test-case reduction techniques such as variants of delta-debugging. Our key insight is that instead of attempting to minimize convoluted fuzzer-generated test inputs, we can instead grow concise test inputs by construction using a form of iterative deepening. We call this approach bonsai fuzzing. Experimental results show that bonsai fuzzing can generate test corpora having inputs that are 16–45% smaller in size on average as compared to a fuzz-then-reduce approach, while achieving approximately the same code coverage and fault-detection capability.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129226113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fine with “1234”? An Analysis of SMS One-Time Password Randomness in Android Apps
Siqi Ma, Juanru Li, Hyoungshick Kim, E. Bertino, S. Nepal, D. Ostry, Cong Sun
Pub Date: 2021-03-06 | DOI: 10.1109/ICSE43902.2021.00148
A fundamental premise of SMS One-Time Passwords (OTPs) is that the pseudo-random numbers (PRNs) used are unpredictable and unique for each login session. Hence, generating PRNs is the most critical step in OTP authentication. An improper implementation of the pseudo-random number generator (PRNG) results in predictable or even static OTP values, making them vulnerable to attacks. In this paper, we present a vulnerability study of PRNGs implemented for Android apps. A key challenge is that PRNGs are typically implemented on the server side, so their source code is not accessible. To resolve this issue, we built an analysis tool, OTP-Lint, that assesses PRNG implementations automatically without requiring source code. Through reverse engineering, OTP-Lint identifies apps using SMS OTP and triggers each app's login functionality to retrieve OTP values. It then assesses the randomness of the OTP values to identify vulnerable PRNGs. Analyzing 6,431 commercially used Android apps downloaded from Google Play and Tencent Myapp, OTP-Lint identified 399 vulnerable apps that generate predictable OTP values. Even worse, 194 of the vulnerable apps use OTP authentication alone, without any additional security mechanisms, leaving them open to guessing attacks and replay attacks.
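To illustrate this kind of black-box randomness assessment, here is a hedged sketch that flags repeated or arithmetically increasing OTP values collected across login attempts; the specific tests and thresholds are illustrative assumptions, not OTP-Lint's actual rules.

```python
# Hedged sketch of assessing OTP randomness: after repeatedly triggering the
# login flow and collecting OTP values, flag generators whose outputs repeat
# or follow a fixed step. Tests below are illustrative, not the tool's rules.
from collections import Counter

def assess_otps(otps: list[str]) -> list[str]:
    findings = []
    counts = Counter(otps)
    if max(counts.values()) > 1:
        findings.append("repeated OTP values (possibly static or reused PRNG state)")
    nums = [int(o) for o in otps if o.isdigit()]
    deltas = {b - a for a, b in zip(nums, nums[1:])}
    if len(nums) >= 3 and len(deltas) == 1:
        findings.append("OTPs form an arithmetic sequence (predictable PRNG)")
    return findings

print(assess_otps(["1234", "1234", "1234"]))  # static value
print(assess_otps(["1000", "1003", "1006"]))  # fixed increment
```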
{"title":"Fine with “1234”? An Analysis of SMS One-Time Password Randomness in Android Apps","authors":"Siqi Ma, Juanru Li, Hyoungshick Kim, E. Bertino, S. Nepal, D. Ostry, Cong Sun","doi":"10.1109/ICSE43902.2021.00148","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00148","url":null,"abstract":"A fundamental premise of SMS One-Time Password (OTP) is that the used pseudo-random numbers (PRNs) are uniquely unpredictable for each login session. Hence, the process of generating PRNs is the most critical step in the OTP authentication. An improper implementation of the pseudo-random number generator (PRNG) will result in predictable or even static OTP values, making them vulnerable to potential attacks. In this paper, we present a vulnerability study against PRNGs implemented for Android apps. A key challenge is that PRNGs are typically implemented on the server-side, and thus the source code is not accessible. To resolve this issue, we build an analysis tool, OTP-Lint, to assess implementations of the PRNGs in an automated manner without the source code requirement. Through reverse engineering, OTP-Lint identifies the apps using SMS OTP and triggers each app's login functionality to retrieve OTP values. It further assesses the randomness of the OTP values to identify vulnerable PRNGs. By analyzing 6,431 commercially used Android apps downloaded from Google Play and Tencent Myapp, OTP-Lint identified 399 vulnerable apps that generate predictable OTP values. Even worse, 194 vulnerable apps use the OTP authentication alone without any additional security mechanisms, leading to insecure authentication against guessing attacks and replay attacks.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121359721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We'll Fix It in Post: What Do Bug Fixes in Video Game Update Notes Tell Us?
Andrew Truelove, E. Almeida, Iftekhar Ahmed
Pub Date: 2021-03-06 | DOI: 10.1109/ICSE43902.2021.00073
Bugs that persist into releases of video games can have negative impacts on both developers and users, but particular aspects of testing in game development can lead to difficulties in effectively catching these missed bugs. It has become common practice for developers to apply updates to games in order to fix missed bugs. These updates are often accompanied by notes that describe the changes to the game included in the update. However, some bugs reappear even after an update attempts to fix them. In this paper, we develop a taxonomy for bug types in games that is based on prior work. We examine 12,122 bug fixes from 723 updates for 30 popular games on the Steam platform. We label the bug fixes included in these updates to identify the frequency of these different bug types, the rate at which bug types recur over multiple updates, and which bug types are treated as more severe. Additionally, we survey game developers regarding their experience with different bug types and what aspects of game development they most strongly associate with bug appearance. We find that Information bugs appear the most frequently in updates, while Crash bugs recur the most frequently and are often treated as more severe than other bug types. Finally, we find that challenges in testing, code quality, and bug reproduction have a close association with bug persistence. These findings should help developers identify which aspects of game development could benefit from greater attention in order to prevent bugs. Researchers can use our results in devising tools and methods to better identify and address certain bug types.
{"title":"We'll Fix It in Post: What Do Bug Fixes in Video Game Update Notes Tell Us?","authors":"Andrew Truelove, E. Almeida, Iftekhar Ahmed","doi":"10.1109/ICSE43902.2021.00073","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00073","url":null,"abstract":"Bugs that persist into releases of video games can have negative impacts on both developers and users, but particular aspects of testing in game development can lead to difficulties in effectively catching these missed bugs. It has become common practice for developers to apply updates to games in order to fix missed bugs. These updates are often accompanied by notes that describe the changes to the game included in the update. However, some bugs reappear even after an update attempts to fix them. In this paper, we develop a taxonomy for bug types in games that is based on prior work. We examine 12,122 bug fixes from 723 updates for 30 popular games on the Steam platform. We label the bug fixes included in these updates to identify the frequency of these different bug types, the rate at which bug types recur over multiple updates, and which bug types are treated as more severe. Additionally, we survey game developers regarding their experience with different bug types and what aspects of game development they most strongly associate with bug appearance. We find that Information bugs appear the most frequently in updates, while Crash bugs recur the most frequently and are often treated as more severe than other bug types. Finally, we find that challenges in testing, code quality, and bug reproduction have a close association with bug persistence. These findings should help developers identify which aspects of game development could benefit from greater attention in order to prevent bugs. Researchers can use our results in devising tools and methods to better identify and address certain bug types.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130127407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
App’s Auto-Login Function Security Testing via Android OS-Level Virtualization
Wenna Song, Jiang Ming, Lin Jiang, Han Yan, Yi Xiang, Yuan Chen, Jianming Fu, Guojun Peng
Pub Date: 2021-03-05 | DOI: 10.1109/ICSE43902.2021.00149
Limited by small keyboards, most mobile apps support an automatic login feature for a better user experience, sparing users from retyping their ID and password each time an app returns to the foreground. However, this auto-login function can be exploited to launch a so-called "data-clone attack": once the locally stored data on which auto-login depends is cloned by attackers and placed onto their own smartphones, the attackers can break through the login-device limit and stealthily log in to the victim's account. A natural countermeasure is to check the consistency of device-specific attributes: as soon as a new device presents a device fingerprint different from the previous one, the app disables the auto-login function and thus prevents data-clone attacks. In this paper, we develop VPDroid, a transparent Android OS-level virtualization platform tailored for security testing. With VPDroid, security analysts can customize different device artifacts, such as the CPU model, Android ID, and phone number, in a virtual phone without user-level API hooking. VPDroid's isolation mechanism ensures that user-mode apps in the virtual phone cannot detect device-specific discrepancies. To assess Android apps' susceptibility to the data-clone attack, we use VPDroid to simulate data-clone attacks against the 234 most-downloaded apps. Our experiments on five different virtual-phone environments show that VPDroid's device-attribute customization can deceive all tested apps that perform device-consistency checks, such as Twitter, WeChat, and PayPal. 19 vendors have confirmed our report as a zero-day vulnerability. Our findings paint a cautionary tale: enforcing a device-consistency check only on the client side is still vulnerable to an advanced data-clone attack.
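Below is a minimal sketch of the client-side device-consistency countermeasure the paper evaluates (the attribute names are hypothetical). It also shows why a client-side-only check fails once an attacker can clone or spoof every attribute the fingerprint covers, e.g., via OS-level virtualization.

```python
# Illustrative sketch: bind the auto-login token to a fingerprint of device
# attributes and disable auto-login when the fingerprint changes. Attribute
# names are hypothetical placeholders.
import hashlib

def fingerprint(device: dict) -> str:
    material = "|".join(f"{k}={device[k]}" for k in sorted(device))
    return hashlib.sha256(material.encode()).hexdigest()

def may_auto_login(stored_fp: str, current_device: dict) -> bool:
    return fingerprint(current_device) == stored_fp

enrolled = {"android_id": "9774d56d682e549c", "cpu_model": "SM8250",
            "phone_number": "+15550100"}
stored = fingerprint(enrolled)
cloned = dict(enrolled)                    # data-clone attack on a new device
cloned["android_id"] = "3f2a77c0aa01beef"  # differs unless also spoofed
print(may_auto_login(stored, cloned))      # False -> force full re-login
```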
{"title":"App’s Auto-Login Function Security Testing via Android OS-Level Virtualization","authors":"Wenna Song, Jiang Ming, Lin Jiang, Han Yan, Yi Xiang, Yuan Chen, Jianming Fu, Guojun Peng","doi":"10.1109/ICSE43902.2021.00149","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00149","url":null,"abstract":"Limited by the small keyboard, most mobile apps support the automatic login feature for better user experience. Therefore, users avoid the inconvenience of retyping their ID and password when an app runs in the foreground again. However, this auto-login function can be exploited to launch the so-called \"data-clone attack\": once the locally-stored, auto-login depended data are cloned by attackers and placed into their own smartphones, attackers can break through the login-device number limit and log in to the victim's account stealthily. A natural countermeasure is to check the consistency of devicespecific attributes. As long as the new device shows different device fingerprints with the previous one, the app will disable the auto-login function and thus prevent data-clone attacks. In this paper, we develop VPDroid, a transparent Android OSlevel virtualization platform tailored for security testing. With VPDroid, security analysts can customize different device artifacts, such as CPU model, Android ID, and phone number, in a virtual phone without user-level API hooking. VPDroid's isolation mechanism ensures that user-mode apps in the virtual phone cannot detect device-specific discrepancies. To assess Android apps' susceptibility to the data-clone attack, we use VPDroid to simulate data-clone attacks with 234 most-downloaded apps. Our experiments on five different virtual phone environments show that VPDroid's device attribute customization can deceive all tested apps that perform device-consistency checks, such as Twitter, WeChat, and PayPal. 19 vendors have confirmed our report as a zero-day vulnerability. Our findings paint a cautionary tale: only enforcing a device-consistency check at client side is still vulnerable to an advanced data-clone attack.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115423073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepLocalize: Fault Localization for Deep Neural Networks
Mohammad Wardat, Wei Le, Hridesh Rajan
Pub Date: 2021-03-04 | DOI: 10.1109/ICSE43902.2021.00034
Deep Neural Networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques do not support localizing DNN bugs, because model behaviors are poorly understood: the entire DNN model appears as a black box. To address these problems, we propose an approach and a tool that automatically determines whether a model is buggy and identifies the root causes of DNN errors. Our key insight is that historic trends in the values propagated between layers can be analyzed both to identify and to localize faults. To that end, we first enable dynamic analysis of deep learning applications, either by converting them into an imperative representation or, alternatively, by using a callback mechanism. Both mechanisms allow us to insert probes that enable dynamic analysis over the traces produced by the DNN while it is trained on the training data. We then analyze the traces to identify the faulty layer or hyperparameter that causes the error. We propose an algorithm for identifying root causes by capturing any numerical error, monitoring the model during training, and determining the relevance of every layer and parameter to the DNN's outcome. We have collected a benchmark of 40 buggy models and patches containing real errors in deep learning applications from Stack Overflow and GitHub. Our benchmark can be used to evaluate automated debugging tools and repair techniques. We evaluated our approach on this DNN bug-and-patch benchmark; the results show that it is much more effective than the existing debugging approach used in the state-of-the-practice Keras library. For 34/40 cases, our approach was able to detect faults, whereas the best debugging approach provided by Keras detected 32/40 faults. Our approach localized 21/40 bugs, whereas Keras did not localize any faults.
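Since the abstract mentions a callback mechanism for instrumenting training, here is a hedged sketch of that idea: a Keras callback that watches for non-finite losses and scans layer weights to localize a faulty layer. This is a simplified illustration of dynamic monitoring, not the authors' DeepLocalize tool.

```python
# Minimal sketch of callback-based training monitoring: watch values produced
# during training and report the first layer whose weights become NaN/Inf.
import numpy as np
import tensorflow as tf

class NumericalErrorMonitor(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        loss = (logs or {}).get("loss")
        if loss is not None and not np.isfinite(loss):
            # Loss diverged: scan layers to localize the first faulty one.
            for layer in self.model.layers:
                if any(not np.all(np.isfinite(w)) for w in layer.get_weights()):
                    print(f"batch {batch}: non-finite weights in layer "
                          f"'{layer.name}' -- likely root cause")
                    self.model.stop_training = True
                    return
            print(f"batch {batch}: non-finite loss but finite weights "
                  f"(check labels, loss function, or learning rate)")
            self.model.stop_training = True

# Usage: model.fit(x, y, callbacks=[NumericalErrorMonitor()])
```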
{"title":"DeepLocalize: Fault Localization for Deep Neural Networks","authors":"Mohammad Wardat, Wei Le, Hridesh Rajan","doi":"10.1109/ICSE43902.2021.00034","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00034","url":null,"abstract":"Deep Neural Networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques don't support localizing DNN bugs because of the lack of understanding of model behaviors. The entire DNN model appears as a black box. To address these problems, we propose an approach and a tool that automatically determines whether the model is buggy or not, and identifies the root causes for DNN errors. Our key insight is that historic trends in values propagated between layers can be analyzed to identify faults, and also localize faults. To that end, we first enable dynamic analysis of deep learning applications: by converting it into an imperative representation and alternatively using a callback mechanism. Both mechanisms allows us to insert probes that enable dynamic analysis over the traces produced by the DNN while it is being trained on the training data. We then conduct dynamic analysis over the traces to identify the faulty layer or hyperparameter that causes the error. We propose an algorithm for identifying root causes by capturing any numerical error and monitoring the model during training and finding the relevance of every layer/parameter on the DNN outcome. We have collected a benchmark containing 40 buggy models and patches that contain real errors in deep learning applications from Stack Overflow and GitHub. Our benchmark can be used to evaluate automated debugging tools and repair techniques. We have evaluated our approach using this DNN bug-and-patch benchmark, and the results showed that our approach is much more effective than the existing debugging approach used in the state-of-the-practice Keras library. For 34/40 cases, our approach was able to detect faults whereas the best debugging approach provided by Keras detected 32/40 faults. Our approach was able to localize 21/40 bugs whereas Keras did not localize any faults.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114150650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}