Siri, Write the Next Method
Fengcai Wen, Emad Aghajani, Csaba Nagy, Michele Lanza, G. Bavota
Pub Date: 2021-03-08 | DOI: 10.1109/ICSE43902.2021.00025
Code completion is one of the killer features of Integrated Development Environments (IDEs), and researchers have proposed different methods to improve its accuracy. While these techniques are valuable to speed up code writing, they are limited to recommendations related to the next few tokens a developer is likely to type given the current context. In the best case, they can recommend a few APIs that a developer is likely to use next. We present FeaRS, a novel retrieval-based approach that, given the current code a developer is writing in the IDE, can recommend the next complete method (i.e., signature and method body) that the developer is likely to implement. To do this, FeaRS exploits "implementation patterns" (i.e., groups of methods usually implemented within the same task) learned by mining thousands of open source projects. We instantiated our approach to the specific context of Android apps. A large-scale empirical evaluation we performed across more than 20k apps shows encouraging preliminary results, but also highlights future challenges to overcome.
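To make the mining-based idea concrete, here is a minimal Python sketch of how a recommender might match already-implemented methods against mined implementation patterns. The `Pattern` structure, the confidence-ranked lookup, and the example rule are illustrative assumptions, not FeaRS's actual design.

```python
# Illustrative sketch (not FeaRS itself): an "implementation pattern" is
# modeled as an association rule mined from open source projects, mapping a
# set of already-implemented methods to a likely next method.
from dataclasses import dataclass

@dataclass
class Pattern:
    antecedent: frozenset   # signatures of methods already in the class
    consequent: str         # signature of the recommended next method
    body: str               # mined method body shown to the developer
    confidence: float       # rule confidence in the mined corpus

def recommend_next_method(implemented, patterns, top_k=3):
    """Return the top-k patterns whose antecedent is already covered by the
    methods the developer has written so far."""
    matches = [p for p in patterns
               if p.antecedent <= implemented and p.consequent not in implemented]
    return sorted(matches, key=lambda p: p.confidence, reverse=True)[:top_k]

# Hypothetical mined rule: classes implementing onCreate() and initViews()
# often implement a click handler next.
patterns = [Pattern(frozenset({"onCreate(Bundle)", "initViews()"}),
                    "onClick(View)",
                    "public void onClick(View v) { /* ... */ }",
                    0.82)]
print(recommend_next_method({"onCreate(Bundle)", "initViews()"}, patterns))
```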
{"title":"Siri, Write the Next Method","authors":"Fengcai Wen, Emad Aghajani, Csaba Nagy, Michele Lanza, G. Bavota","doi":"10.1109/ICSE43902.2021.00025","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00025","url":null,"abstract":"Code completion is one of the killer features of Integrated Development Environments (IDEs), and researchers have proposed different methods to improve its accuracy. While these techniques are valuable to speed up code writing, they are limited to recommendations related to the next few tokens a developer is likely to type given the current context. In the best case, they can recommend a few APIs that a developer is likely to use next. We present FeaRS, a novel retrieval-based approach that, given the current code a developer is writing in the IDE, can recommend the next complete method (i.e., signature and method body) that the developer is likely to implement. To do this, FeaRS exploits \"implementation patterns\" (i.e., groups of methods usually implemented within the same task) learned by mining thousands of open source projects. We instantiated our approach to the specific context of Android apps. A large-scale empirical evaluation we performed across more than 20k apps shows encouraging preliminary results, but also highlights future challenges to overcome.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122862638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Containing Malicious Package Updates in npm with a Lightweight Permission System
G. Ferreira, Limin Jia, Joshua Sunshine, Christian Kästner
Pub Date: 2021-03-08 | DOI: 10.1109/ICSE43902.2021.00121
The large number of third-party packages available in fast-moving software ecosystems, such as Node.js/npm, enables attackers to compromise applications by pushing malicious updates to their package dependencies. Studying the npm repository, we observed that many of its packages used in Node.js applications perform only simple computations and do not need access to filesystem or network APIs. This offers the opportunity to enforce least-privilege design per package, protecting applications and package dependencies from malicious updates. We propose a lightweight permission system that protects Node.js applications by enforcing package permissions at runtime. We discuss the design space of solutions and show that our system makes a large number of packages much harder to exploit, almost for free.
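As an illustration of deny-by-default, per-package permission enforcement, consider the sketch below. The real system targets Node.js/npm; this sketch uses Python for consistency with the other examples, and the manifest, capability names, and `guarded_open` wrapper are hypothetical.

```python
# Illustrative Python sketch of per-package, deny-by-default permission
# enforcement; the real system targets Node.js/npm, and the manifest and
# wrapper below are hypothetical.
PERMISSIONS = {
    "left-pad":   set(),            # pure computation: no capabilities needed
    "node-fetch": {"network"},
    "fs-extra":   {"filesystem"},
}

class CapabilityError(Exception):
    pass

def check(package, capability):
    """Deny by default: a package may use a capability only if its manifest
    explicitly grants it, so a malicious update cannot silently gain access."""
    if capability not in PERMISSIONS.get(package, set()):
        raise CapabilityError(f"{package} lacks the '{capability}' permission")

def guarded_open(package, path, mode="r"):
    """A guarded API the runtime would expose in place of the raw filesystem API."""
    check(package, "filesystem")
    return open(path, mode)

check("node-fetch", "network")            # allowed by the manifest
# guarded_open("left-pad", "/etc/passwd") # would raise CapabilityError
```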
{"title":"Containing Malicious Package Updates in npm with a Lightweight Permission System","authors":"G. Ferreira, Limin Jia, Joshua Sunshine, Christian Kästner","doi":"10.1109/ICSE43902.2021.00121","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00121","url":null,"abstract":"The large amount of third-party packages available in fast-moving software ecosystems, such as Node.js/npm, enables attackers to compromise applications by pushing malicious updates to their package dependencies. Studying the npm repository, we observed that many packages in the npm repository that are used in Node.js applications perform only simple computations and do not need access to filesystem or network APIs. This offers the opportunity to enforce least-privilege design per package, protecting applications and package dependencies from malicious updates. We propose a lightweight permission system that protects Node.js applications by enforcing package permissions at runtime. We discuss the design space of solutions and show that our system makes a large number of packages much harder to be exploited, almost for free.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121933402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Case Study of Onboarding in Software Teams: Tasks and Strategies
An Ju, Hitesh Sajnani, Scot Kelly, Kim Herzig
Pub Date: 2021-03-08 | DOI: 10.1109/ICSE43902.2021.00063
Developers frequently move into new teams or environments across software companies. Their onboarding experience is correlated with productivity, job satisfaction, and other short-term and long-term outcomes. The majority of the onboarding process comprises engineering tasks such as fixing bugs or implementing small features. Nevertheless, we lack a systematic view of how tasks influence onboarding. In this paper, we present a case study of Microsoft, where we interviewed 32 developers moving into a new team and 15 engineering managers onboarding a new developer into their team, to understand and characterize developers' onboarding experience and expectations in relation to the tasks they perform while onboarding. We present how tasks interact with new developers through three representative themes: learning, confidence building, and socialization. We also discuss three onboarding strategies, inferred from the interviews, that managers commonly use without being aware of them, along with their pros and cons and situational recommendations. Furthermore, we triangulate our interview findings with a developer survey (N = 189) and a manager survey (N = 37); the survey results suggest that our findings are representative and our recommendations are actionable. Practitioners can use our findings to improve their onboarding processes, while researchers can find new research directions in this study to advance the understanding of developer onboarding. Our research instruments and anonymized data are available at https://zenodo.org/record/4455937.
{"title":"A Case Study of Onboarding in Software Teams: Tasks and Strategies","authors":"An Ju, Hitesh Sajnani, Scot Kelly, Kim Herzig","doi":"10.1109/ICSE43902.2021.00063","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00063","url":null,"abstract":"Developers frequently move into new teams or environments across software companies. Their onboarding experience is correlated with productivity, job satisfaction, and other short-term and long-term outcomes. The majority of the onboarding process comprises engineering tasks such as fixing bugs or implementing small features. Nevertheless, we do not have a systematic view of how tasks influence onboarding. In this paper, we present a case study of Microsoft, where we interviewed 32 developers moving into a new team and 15 engineering managers onboarding a new developer into their team – to understand and characterize developers' onboarding experience and expectations in relation to the tasks performed by them while onboarding. We present how tasks interact with new developers through three representative themes: learning, confidence building, and socialization. We also discuss three onboarding strategies as inferred from the interviews that managers commonly use unknowingly, and discuss their pros and cons and offer situational recommendations. Furthermore, we triangulate our interview findings with a developer survey (N = 189) and a manager survey (N = 37) and find that survey results suggest that our findings are representative and our recommendations are actionable. Practitioners could use our findings to improve their onboarding processes, while researchers could find new research directions from this study to advance the understanding of developer onboarding. Our research instruments and anonymous data are available at https://zenodo.org/record/4455937#.YCOQCs 0lFd.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127171338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models
Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu
Pub Date: 2021-03-07 | DOI: 10.1109/ICSE43902.2021.00045
The boom of deep learning (DL) technology has led to massive numbers of DL models being built and shared, which facilitates their acquisition and reuse. For a given task, we may encounter multiple available DL models with the same functionality, all candidates for achieving that task. Testers are expected to compare these models and select the more suitable ones w.r.t. the whole testing context. Since labeling effort is limited, testers aim to select an efficient subset of samples that yields as precise a rank estimation as possible for these models. To tackle this problem, we propose Sample Discrimination based Selection (SDS), which selects efficient samples that can discriminate between multiple models, i.e., samples whose prediction outcomes (right/wrong) help indicate the trend of model performance. To evaluate SDS, we conduct an extensive empirical study with three widely used image datasets and 80 real-world DL models. The results show that, compared with state-of-the-art baseline methods, SDS is an effective and efficient sample selection method for ranking multiple DL models.
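A minimal sketch of the underlying idea follows, assuming disagreement among model predictions as the discrimination signal; the paper's actual SDS scoring may differ.

```python
# Hedged sketch of discrimination-based sample selection: pick the unlabeled
# samples on which the candidate models disagree most, since their
# right/wrong outcomes are the most informative for ranking the models.
import numpy as np

def select_discriminative(predictions: np.ndarray, budget: int) -> np.ndarray:
    """predictions: (n_models, n_samples) array of predicted labels.
    Returns indices of `budget` samples with the highest disagreement."""
    n_models, n_samples = predictions.shape
    scores = np.empty(n_samples)
    for j in range(n_samples):
        _, counts = np.unique(predictions[:, j], return_counts=True)
        # Disagreement: fraction of models outside the majority vote.
        scores[j] = 1.0 - counts.max() / n_models
    return np.argsort(scores)[::-1][:budget]

preds = np.array([[0, 1, 1, 2],
                  [0, 2, 1, 2],
                  [0, 1, 0, 2]])   # 3 models, 4 samples
print(select_discriminative(preds, budget=2))  # samples 1 and 2 disagree most
```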
{"title":"Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models","authors":"Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu","doi":"10.1109/ICSE43902.2021.00045","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00045","url":null,"abstract":"The boom of DL technology leads to massive DL models built and shared, which facilitates the acquisition and reuse of DL models. For a given task, we encounter multiple DL models available with the same functionality, which are considered as candidates to achieve this task. Testers are expected to compare multiple DL models and select the more suitable ones w.r.t. the whole testing context. Due to the limitation of labeling effort, testers aim to select an efficient subset of samples to make an as precise rank estimation as possible for these models. To tackle this problem, we propose Sample Discrimination based Selection (SDS) to select efficient samples that could discriminate multiple models, i.e., the prediction behaviors (right/wrong) of these samples would be helpful to indicate the trend of model performance. To evaluate SDS, we conduct an extensive empirical study with three widely-used image datasets and 80 real world DL models. The experiment results show that, compared with state-of-the-art baseline methods, SDS is an effective and efficient sample selection method to rank multiple DL models.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115399293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Do you Really Code? Designing and Evaluating Screening Questions for Online Surveys with Programmers
A. Danilova, Alena Naiakshina, S. Horstmann, Matthew Smith
Pub Date: 2021-03-07 | DOI: 10.1109/ICSE43902.2021.00057
Recruiting professional programmers in sufficient numbers for research studies can be challenging: they often cannot spare the time, are geographically dispersed, and can be costly to recruit. Online platforms such as Clickworker or Qualtrics do provide options to recruit participants with programming skill; however, misunderstandings and fraud can be an issue, resulting in participants without programming skill taking part in studies and surveys. If these participants are not detected, they can add detrimental noise to the survey data. In this paper, we develop screener questions that are easy and quick to answer for people with programming skill but difficult to answer correctly for those without. To evaluate the questionnaire's efficacy and efficiency, we recruited several batches of participants with and without programming skill and tested the questions. In our batch, 42% of Clickworkers who stated that they have programming skill did not meet our criteria, and we recommend filtering such participants from studies. We also evaluated the questions in an adversarial setting. We conclude with a set of recommended questions that researchers can use to recruit participants with programming skill from online platforms.
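As a sketch of how such screeners might be applied in practice, the snippet below filters respondents by their score on a set of knowledge questions; the questions, expected answers, and pass threshold are hypothetical placeholders, not the validated set the paper recommends.

```python
# Hypothetical screener filtering: the questions, expected answers, and the
# threshold are placeholders, not the paper's published question set.
SCREENERS = {
    "Which of these is NOT a programming language?": "html",
    "What does `len('abc')` evaluate to in Python?": "3",
    "What does a compiler do with source code?": "translates it",
}

def passes_screener(answers, threshold=1.0):
    """Keep a respondent only if the fraction of correctly answered
    screener questions reaches the threshold (here: all of them)."""
    correct = sum(answers.get(q, "").strip().lower() == a
                  for q, a in SCREENERS.items())
    return correct / len(SCREENERS) >= threshold

respondent = {"Which of these is NOT a programming language?": "HTML",
              "What does `len('abc')` evaluate to in Python?": "3",
              "What does a compiler do with source code?": "translates it"}
print(passes_screener(respondent))  # True -> include in the study
```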
{"title":"Do you Really Code? Designing and Evaluating Screening Questions for Online Surveys with Programmers","authors":"A. Danilova, Alena Naiakshina, S. Horstmann, Matthew Smith","doi":"10.1109/ICSE43902.2021.00057","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00057","url":null,"abstract":"Recruiting professional programmers in sufficient numbers for research studies can be challenging because they often cannot spare the time, or due to their geographical distribution and potentially the cost involved. Online platforms such as Clickworker or Qualtrics do provide options to recruit participants with programming skill; however, misunderstandings and fraud can be an issue. This can result in participants without programming skill taking part in studies and surveys. If these participants are not detected, they can cause detrimental noise in the survey data. In this paper, we develop screener questions that are easy and quick to answer for people with programming skill but difficult to answer correctly for those without. In order to evaluate our questionnaire for efficacy and efficiency, we recruited several batches of participants with and without programming skill and tested the questions. In our batch 42% of Clickworkers stating that they have programming skill did not meet our criteria and we would recommend filtering these from studies. We also evaluated the questions in an adversarial setting. We conclude with a set of recommended questions which researchers can use to recruit participants with programming skill from online platforms.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124857602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Growing a Test Corpus with Bonsai Fuzzing
Vasudev Vikram, Rohan Padhye, Koushik Sen
Pub Date: 2021-03-07 | DOI: 10.1109/ICSE43902.2021.00072
This paper presents a coverage-guided, grammar-based fuzzing technique for automatically synthesizing a corpus of concise test inputs. We walk through a case study of a compiler designed for education and the corresponding problem of generating meaningful test cases to provide to students. The prior state-of-the-art solution combines fuzzing with test-case reduction techniques such as variants of delta debugging. Our key insight is that, instead of attempting to minimize convoluted fuzzer-generated test inputs, we can grow concise test inputs by construction, using a form of iterative deepening. We call this approach bonsai fuzzing. Experimental results show that bonsai fuzzing can generate test corpora whose inputs are, on average, 16–45% smaller than those of a fuzz-then-reduce approach, while achieving approximately the same code coverage and fault-detection capability.
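A minimal sketch of the grow-by-construction idea, assuming a depth-bounded grammar generator and a coverage oracle as black boxes; both are placeholders, not the paper's tool.

```python
# Illustrative sketch of the bonsai-fuzzing idea: instead of generating large
# inputs and shrinking them, grow inputs under an iteratively deepened size
# bound, keeping only inputs that add coverage.
import random

def bonsai_fuzz(generate, coverage_of, max_depth: int, trials: int = 100):
    """generate(depth) -> an input bounded by `depth`;
    coverage_of(inp) -> set of covered branches."""
    corpus, covered = [], set()
    for depth in range(1, max_depth + 1):     # iterative deepening
        for _ in range(trials):
            inp = generate(depth)
            new = coverage_of(inp) - covered
            if new:                           # concise by construction: the
                corpus.append(inp)            # first input reaching a branch
                covered |= new                # is a small one
    return corpus

# Toy grammar: arithmetic expressions of bounded nesting depth.
def gen_expr(depth: int) -> str:
    if depth <= 1:
        return random.choice("xy1")
    return f"({gen_expr(depth - 1)}{random.choice('+*')}{gen_expr(depth - 1)})"
```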
{"title":"Growing a Test Corpus with Bonsai Fuzzing","authors":"Vasudev Vikram, Rohan Padhye, Koushik Sen","doi":"10.1109/ICSE43902.2021.00072","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00072","url":null,"abstract":"This paper presents a coverage-guided grammar-based fuzzing technique for automatically synthesizing a corpus of concise test inputs. We walk-through a case study of a compiler designed for education and the corresponding problem of generating meaningful test cases to provide to students. The prior state-of-the-art solution is a combination of fuzzing and test-case reduction techniques such as variants of delta-debugging. Our key insight is that instead of attempting to minimize convoluted fuzzer-generated test inputs, we can instead grow concise test inputs by construction using a form of iterative deepening. We call this approach bonsai fuzzing. Experimental results show that bonsai fuzzing can generate test corpora having inputs that are 16–45% smaller in size on average as compared to a fuzz-then-reduce approach, while achieving approximately the same code coverage and fault-detection capability.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129226113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fine with “1234”? An Analysis of SMS One-Time Password Randomness in Android Apps
Siqi Ma, Juanru Li, Hyoungshick Kim, E. Bertino, S. Nepal, D. Ostry, Cong Sun
Pub Date: 2021-03-06 | DOI: 10.1109/ICSE43902.2021.00148
A fundamental premise of SMS One-Time Passwords (OTPs) is that the pseudo-random numbers (PRNs) used are unpredictable and unique for each login session. Hence, generating PRNs is the most critical step in OTP authentication. An improper implementation of the pseudo-random number generator (PRNG) results in predictable or even static OTP values, making them vulnerable to attacks. In this paper, we present a vulnerability study of PRNGs implemented for Android apps. A key challenge is that PRNGs are typically implemented on the server side, so their source code is not accessible. To resolve this issue, we built an analysis tool, OTP-Lint, that assesses PRNG implementations automatically without requiring source code. Through reverse engineering, OTP-Lint identifies apps using SMS OTP and triggers each app's login functionality to retrieve OTP values. It then assesses the randomness of the OTP values to identify vulnerable PRNGs. Analyzing 6,431 commercially used Android apps downloaded from Google Play and Tencent Myapp, OTP-Lint identified 399 vulnerable apps that generate predictable OTP values. Even worse, 194 of the vulnerable apps use OTP authentication alone, without any additional security mechanisms, leaving them open to guessing attacks and replay attacks.
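To illustrate this kind of black-box randomness assessment, here is a hedged sketch that flags repeated or arithmetically increasing OTP values collected across login attempts; the specific tests and thresholds are illustrative assumptions, not OTP-Lint's actual rules.

```python
# Hedged sketch of assessing OTP randomness: after repeatedly triggering the
# login flow and collecting OTP values, flag generators whose outputs repeat
# or follow a fixed step. Tests below are illustrative, not the tool's rules.
from collections import Counter

def assess_otps(otps: list[str]) -> list[str]:
    findings = []
    counts = Counter(otps)
    if max(counts.values()) > 1:
        findings.append("repeated OTP values (possibly static or reused PRNG state)")
    nums = [int(o) for o in otps if o.isdigit()]
    deltas = {b - a for a, b in zip(nums, nums[1:])}
    if len(nums) >= 3 and len(deltas) == 1:
        findings.append("OTPs form an arithmetic sequence (predictable PRNG)")
    return findings

print(assess_otps(["1234", "1234", "1234"]))  # static value
print(assess_otps(["1000", "1003", "1006"]))  # fixed increment
```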
{"title":"Fine with “1234”? An Analysis of SMS One-Time Password Randomness in Android Apps","authors":"Siqi Ma, Juanru Li, Hyoungshick Kim, E. Bertino, S. Nepal, D. Ostry, Cong Sun","doi":"10.1109/ICSE43902.2021.00148","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00148","url":null,"abstract":"A fundamental premise of SMS One-Time Password (OTP) is that the used pseudo-random numbers (PRNs) are uniquely unpredictable for each login session. Hence, the process of generating PRNs is the most critical step in the OTP authentication. An improper implementation of the pseudo-random number generator (PRNG) will result in predictable or even static OTP values, making them vulnerable to potential attacks. In this paper, we present a vulnerability study against PRNGs implemented for Android apps. A key challenge is that PRNGs are typically implemented on the server-side, and thus the source code is not accessible. To resolve this issue, we build an analysis tool, OTP-Lint, to assess implementations of the PRNGs in an automated manner without the source code requirement. Through reverse engineering, OTP-Lint identifies the apps using SMS OTP and triggers each app's login functionality to retrieve OTP values. It further assesses the randomness of the OTP values to identify vulnerable PRNGs. By analyzing 6,431 commercially used Android apps downloaded from Google Play and Tencent Myapp, OTP-Lint identified 399 vulnerable apps that generate predictable OTP values. Even worse, 194 vulnerable apps use the OTP authentication alone without any additional security mechanisms, leading to insecure authentication against guessing attacks and replay attacks.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121359721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We'll Fix It in Post: What Do Bug Fixes in Video Game Update Notes Tell Us?
Andrew Truelove, E. Almeida, Iftekhar Ahmed
Pub Date: 2021-03-06 | DOI: 10.1109/ICSE43902.2021.00073
Bugs that persist into releases of video games can have negative impacts on both developers and users, but particular aspects of testing in game development can lead to difficulties in effectively catching these missed bugs. It has become common practice for developers to apply updates to games in order to fix missed bugs. These updates are often accompanied by notes that describe the changes to the game included in the update. However, some bugs reappear even after an update attempts to fix them. In this paper, we develop a taxonomy for bug types in games that is based on prior work. We examine 12,122 bug fixes from 723 updates for 30 popular games on the Steam platform. We label the bug fixes included in these updates to identify the frequency of these different bug types, the rate at which bug types recur over multiple updates, and which bug types are treated as more severe. Additionally, we survey game developers regarding their experience with different bug types and what aspects of game development they most strongly associate with bug appearance. We find that Information bugs appear the most frequently in updates, while Crash bugs recur the most frequently and are often treated as more severe than other bug types. Finally, we find that challenges in testing, code quality, and bug reproduction have a close association with bug persistence. These findings should help developers identify which aspects of game development could benefit from greater attention in order to prevent bugs. Researchers can use our results in devising tools and methods to better identify and address certain bug types.
{"title":"We'll Fix It in Post: What Do Bug Fixes in Video Game Update Notes Tell Us?","authors":"Andrew Truelove, E. Almeida, Iftekhar Ahmed","doi":"10.1109/ICSE43902.2021.00073","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00073","url":null,"abstract":"Bugs that persist into releases of video games can have negative impacts on both developers and users, but particular aspects of testing in game development can lead to difficulties in effectively catching these missed bugs. It has become common practice for developers to apply updates to games in order to fix missed bugs. These updates are often accompanied by notes that describe the changes to the game included in the update. However, some bugs reappear even after an update attempts to fix them. In this paper, we develop a taxonomy for bug types in games that is based on prior work. We examine 12,122 bug fixes from 723 updates for 30 popular games on the Steam platform. We label the bug fixes included in these updates to identify the frequency of these different bug types, the rate at which bug types recur over multiple updates, and which bug types are treated as more severe. Additionally, we survey game developers regarding their experience with different bug types and what aspects of game development they most strongly associate with bug appearance. We find that Information bugs appear the most frequently in updates, while Crash bugs recur the most frequently and are often treated as more severe than other bug types. Finally, we find that challenges in testing, code quality, and bug reproduction have a close association with bug persistence. These findings should help developers identify which aspects of game development could benefit from greater attention in order to prevent bugs. Researchers can use our results in devising tools and methods to better identify and address certain bug types.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130127407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
App’s Auto-Login Function Security Testing via Android OS-Level Virtualization
Wenna Song, Jiang Ming, Lin Jiang, Han Yan, Yi Xiang, Yuan Chen, Jianming Fu, Guojun Peng
Pub Date: 2021-03-05 | DOI: 10.1109/ICSE43902.2021.00149
Limited by small keyboards, most mobile apps support an automatic login feature for a better user experience, sparing users from retyping their ID and password each time an app returns to the foreground. However, this auto-login function can be exploited to launch a so-called "data-clone attack": once the locally stored data on which auto-login depends is cloned by attackers and placed onto their own smartphones, the attackers can break through the login-device limit and stealthily log in to the victim's account. A natural countermeasure is to check the consistency of device-specific attributes: as soon as a new device presents a device fingerprint different from the previous one, the app disables the auto-login function and thus prevents data-clone attacks. In this paper, we develop VPDroid, a transparent Android OS-level virtualization platform tailored for security testing. With VPDroid, security analysts can customize different device artifacts, such as the CPU model, Android ID, and phone number, in a virtual phone without user-level API hooking. VPDroid's isolation mechanism ensures that user-mode apps in the virtual phone cannot detect device-specific discrepancies. To assess Android apps' susceptibility to the data-clone attack, we use VPDroid to simulate data-clone attacks against the 234 most-downloaded apps. Our experiments on five different virtual-phone environments show that VPDroid's device-attribute customization can deceive all tested apps that perform device-consistency checks, such as Twitter, WeChat, and PayPal. 19 vendors have confirmed our report as a zero-day vulnerability. Our findings paint a cautionary tale: enforcing a device-consistency check only on the client side is still vulnerable to an advanced data-clone attack.
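Below is a minimal sketch of the client-side device-consistency countermeasure the paper evaluates (the attribute names are hypothetical). It also shows why a client-side-only check fails once an attacker can clone or spoof every attribute the fingerprint covers, e.g., via OS-level virtualization.

```python
# Illustrative sketch: bind the auto-login token to a fingerprint of device
# attributes and disable auto-login when the fingerprint changes. Attribute
# names are hypothetical placeholders.
import hashlib

def fingerprint(device: dict) -> str:
    material = "|".join(f"{k}={device[k]}" for k in sorted(device))
    return hashlib.sha256(material.encode()).hexdigest()

def may_auto_login(stored_fp: str, current_device: dict) -> bool:
    return fingerprint(current_device) == stored_fp

enrolled = {"android_id": "9774d56d682e549c", "cpu_model": "SM8250",
            "phone_number": "+15550100"}
stored = fingerprint(enrolled)
cloned = dict(enrolled)                    # data-clone attack on a new device
cloned["android_id"] = "3f2a77c0aa01beef"  # differs unless also spoofed
print(may_auto_login(stored, cloned))      # False -> force full re-login
```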
{"title":"App’s Auto-Login Function Security Testing via Android OS-Level Virtualization","authors":"Wenna Song, Jiang Ming, Lin Jiang, Han Yan, Yi Xiang, Yuan Chen, Jianming Fu, Guojun Peng","doi":"10.1109/ICSE43902.2021.00149","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00149","url":null,"abstract":"Limited by the small keyboard, most mobile apps support the automatic login feature for better user experience. Therefore, users avoid the inconvenience of retyping their ID and password when an app runs in the foreground again. However, this auto-login function can be exploited to launch the so-called \"data-clone attack\": once the locally-stored, auto-login depended data are cloned by attackers and placed into their own smartphones, attackers can break through the login-device number limit and log in to the victim's account stealthily. A natural countermeasure is to check the consistency of devicespecific attributes. As long as the new device shows different device fingerprints with the previous one, the app will disable the auto-login function and thus prevent data-clone attacks. In this paper, we develop VPDroid, a transparent Android OSlevel virtualization platform tailored for security testing. With VPDroid, security analysts can customize different device artifacts, such as CPU model, Android ID, and phone number, in a virtual phone without user-level API hooking. VPDroid's isolation mechanism ensures that user-mode apps in the virtual phone cannot detect device-specific discrepancies. To assess Android apps' susceptibility to the data-clone attack, we use VPDroid to simulate data-clone attacks with 234 most-downloaded apps. Our experiments on five different virtual phone environments show that VPDroid's device attribute customization can deceive all tested apps that perform device-consistency checks, such as Twitter, WeChat, and PayPal. 19 vendors have confirmed our report as a zero-day vulnerability. Our findings paint a cautionary tale: only enforcing a device-consistency check at client side is still vulnerable to an advanced data-clone attack.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115423073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepLocalize: Fault Localization for Deep Neural Networks
Mohammad Wardat, Wei Le, Hridesh Rajan
Pub Date: 2021-03-04 | DOI: 10.1109/ICSE43902.2021.00034
Deep Neural Networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques do not support localizing DNN bugs, because model behaviors are poorly understood: the entire DNN model appears as a black box. To address these problems, we propose an approach and a tool that automatically determines whether a model is buggy and identifies the root causes of DNN errors. Our key insight is that historic trends in the values propagated between layers can be analyzed both to identify and to localize faults. To that end, we first enable dynamic analysis of deep learning applications, either by converting them into an imperative representation or, alternatively, by using a callback mechanism. Both mechanisms allow us to insert probes that enable dynamic analysis over the traces produced by the DNN while it is trained on the training data. We then analyze the traces to identify the faulty layer or hyperparameter that causes the error. We propose an algorithm for identifying root causes by capturing any numerical error, monitoring the model during training, and determining the relevance of every layer and parameter to the DNN's outcome. We have collected a benchmark of 40 buggy models and patches containing real errors in deep learning applications from Stack Overflow and GitHub. Our benchmark can be used to evaluate automated debugging tools and repair techniques. We evaluated our approach on this DNN bug-and-patch benchmark; the results show that it is much more effective than the existing debugging approach used in the state-of-the-practice Keras library. For 34/40 cases, our approach was able to detect faults, whereas the best debugging approach provided by Keras detected 32/40 faults. Our approach localized 21/40 bugs, whereas Keras did not localize any faults.
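Since the abstract mentions a callback mechanism for instrumenting training, here is a hedged sketch of that idea: a Keras callback that watches for non-finite losses and scans layer weights to localize a faulty layer. This is a simplified illustration of dynamic monitoring, not the authors' DeepLocalize tool.

```python
# Minimal sketch of callback-based training monitoring: watch values produced
# during training and report the first layer whose weights become NaN/Inf.
import numpy as np
import tensorflow as tf

class NumericalErrorMonitor(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        loss = (logs or {}).get("loss")
        if loss is not None and not np.isfinite(loss):
            # Loss diverged: scan layers to localize the first faulty one.
            for layer in self.model.layers:
                if any(not np.all(np.isfinite(w)) for w in layer.get_weights()):
                    print(f"batch {batch}: non-finite weights in layer "
                          f"'{layer.name}' -- likely root cause")
                    self.model.stop_training = True
                    return
            print(f"batch {batch}: non-finite loss but finite weights "
                  f"(check labels, loss function, or learning rate)")
            self.model.stop_training = True

# Usage: model.fit(x, y, callbacks=[NumericalErrorMonitor()])
```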
{"title":"DeepLocalize: Fault Localization for Deep Neural Networks","authors":"Mohammad Wardat, Wei Le, Hridesh Rajan","doi":"10.1109/ICSE43902.2021.00034","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00034","url":null,"abstract":"Deep Neural Networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques don't support localizing DNN bugs because of the lack of understanding of model behaviors. The entire DNN model appears as a black box. To address these problems, we propose an approach and a tool that automatically determines whether the model is buggy or not, and identifies the root causes for DNN errors. Our key insight is that historic trends in values propagated between layers can be analyzed to identify faults, and also localize faults. To that end, we first enable dynamic analysis of deep learning applications: by converting it into an imperative representation and alternatively using a callback mechanism. Both mechanisms allows us to insert probes that enable dynamic analysis over the traces produced by the DNN while it is being trained on the training data. We then conduct dynamic analysis over the traces to identify the faulty layer or hyperparameter that causes the error. We propose an algorithm for identifying root causes by capturing any numerical error and monitoring the model during training and finding the relevance of every layer/parameter on the DNN outcome. We have collected a benchmark containing 40 buggy models and patches that contain real errors in deep learning applications from Stack Overflow and GitHub. Our benchmark can be used to evaluate automated debugging tools and repair techniques. We have evaluated our approach using this DNN bug-and-patch benchmark, and the results showed that our approach is much more effective than the existing debugging approach used in the state-of-the-practice Keras library. For 34/40 cases, our approach was able to detect faults whereas the best debugging approach provided by Keras detected 32/40 faults. Our approach was able to localize 21/40 bugs whereas Keras did not localize any faults.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114150650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}