On the Effectiveness of Random Testing for Android: Or How I Learned to Stop Worrying and Love the Monkey

2018 IEEE/ACM 13th International Workshop on Automation of Software Test (AST). Pub Date: 2018-05-28. DOI: 10.1145/3194733.3194742

Priyam Patel, Gokul Srinivasan, Sydur Rahaman, Iulian Neamtiu


Citations: 38

Abstract

Random testing of Android apps is attractive due to its ease of use and scalability, but its effectiveness could be questioned. Prior studies have shown that Monkey – a simple approach and tool for random testing of Android apps – is surprisingly effective, "beating" much more sophisticated tools by achieving higher coverage. We study how Monkey's parameters affect code coverage (at the class, method, block, and line levels) and set out to answer several research questions centered on improving the effectiveness of Monkey-based random testing in Android and on how it compares with manual exploration. First, we show that random stress testing via Monkey is extremely efficient (85 seconds on average) and effective at crashing apps, including 15 widely used apps that have millions (or even billions) of installs. Second, we vary Monkey's event distribution to change app behavior and measure the resulting coverage. We find that, except for isolated cases, altering Monkey's default event distribution is unlikely to lead to higher coverage. Third, we manually explore 62 apps and compare the resulting coverage; we find that coverage achieved via manual exploration is just 2-3% higher than that achieved via Monkey exploration. Finally, our analysis shows that coarse-grained coverage is highly indicative of fine-grained coverage; hence coarse-grained coverage (which imposes low collection overhead) hits a performance vs. accuracy sweet spot.
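To make "varying Monkey's event distribution" concrete, the following is a minimal sketch of how a Monkey invocation with a non-default event mix might be assembled. It relies only on the documented `adb shell monkey` options (`-p`, `-s`, `--throttle`, and the `--pct-*` event flags). The helper name `build_monkey_command`, the package name, the seed, and the percentages are hypothetical illustrations, not the settings used in the paper.

```python
# Mapping from short event names to the Monkey tool's documented --pct-* flags.
MONKEY_PCT_FLAGS = {
    "touch": "--pct-touch",
    "motion": "--pct-motion",
    "trackball": "--pct-trackball",
    "nav": "--pct-nav",
    "majornav": "--pct-majornav",
    "syskeys": "--pct-syskeys",
    "appswitch": "--pct-appswitch",
    "anyevent": "--pct-anyevent",
}


def build_monkey_command(package, event_count, seed, percentages, throttle_ms=0):
    """Build an `adb shell monkey` argument list (hypothetical helper).

    `percentages` maps event names (keys of MONKEY_PCT_FLAGS) to integer
    percentages; event types left out keep Monkey's default weighting.
    """
    unknown = set(percentages) - set(MONKEY_PCT_FLAGS)
    if unknown:
        raise ValueError(f"unknown event types: {sorted(unknown)}")
    if sum(percentages.values()) > 100:
        raise ValueError("event percentages must not exceed 100 in total")

    # A fixed seed makes the pseudo-random event sequence repeatable.
    cmd = ["adb", "shell", "monkey", "-p", package, "-s", str(seed)]
    if throttle_ms > 0:
        cmd += ["--throttle", str(throttle_ms)]
    # Sort the event names so the generated command is deterministic.
    for name in sorted(percentages):
        cmd += [MONKEY_PCT_FLAGS[name], str(percentages[name])]
    cmd.append(str(event_count))  # the event count is the final positional argument
    return cmd


if __name__ == "__main__":
    # Assumed example values: a touch-heavy mix for a placeholder package.
    print(" ".join(build_monkey_command(
        "com.example.app", 500, 42, {"touch": 60, "motion": 20})))
```

Running the same command under several distributions and a fixed seed, then comparing the coverage reports, is one way such a comparison could be set up; the coverage-collection step itself is outside this sketch.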


