Chijin Zhou, Quan Zhang, Mingzhe Wang, Lihua Guo, Jie Liang, Zhe Liu, Mathias Payer, Yuting Jiang
Browser APIs are essential to the modern web experience. Due to their large number and complexity, they vastly expand the attack surface of browsers. To detect vulnerabilities in these APIs, fuzzers generate test cases with a large amount of random API invocations. However, the massive search space formed by arbitrary API combinations hinders their effectiveness: since randomly-picked API invocations unlikely interfere with each other (i.e., compute on partially shared data), few interesting API interactions are explored. Consequently, reducing the search space by revealing inter-API relations is a major challenge in browser fuzzing. We propose Minerva, an efficient browser fuzzer for browser API bug detection. The key idea is to leverage API interference relations to reduce redundancy and improve coverage. Minerva consists of two modules: dynamic mod-ref analysis and guided code generation. Before fuzzing starts, the dynamic mod-ref analysis module builds an API interference graph. It first automatically identifies individual browser APIs from the browser’s code base. Next, it instruments the browser to dynamically collect mod-ref relations between APIs. During fuzzing, the guided code generation module synthesizes highly-relevant API invocations guided by the mod-ref relations. We evaluate Minerva on three mainstream browsers, i.e. Safari, FireFox, and Chromium. Compared to state-of-the-art fuzzers, Minerva improves edge coverage by 19.63% to 229.62% and finds 2x to 3x more unique bugs. Besides, Minerva has discovered 35 previously-unknown bugs out of which 20 have been fixed with 5 CVEs assigned and acknowledged by browser vendors.
{"title":"Minerva: browser API fuzzing with dynamic mod-ref analysis","authors":"Chijin Zhou, Quan Zhang, Mingzhe Wang, Lihua Guo, Jie Liang, Zhe Liu, Mathias Payer, Yuting Jiang","doi":"10.1145/3540250.3549107","DOIUrl":"https://doi.org/10.1145/3540250.3549107","url":null,"abstract":"Browser APIs are essential to the modern web experience. Due to their large number and complexity, they vastly expand the attack surface of browsers. To detect vulnerabilities in these APIs, fuzzers generate test cases with a large amount of random API invocations. However, the massive search space formed by arbitrary API combinations hinders their effectiveness: since randomly-picked API invocations unlikely interfere with each other (i.e., compute on partially shared data), few interesting API interactions are explored. Consequently, reducing the search space by revealing inter-API relations is a major challenge in browser fuzzing. We propose Minerva, an efficient browser fuzzer for browser API bug detection. The key idea is to leverage API interference relations to reduce redundancy and improve coverage. Minerva consists of two modules: dynamic mod-ref analysis and guided code generation. Before fuzzing starts, the dynamic mod-ref analysis module builds an API interference graph. It first automatically identifies individual browser APIs from the browser’s code base. Next, it instruments the browser to dynamically collect mod-ref relations between APIs. During fuzzing, the guided code generation module synthesizes highly-relevant API invocations guided by the mod-ref relations. We evaluate Minerva on three mainstream browsers, i.e. Safari, FireFox, and Chromium. Compared to state-of-the-art fuzzers, Minerva improves edge coverage by 19.63% to 229.62% and finds 2x to 3x more unique bugs. Besides, Minerva has discovered 35 previously-unknown bugs out of which 20 have been fixed with 5 CVEs assigned and acknowledged by browser vendors.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87123057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-device replay for Android apps is challenging because apps have to adapt or even restructure their GUIs responsively upon screen-size or orientation change across devices. As a first exploratory work, this paper demonstrates that cross-device record and replay can be made simple and practical by a one-pass, greedy algorithm by the Rx framework leveraging the least surprise principle in the GUI design. The experimental results of over 1,000 replay settings encouragingly show that our implemented Rx prototype tool effectively solved non-trivial cross-device replay cases beyond any known non-search-based work's scope, and had still competitive capabilities on same-device replay with start-of-the-art techniques.
{"title":"Cross-device record and replay for Android apps","authors":"Cong Li, Yanyan Jiang, Chang Xu","doi":"10.1145/3540250.3549083","DOIUrl":"https://doi.org/10.1145/3540250.3549083","url":null,"abstract":"Cross-device replay for Android apps is challenging because apps have to adapt or even restructure their GUIs responsively upon screen-size or orientation change across devices. As a first exploratory work, this paper demonstrates that cross-device record and replay can be made simple and practical by a one-pass, greedy algorithm by the Rx framework leveraging the least surprise principle in the GUI design. The experimental results of over 1,000 replay settings encouragingly show that our implemented Rx prototype tool effectively solved non-trivial cross-device replay cases beyond any known non-search-based work's scope, and had still competitive capabilities on same-device replay with start-of-the-art techniques.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87559080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Programming languages are artificial and highly restricted languages. But source code is there to tell computers as well as programmers what to do, as an act of communication. Despite its weird syntax and is riddled with different delimiters, the good news is that the very large corpus of open-source code is available. That makes it reasonable to apply machine learning techniques to source code to enable the source code analytics. Despite there are plenty of deep learning frameworks in the field of NLP, source code analytics has different features. In addition to the conventional way of coding, understanding the meaning of code involves many perspectives. The source code representation could be the token sequence, the API call sequence, the data dependency graph, and the control flow graph, as well as the program hierarchy, etc. This tutorial will tell the long, ongoing, and fruitful journey on exploiting the potential power of deep learning techniques in source code analytics. It will highlight that how code representation models can be utilized to support software engineers to perform different tasks that require proficient programming knowledge. The exploratory work show that code does imply the learnable knowledge, more precisely the learnable tacit knowledge. Although such knowledge is not easily transferrable between humans, it can be transferred between the automated programming tasks. A vision for future research will be stated for source code analytics.
{"title":"Multi-perspective representation learning for source code analytics (invited tutorial)","authors":"Zhi Jin","doi":"10.1145/3540250.3569446","DOIUrl":"https://doi.org/10.1145/3540250.3569446","url":null,"abstract":"Programming languages are artificial and highly restricted languages. But source code is there to tell computers as well as programmers what to do, as an act of communication. Despite its weird syntax and is riddled with different delimiters, the good news is that the very large corpus of open-source code is available. That makes it reasonable to apply machine learning techniques to source code to enable the source code analytics. Despite there are plenty of deep learning frameworks in the field of NLP, source code analytics has different features. In addition to the conventional way of coding, understanding the meaning of code involves many perspectives. The source code representation could be the token sequence, the API call sequence, the data dependency graph, and the control flow graph, as well as the program hierarchy, etc. This tutorial will tell the long, ongoing, and fruitful journey on exploiting the potential power of deep learning techniques in source code analytics. It will highlight that how code representation models can be utilized to support software engineers to perform different tasks that require proficient programming knowledge. The exploratory work show that code does imply the learnable knowledge, more precisely the learnable tacit knowledge. Although such knowledge is not easily transferrable between humans, it can be transferred between the automated programming tasks. A vision for future research will be stated for source code analytics.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86629648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Users increasingly rely on mobile apps for everyday tasks, including security- and privacy-sensitive tasks such as online banking, e-health, and e-government. Additionally, a wealth of sensors captures the movements and habits of the users for fitness tracking and convenience. Despite legal regulations imposing requirements and limits on the processing of privacy-sensitive data, users must still trust the app developers to apply suffcient protections. In this paper, we investigate the state of security in Android apps and how security-related code smells have evolved since the introduction of the Android operating system. With an analysis of 300 apps per year over 12 years between 2010 and 2021 from the Google Play Store, we find that the number of code scanner findings per thousand lines of code decreases over time. Still, this development is offset by the increase in code size. Apps have more and more findings, suggesting that the overall security level decreases. This trend is driven by flaws in the use of cryptography, insecure compiler flags, insecure uses of WebView components, and insecure uses of language features such as reflection. Based on our data, we argue for stricter controls on apps before admission to the store.
用户越来越依赖移动应用程序来完成日常任务,包括对安全和隐私敏感的任务,如网上银行、电子医疗和电子政务。此外,丰富的传感器捕捉用户的运动和习惯,以进行健身跟踪和方便。尽管法律法规对隐私敏感数据的处理施加了要求和限制,但用户仍然必须相信应用程序开发人员会提供足够的保护。在本文中,我们研究了Android应用程序的安全状态,以及自Android操作系统引入以来,与安全相关的代码气味是如何演变的。通过对Google Play Store从2010年到2021年的12年间每年300个应用的分析,我们发现每千行代码中代码扫描器发现的数量随着时间的推移而减少。但是,这种开发被代码大小的增加所抵消。应用程序有越来越多的发现,表明整体安全水平下降。这种趋势是由加密使用中的缺陷、不安全的编译器标志、不安全的WebView组件使用以及不安全的语言特性(如反射)使用所驱动的。根据我们的数据,我们主张在应用程序进入商店之前对其进行更严格的控制。
{"title":"Security code smells in apps: are we getting better?","authors":"Steven Arzt","doi":"10.1145/3540250.3549091","DOIUrl":"https://doi.org/10.1145/3540250.3549091","url":null,"abstract":"Users increasingly rely on mobile apps for everyday tasks, including security- and privacy-sensitive tasks such as online banking, e-health, and e-government. Additionally, a wealth of sensors captures the movements and habits of the users for fitness tracking and convenience. Despite legal regulations imposing requirements and limits on the processing of privacy-sensitive data, users must still trust the app developers to apply suffcient protections. In this paper, we investigate the state of security in Android apps and how security-related code smells have evolved since the introduction of the Android operating system. With an analysis of 300 apps per year over 12 years between 2010 and 2021 from the Google Play Store, we find that the number of code scanner findings per thousand lines of code decreases over time. Still, this development is offset by the increase in code size. Apps have more and more findings, suggesting that the overall security level decreases. This trend is driven by flaws in the use of cryptography, insecure compiler flags, insecure uses of WebView components, and insecure uses of language features such as reflection. Based on our data, we argue for stricter controls on apps before admission to the store.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83504917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent research has shown feasibility of an interactive pair-programming conversational agent, but implementing such an agent poses three challenges: a lack of benchmark datasets, absence of software engineering specific labels, and the need to understand developer conversations. To address these challenges, we conducted a Wizard of Oz study with 14 participants pair programming with a simulated agent and collected 4,443 developer-agent utterances. Based on this dataset, we created 26 software engineering labels using an open coding process to develop a hierarchical classification scheme. To understand labeled developer-agent conversations, we compared the accuracy of three state-of-the-art transformer-based language models, BERT, GPT-2, and XLNet, which performed interchangeably. In order to begin creating a developer-agent dataset, researchers and practitioners need to conduct resource intensive Wizard of Oz studies. Presently, there exists vast amounts of developer-developer conversations on video hosting websites. To investigate the feasibility of using developer-developer conversations, we labeled a publicly available developer-developer dataset (3,436 utterances) with our hierarchical classification scheme and found that a BERT model trained on developer-developer data performed ~10% worse than the BERT trained on developer-agent data, but when using transfer-learning, accuracy improved. Finally, our qualitative analysis revealed that developer-developer conversations are more implicit, neutral, and opinionated than developer-agent conversations. Our results have implications for software engineering researchers and practitioners developing conversational agents.
{"title":"Pair programming conversations with agents vs. developers: challenges and opportunities for SE community","authors":"Peter Robe, S. Kuttal, J. AuBuchon, Jacob C. Hart","doi":"10.1145/3540250.3549127","DOIUrl":"https://doi.org/10.1145/3540250.3549127","url":null,"abstract":"Recent research has shown feasibility of an interactive pair-programming conversational agent, but implementing such an agent poses three challenges: a lack of benchmark datasets, absence of software engineering specific labels, and the need to understand developer conversations. To address these challenges, we conducted a Wizard of Oz study with 14 participants pair programming with a simulated agent and collected 4,443 developer-agent utterances. Based on this dataset, we created 26 software engineering labels using an open coding process to develop a hierarchical classification scheme. To understand labeled developer-agent conversations, we compared the accuracy of three state-of-the-art transformer-based language models, BERT, GPT-2, and XLNet, which performed interchangeably. In order to begin creating a developer-agent dataset, researchers and practitioners need to conduct resource intensive Wizard of Oz studies. Presently, there exists vast amounts of developer-developer conversations on video hosting websites. To investigate the feasibility of using developer-developer conversations, we labeled a publicly available developer-developer dataset (3,436 utterances) with our hierarchical classification scheme and found that a BERT model trained on developer-developer data performed ~10% worse than the BERT trained on developer-agent data, but when using transfer-learning, accuracy improved. Finally, our qualitative analysis revealed that developer-developer conversations are more implicit, neutral, and opinionated than developer-agent conversations. Our results have implications for software engineering researchers and practitioners developing conversational agents.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90327735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hao He, Haonan Su, Wenxin Xiao, Runzhi He, Minghui Zhou
To facilitate newcomer onboarding, GitHub recommends the use of "good first issue" (GFI) labels to signal issues suitable for newcomers to resolve. However, previous research shows that manually labeled GFIs are scarce and inappropriate, showing a need for automated recommendations. In this paper, we present GFI-Bot (accessible at https://gfibot.io), a proof-of-concept machine learning powered bot for automated GFI recommendation in practice. Project maintainers can configure GFI-Bot to discover and label possible GFIs so that newcomers can easily locate issues for making their first contributions. GFI-Bot also provides a high-quality, up-to-date dataset for advancing GFI recommendation research.
{"title":"GFI-bot: automated good first issue recommendation on GitHub","authors":"Hao He, Haonan Su, Wenxin Xiao, Runzhi He, Minghui Zhou","doi":"10.1145/3540250.3558922","DOIUrl":"https://doi.org/10.1145/3540250.3558922","url":null,"abstract":"To facilitate newcomer onboarding, GitHub recommends the use of \"good first issue\" (GFI) labels to signal issues suitable for newcomers to resolve. However, previous research shows that manually labeled GFIs are scarce and inappropriate, showing a need for automated recommendations. In this paper, we present GFI-Bot (accessible at https://gfibot.io), a proof-of-concept machine learning powered bot for automated GFI recommendation in practice. Project maintainers can configure GFI-Bot to discover and label possible GFIs so that newcomers can easily locate issues for making their first contributions. GFI-Bot also provides a high-quality, up-to-date dataset for advancing GFI recommendation research.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"2020 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75481897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to the openness of crowdsourced testing, mobile app crowdsourced testing has been subject to duplicate reports. The previous research methods extract the textual features of the crowdsourced test reports, combine with shallow image analysis, and perform unsupervised clustering on the crowdsourced test reports to clarify the duplication of crowdsourced test reports and solve the problem. However, these methods ignore the semantic connection between textual descriptions and screenshots, making the clustering results unsatisfactory and the deduplication effect less accurate. This paper proposes a semi-supervised clustering tool for crowdsourced test reports with deep image understanding, namely SemCluster, which makes the most of the semantic connection between textual descriptions and screenshots by constructing semantic binding rules and performing semi-supervised clustering. SemCluster improves six metrics of clustering results in the experiment compared to the state-of-the-art method, which verifies that SemCluster has achieved a good deduplication effect. The demo can be found at: https://sites.google.com/view/semcluster-demo.
{"title":"SemCluster: a semi-supervised clustering tool for crowdsourced test reports with deep image understanding","authors":"Mingzhe Du, Shengcheng Yu, Chunrong Fang, Tongyu Li, Heyuan Zhang, Zhenyu Chen","doi":"10.1145/3540250.3558933","DOIUrl":"https://doi.org/10.1145/3540250.3558933","url":null,"abstract":"Due to the openness of crowdsourced testing, mobile app crowdsourced testing has been subject to duplicate reports. The previous research methods extract the textual features of the crowdsourced test reports, combine with shallow image analysis, and perform unsupervised clustering on the crowdsourced test reports to clarify the duplication of crowdsourced test reports and solve the problem. However, these methods ignore the semantic connection between textual descriptions and screenshots, making the clustering results unsatisfactory and the deduplication effect less accurate. This paper proposes a semi-supervised clustering tool for crowdsourced test reports with deep image understanding, namely SemCluster, which makes the most of the semantic connection between textual descriptions and screenshots by constructing semantic binding rules and performing semi-supervised clustering. SemCluster improves six metrics of clustering results in the experiment compared to the state-of-the-art method, which verifies that SemCluster has achieved a good deduplication effect. The demo can be found at: https://sites.google.com/view/semcluster-demo.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74919479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present PyTER, an automated program repair (APR) technique for Python type errors. Python developers struggle with type error exceptions that are prevalent and difficult to fix. Despite the importance, however, automatically repairing type errors in dynamically typed languages such as Python has received little attention in the APR community and no existing techniques are readily available for practical use. PyTER is the first technique that is carefully designed to fix diverse type errors in real-world Python applications. To this end, we present a novel APR approach that uses dynamic and static analyses to infer correct and incorrect types of program variables, and leverage their difference to effectively identify faulty locations and patch candidates. We evaluated PyTER on 93 type errors collected from open-source projects. The result shows that PyTER is able to fix 48.4% of them with a precision of 77.6%.
{"title":"PyTER: effective program repair for Python type errors","authors":"Wonseok Oh, Hakjoo Oh","doi":"10.1145/3540250.3549130","DOIUrl":"https://doi.org/10.1145/3540250.3549130","url":null,"abstract":"We present PyTER, an automated program repair (APR) technique for Python type errors. Python developers struggle with type error exceptions that are prevalent and difficult to fix. Despite the importance, however, automatically repairing type errors in dynamically typed languages such as Python has received little attention in the APR community and no existing techniques are readily available for practical use. PyTER is the first technique that is carefully designed to fix diverse type errors in real-world Python applications. To this end, we present a novel APR approach that uses dynamic and static analyses to infer correct and incorrect types of program variables, and leverage their difference to effectively identify faulty locations and patch candidates. We evaluated PyTER on 93 type errors collected from open-source projects. The result shows that PyTER is able to fix 48.4% of them with a precision of 77.6%.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72910499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto Martin-Lopez, Sergio Segura, Antonio Ruiz-Cortés
Online testing of web APIs—testing APIs in production—is gaining traction in industry. Platforms such as RapidAPI and Sauce Labs provide online testing and monitoring services of web APIs 24/7, typically by re-executing manually designed test cases on the target APIs on a regular basis. In parallel, research on the automated generation of test cases for RESTful APIs has seen significant advances in recent years. However, despite their promising results in the lab, it is unclear whether research tools would scale to industrial-size settings and, more importantly, how they would perform in an online testing setup, increasingly common in practice. In this paper, we report the results of an empirical study on the use of automated test case generation methods for online testing of RESTful APIs. Specifically, we used the RESTest framework to automatically generate and execute test cases in 13 industrial APIs for 15 days non-stop, resulting in over one million test cases. To scale at this level, we had to transition from a monolithic tool approach to a multi-bot architecture with over 200 bots working cooperatively in tasks like test generation and reporting. As a result, we uncovered about 390K failures, which we conservatively triaged into 254 bugs, 65 of which have been acknowledged or fixed by developers to date. Among others, we identified confirmed faults in the APIs of Amadeus, Foursquare, Yelp, and YouTube, accessed by millions of applications worldwide. More importantly, our reports have guided developers on improving their APIs, including bug fixes and documentation updates in the APIs of Amadeus and YouTube. Our results show the potential of online testing of RESTful APIs as the next must-have feature in industry, but also some of the key challenges to overcome for its full adoption in practice.
web api的在线测试——在生产环境中测试api——在工业中越来越受欢迎。RapidAPI和Sauce Labs等平台提供全天候的web api在线测试和监控服务,通常是定期在目标api上重新执行手动设计的测试用例。与此同时,对RESTful api测试用例自动生成的研究近年来取得了重大进展。然而,尽管这些研究工具在实验室中取得了可喜的成果,但目前尚不清楚它们是否可以扩展到工业规模,更重要的是,它们在实践中日益普遍的在线测试设置中如何表现。在本文中,我们报告了一项关于使用自动化测试用例生成方法在线测试RESTful api的实证研究结果。具体来说,我们使用rest框架在13个工业api中自动生成和执行测试用例,连续15天不间断,产生了超过一百万个测试用例。为了在这个级别上进行扩展,我们必须从单一的工具方法过渡到多机器人架构,其中有超过200个机器人在测试生成和报告等任务中协同工作。结果,我们发现了大约390K个故障,我们保守地将其分类为254个错误,其中65个已经被开发人员承认或修复。其中,我们在Amadeus、Foursquare、Yelp和YouTube的api中发现了已确认的错误,全球有数百万应用程序访问这些api。更重要的是,我们的报告指导开发人员改进他们的api,包括修复Amadeus和YouTube api中的错误和文档更新。我们的研究结果显示了RESTful api在线测试作为工业界下一个必备特性的潜力,但也显示了在实践中全面采用RESTful api需要克服的一些关键挑战。
{"title":"Online testing of RESTful APIs: promises and challenges","authors":"Alberto Martin-Lopez, Sergio Segura, Antonio Ruiz-Cortés","doi":"10.1145/3540250.3549144","DOIUrl":"https://doi.org/10.1145/3540250.3549144","url":null,"abstract":"Online testing of web APIs—testing APIs in production—is gaining traction in industry. Platforms such as RapidAPI and Sauce Labs provide online testing and monitoring services of web APIs 24/7, typically by re-executing manually designed test cases on the target APIs on a regular basis. In parallel, research on the automated generation of test cases for RESTful APIs has seen significant advances in recent years. However, despite their promising results in the lab, it is unclear whether research tools would scale to industrial-size settings and, more importantly, how they would perform in an online testing setup, increasingly common in practice. In this paper, we report the results of an empirical study on the use of automated test case generation methods for online testing of RESTful APIs. Specifically, we used the RESTest framework to automatically generate and execute test cases in 13 industrial APIs for 15 days non-stop, resulting in over one million test cases. To scale at this level, we had to transition from a monolithic tool approach to a multi-bot architecture with over 200 bots working cooperatively in tasks like test generation and reporting. As a result, we uncovered about 390K failures, which we conservatively triaged into 254 bugs, 65 of which have been acknowledged or fixed by developers to date. Among others, we identified confirmed faults in the APIs of Amadeus, Foursquare, Yelp, and YouTube, accessed by millions of applications worldwide. More importantly, our reports have guided developers on improving their APIs, including bug fixes and documentation updates in the APIs of Amadeus and YouTube. Our results show the potential of online testing of RESTful APIs as the next must-have feature in industry, but also some of the key challenges to overcome for its full adoption in practice.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74352645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Microservice architecture design highly relies on expert experience and may often result in improper service decomposition. Moreover, a microservice architecture is likely to degrade with the continuous evolution of services. Architecture measurement is thus important for the long-term evolution of microservice architectures. Due to the independent and dynamic nature of services, source code analysis based approaches cannot well capture the interactions between services. In this paper, we propose a trace analysis based microservice architecture measurement approach. We define a trace data model for microservice architecture measurement, which enables fine-grained analysis of the execution processes of requests and the interactions between interfaces and services. Based on the data model, we define 14 architectural metrics to measure the service independence and invocation chain complexity of a microservice system. We implement the approach and conduct three case studies with a student course project, an open-source microservice benchmark system, and three industrial microservice systems. The results show that our approach can well characterize the independence and invocation chain complexity of microservice architectures and help developers to identify microservice architecture issues caused by improper service decomposition and architecture degradation.
{"title":"Trace analysis based microservice architecture measurement","authors":"Xin Peng, Chenxi Zhang, Zhongyuan Zhao, Akasaka Isami, Xiaofeng Guo, Yunna Cui","doi":"10.1145/3540250.3558951","DOIUrl":"https://doi.org/10.1145/3540250.3558951","url":null,"abstract":"Microservice architecture design highly relies on expert experience and may often result in improper service decomposition. Moreover, a microservice architecture is likely to degrade with the continuous evolution of services. Architecture measurement is thus important for the long-term evolution of microservice architectures. Due to the independent and dynamic nature of services, source code analysis based approaches cannot well capture the interactions between services. In this paper, we propose a trace analysis based microservice architecture measurement approach. We define a trace data model for microservice architecture measurement, which enables fine-grained analysis of the execution processes of requests and the interactions between interfaces and services. Based on the data model, we define 14 architectural metrics to measure the service independence and invocation chain complexity of a microservice system. We implement the approach and conduct three case studies with a student course project, an open-source microservice benchmark system, and three industrial microservice systems. The results show that our approach can well characterize the independence and invocation chain complexity of microservice architectures and help developers to identify microservice architecture issues caused by improper service decomposition and architecture degradation.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74563266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}