
Latest publications in 软件产业与工程

Minerva: browser API fuzzing with dynamic mod-ref analysis
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3549107
Chijin Zhou, Quan Zhang, Mingzhe Wang, Lihua Guo, Jie Liang, Zhe Liu, Mathias Payer, Yuting Jiang
Browser APIs are essential to the modern web experience. Due to their large number and complexity, they vastly expand the attack surface of browsers. To detect vulnerabilities in these APIs, fuzzers generate test cases with a large number of random API invocations. However, the massive search space formed by arbitrary API combinations hinders their effectiveness: since randomly picked API invocations are unlikely to interfere with each other (i.e., compute on partially shared data), few interesting API interactions are explored. Consequently, reducing the search space by revealing inter-API relations is a major challenge in browser fuzzing. We propose Minerva, an efficient browser fuzzer for browser API bug detection. The key idea is to leverage API interference relations to reduce redundancy and improve coverage. Minerva consists of two modules: dynamic mod-ref analysis and guided code generation. Before fuzzing starts, the dynamic mod-ref analysis module builds an API interference graph. It first automatically identifies individual browser APIs from the browser's code base. Next, it instruments the browser to dynamically collect mod-ref relations between APIs. During fuzzing, the guided code generation module synthesizes highly relevant API invocations guided by the mod-ref relations. We evaluate Minerva on three mainstream browsers, i.e., Safari, Firefox, and Chromium. Compared to state-of-the-art fuzzers, Minerva improves edge coverage by 19.63% to 229.62% and finds 2x to 3x more unique bugs. Besides, Minerva has discovered 35 previously unknown bugs, of which 20 have been fixed, with 5 CVEs assigned and acknowledged by browser vendors.
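The abstract's key mechanism is an API interference graph derived from mod-ref (modified/referenced state) relations, which then steers code generation toward related API calls. The Python sketch below is a minimal illustration of that idea under invented assumptions, not Minerva's implementation: the API names, the per-API mod/ref summaries, and the greedy sequence builder are all hypothetical.

```python
import random

# Hypothetical per-API summaries: which internal objects each browser API
# modifies ("mod") and reads ("ref"). The API names and summaries are invented.
API_SUMMARIES = {
    "canvas.getContext": {"mod": {"CanvasRenderingContext2D"}, "ref": {"HTMLCanvasElement"}},
    "ctx.fillRect":      {"mod": {"CanvasPixelBuffer"},        "ref": {"CanvasRenderingContext2D"}},
    "canvas.toDataURL":  {"mod": set(),                        "ref": {"CanvasPixelBuffer"}},
    "audio.play":        {"mod": {"AudioState"},               "ref": {"HTMLAudioElement"}},
}

def interferes(a, b):
    """Two APIs interfere if one modifies state that the other reads or modifies."""
    sa, sb = API_SUMMARIES[a], API_SUMMARIES[b]
    return bool(sa["mod"] & (sb["ref"] | sb["mod"])) or bool(sb["mod"] & (sa["ref"] | sa["mod"]))

# Interference graph, built once before fuzzing starts.
GRAPH = {a: {b for b in API_SUMMARIES if b != a and interferes(a, b)} for a in API_SUMMARIES}

def generate_sequence(length=4):
    """Greedily extend a call sequence with APIs that interfere with the last call."""
    seq = [random.choice(list(API_SUMMARIES))]
    while len(seq) < length:
        candidates = GRAPH[seq[-1]] or set(API_SUMMARIES)
        seq.append(random.choice(sorted(candidates)))
    return seq

print(generate_sequence())
```

In this toy graph, a sequence starting from canvas.getContext will tend to continue with ctx.fillRect and canvas.toDataURL rather than an unrelated audio call, which is the redundancy-reduction effect the abstract describes.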
Citations: 3
Cross-device record and replay for Android apps
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3549083
Cong Li, Yanyan Jiang, Chang Xu
Cross-device replay for Android apps is challenging because apps have to adapt or even restructure their GUIs responsively upon screen-size or orientation changes across devices. As a first exploratory work, this paper demonstrates that cross-device record and replay can be made simple and practical by a one-pass, greedy algorithm in the Rx framework that leverages the least-surprise principle of GUI design. Experimental results over 1,000 replay settings encouragingly show that our implemented Rx prototype tool effectively solved non-trivial cross-device replay cases beyond the scope of any known non-search-based work, while remaining competitive with state-of-the-art techniques on same-device replay.
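As a rough intuition for the one-pass, greedy, least-surprise matching described above, the sketch below (not the Rx implementation) picks, for each recorded event, the on-screen widget whose attributes deviate least from the recorded target; the widget attribute names are assumed for illustration only.

```python
# Recorded and on-screen widgets are represented as attribute dictionaries; the
# attribute names (resource_id, text, class_name) are typical of Android UI
# hierarchies but are used here purely as an illustrative assumption.
def similarity(recorded, candidate):
    keys = ("resource_id", "text", "content_desc", "class_name")
    return sum(recorded.get(k) is not None and recorded.get(k) == candidate.get(k) for k in keys)

def replay_event(recorded_widget, current_widgets):
    # One-pass greedy choice: pick the least "surprising" match, no backtracking.
    return max(current_widgets, key=lambda w: similarity(recorded_widget, w))

recorded = {"resource_id": "btn_login", "text": "Log in", "class_name": "Button"}
candidates = [
    {"resource_id": "btn_signup", "text": "Sign up", "class_name": "Button"},
    {"resource_id": "btn_login",  "text": "Log in",  "class_name": "Button"},
]
print(replay_event(recorded, candidates)["resource_id"])  # -> btn_login
```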
Citations: 4
Multi-perspective representation learning for source code analytics (invited tutorial)
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3569446
Zhi Jin
Programming languages are artificial and highly restricted languages. But source code is there to tell computers as well as programmers what to do, as an act of communication. Despite its unusual syntax and its many delimiters, the good news is that a very large corpus of open-source code is available. That makes it reasonable to apply machine learning techniques to source code to enable source code analytics. Although there are plenty of deep learning frameworks in the field of NLP, source code analytics has different characteristics. Beyond the conventional way of reading code, understanding its meaning involves many perspectives: a program can be represented as a token sequence, an API call sequence, a data dependency graph, a control flow graph, a program hierarchy, and so on. This tutorial recounts the long, ongoing, and fruitful journey of exploiting the potential of deep learning techniques in source code analytics. It highlights how code representation models can support software engineers in tasks that require proficient programming knowledge. The exploratory work shows that code does imply learnable knowledge, more precisely learnable tacit knowledge. Although such knowledge is not easily transferable between humans, it can be transferred between automated programming tasks. The tutorial closes with a vision for future research in source code analytics.
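To make the listed "perspectives" concrete, the snippet below extracts two of them, the flat token sequence and a structural (AST-based) view, from a toy Python function using only the standard library; the neural encoders that would consume these representations in a real pipeline are deliberately omitted.

```python
import ast
import io
import tokenize

SOURCE = "def add(a, b):\n    return a + b\n"

# Perspective 1: the flat token sequence.
tokens = [tok.string for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
          if tok.string.strip()]
print(tokens)

# Perspective 2: a structural view via the AST, a coarse stand-in for the
# data-dependency and control-flow graphs mentioned in the abstract.
tree = ast.parse(SOURCE)
print([type(node).__name__ for node in ast.walk(tree)])
```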
Citations: 0
Security code smells in apps: are we getting better?
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3549091
Steven Arzt
Users increasingly rely on mobile apps for everyday tasks, including security- and privacy-sensitive tasks such as online banking, e-health, and e-government. Additionally, a wealth of sensors captures the movements and habits of the users for fitness tracking and convenience. Despite legal regulations imposing requirements and limits on the processing of privacy-sensitive data, users must still trust the app developers to apply sufficient protections. In this paper, we investigate the state of security in Android apps and how security-related code smells have evolved since the introduction of the Android operating system. With an analysis of 300 apps per year over 12 years between 2010 and 2021 from the Google Play Store, we find that the number of code scanner findings per thousand lines of code decreases over time. Still, this development is offset by the increase in code size. Apps have more and more findings, suggesting that the overall security level decreases. This trend is driven by flaws in the use of cryptography, insecure compiler flags, insecure uses of WebView components, and insecure uses of language features such as reflection. Based on our data, we argue for stricter controls on apps before admission to the store.
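The headline trend rests on two per-app quantities, scanner findings and code size. The toy calculation below, with invented figures rather than the study's data, shows how a falling findings-per-KLOC rate can still coexist with more findings per app as apps grow.

```python
# Invented figures, chosen only to illustrate how a falling findings-per-KLOC
# rate can coexist with a rising number of findings per app as apps grow.
apps = [
    {"year": 2010, "findings": 30, "kloc": 50},
    {"year": 2021, "findings": 60, "kloc": 400},
]
for app in apps:
    rate = app["findings"] / app["kloc"]  # scanner findings per thousand lines of code
    print(f"{app['year']}: {rate:.2f} findings/KLOC, {app['findings']} findings in total")
```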
Citations: 0
Pair programming conversations with agents vs. developers: challenges and opportunities for SE community
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3549127
Peter Robe, S. Kuttal, J. AuBuchon, Jacob C. Hart
Recent research has shown the feasibility of an interactive pair-programming conversational agent, but implementing such an agent poses three challenges: a lack of benchmark datasets, the absence of software-engineering-specific labels, and the need to understand developer conversations. To address these challenges, we conducted a Wizard of Oz study in which 14 participants pair programmed with a simulated agent, and collected 4,443 developer-agent utterances. Based on this dataset, we created 26 software engineering labels using an open coding process to develop a hierarchical classification scheme. To understand labeled developer-agent conversations, we compared the accuracy of three state-of-the-art transformer-based language models, BERT, GPT-2, and XLNet, which performed comparably. In order to begin creating a developer-agent dataset, researchers and practitioners need to conduct resource-intensive Wizard of Oz studies. Meanwhile, vast amounts of developer-developer conversations exist on video hosting websites. To investigate the feasibility of using developer-developer conversations, we labeled a publicly available developer-developer dataset (3,436 utterances) with our hierarchical classification scheme and found that a BERT model trained on developer-developer data performed ~10% worse than the BERT trained on developer-agent data; with transfer learning, however, accuracy improved. Finally, our qualitative analysis revealed that developer-developer conversations are more implicit, neutral, and opinionated than developer-agent conversations. Our results have implications for software engineering researchers and practitioners developing conversational agents.
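For a sense of the utterance-classification task the study evaluates, the sketch below trains a simple bag-of-words classifier on invented utterances and labels; it merely stands in for the BERT, GPT-2, and XLNet models the paper actually compares, and the label names are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented utterances and labels, just to show the shape of the task.
utterances = [
    "can you write a loop over this list",
    "I think we should refactor this function first",
    "please run the tests again",
    "what does this error message mean",
]
labels = ["code_request", "design_opinion", "tool_command", "clarification"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(utterances, labels)
print(clf.predict(["could you run the test suite"]))
```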
Citations: 5
GFI-bot: automated good first issue recommendation on GitHub
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3558922
Hao He, Haonan Su, Wenxin Xiao, Runzhi He, Minghui Zhou
To facilitate newcomer onboarding, GitHub recommends the use of "good first issue" (GFI) labels to signal issues suitable for newcomers to resolve. However, previous research shows that manually labeled GFIs are scarce and inappropriate, indicating a need for automated recommendations. In this paper, we present GFI-Bot (accessible at https://gfibot.io), a proof-of-concept machine learning powered bot for automated GFI recommendation in practice. Project maintainers can configure GFI-Bot to discover and label possible GFIs so that newcomers can easily locate issues for making their first contributions. GFI-Bot also provides a high-quality, up-to-date dataset for advancing GFI recommendation research.
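A plausible shape for the labelling step of such a bot is sketched below; the GFI score is assumed to come from some external model (replaced here by a plain threshold), only the call to GitHub's standard issue-label REST endpoint is shown, and none of this is GFI-Bot's actual code.

```python
import requests

def label_if_gfi(owner, repo, issue_number, gfi_probability, token, threshold=0.8):
    """Apply the 'good first issue' label when an externally computed score is high enough."""
    if gfi_probability < threshold:
        return False
    # Standard GitHub REST endpoint for adding labels to an issue.
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/labels"
    resp = requests.post(
        url,
        headers={"Authorization": f"token {token}", "Accept": "application/vnd.github+json"},
        json={"labels": ["good first issue"]},
        timeout=10,
    )
    resp.raise_for_status()
    return True
```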
Citations: 3
SemCluster: a semi-supervised clustering tool for crowdsourced test reports with deep image understanding
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3558933
Mingzhe Du, Shengcheng Yu, Chunrong Fang, Tongyu Li, Heyuan Zhang, Zhenyu Chen
Due to the openness of crowdsourced testing, mobile app crowdsourced testing suffers from duplicate reports. Previous research extracts textual features from crowdsourced test reports, combines them with shallow image analysis, and performs unsupervised clustering on the reports to identify duplicates. However, these methods ignore the semantic connection between textual descriptions and screenshots, making the clustering results unsatisfactory and the deduplication less accurate. This paper proposes SemCluster, a semi-supervised clustering tool for crowdsourced test reports with deep image understanding, which makes the most of the semantic connection between textual descriptions and screenshots by constructing semantic binding rules and performing semi-supervised clustering. In our experiments, SemCluster improves on the state-of-the-art method in six clustering metrics, verifying that it achieves a good deduplication effect. The demo can be found at: https://sites.google.com/view/semcluster-demo.
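A toy version of the "semantic binding rule" idea is sketched below: reports whose screenshot-derived screen and textual description agree are must-linked, and the constraints are collapsed into clusters with union-find. The reports, the rule, and the clustering shortcut are all invented for illustration; SemCluster's actual pipeline is far richer.

```python
from collections import defaultdict

# Invented reports: each pairs a short textual description with the screen
# identified from its screenshot (deep image understanding is out of scope here).
reports = [
    {"id": 0, "text": "crash when tapping login button", "screen": "LoginActivity"},
    {"id": 1, "text": "app crashes on login",            "screen": "LoginActivity"},
    {"id": 2, "text": "settings page shows blank list",  "screen": "SettingsActivity"},
]

def must_link(a, b):
    # Toy binding rule: same screenshot-derived screen plus at least one shared word.
    shared = set(a["text"].split()) & set(b["text"].split())
    return a["screen"] == b["screen"] and len(shared) >= 1

# Union-find turns the pairwise must-link constraints into clusters.
parent = list(range(len(reports)))

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

for i in range(len(reports)):
    for j in range(i + 1, len(reports)):
        if must_link(reports[i], reports[j]):
            parent[find(i)] = find(j)

clusters = defaultdict(list)
for r in reports:
    clusters[find(r["id"])].append(r["id"])
print(list(clusters.values()))  # -> [[0, 1], [2]]
```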
Citations: 2
PyTER: effective program repair for Python type errors
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3549130
Wonseok Oh, Hakjoo Oh
We present PyTER, an automated program repair (APR) technique for Python type errors. Python developers struggle with type error exceptions that are prevalent and difficult to fix. Despite its importance, however, automatically repairing type errors in dynamically typed languages such as Python has received little attention in the APR community, and no existing techniques are readily available for practical use. PyTER is the first technique that is carefully designed to fix diverse type errors in real-world Python applications. To this end, we present a novel APR approach that uses dynamic and static analyses to infer correct and incorrect types of program variables, and leverages their difference to effectively identify faulty locations and patch candidates. We evaluated PyTER on 93 type errors collected from open-source projects. The result shows that PyTER is able to fix 48.4% of them with a precision of 77.6%.
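A heavily simplified illustration of the dynamic side of this kind of repair is sketched below: execute a failing input, observe the runtime types involved in the TypeError, and try a coercion as a candidate patch. The example function and the patch are hypothetical and do not reflect PyTER's actual analyses or patch space.

```python
def total_price(price, quantity):
    return price * quantity + 0.99   # raises TypeError when price arrives as a str

failing_input = {"price": "3.5", "quantity": 2}

try:
    total_price(**failing_input)
except TypeError:
    # Dynamic observation: record the concrete runtime types of the arguments.
    observed = {name: type(value).__name__ for name, value in failing_input.items()}
    print("observed argument types:", observed)   # {'price': 'str', 'quantity': 'int'}

    # One candidate patch: coerce the offending argument at the faulty location.
    def total_price_patched(price, quantity):
        return float(price) * quantity + 0.99

    print("patched result:", total_price_patched(**failing_input))
```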
Citations: 3
Online testing of RESTful APIs: promises and challenges
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3549144
Alberto Martin-Lopez, Sergio Segura, Antonio Ruiz-Cortés
Online testing of web APIs—testing APIs in production—is gaining traction in industry. Platforms such as RapidAPI and Sauce Labs provide online testing and monitoring services of web APIs 24/7, typically by re-executing manually designed test cases on the target APIs on a regular basis. In parallel, research on the automated generation of test cases for RESTful APIs has seen significant advances in recent years. However, despite their promising results in the lab, it is unclear whether research tools would scale to industrial-size settings and, more importantly, how they would perform in an online testing setup, increasingly common in practice. In this paper, we report the results of an empirical study on the use of automated test case generation methods for online testing of RESTful APIs. Specifically, we used the RESTest framework to automatically generate and execute test cases in 13 industrial APIs for 15 days non-stop, resulting in over one million test cases. To scale at this level, we had to transition from a monolithic tool approach to a multi-bot architecture with over 200 bots working cooperatively in tasks like test generation and reporting. As a result, we uncovered about 390K failures, which we conservatively triaged into 254 bugs, 65 of which have been acknowledged or fixed by developers to date. Among others, we identified confirmed faults in the APIs of Amadeus, Foursquare, Yelp, and YouTube, accessed by millions of applications worldwide. More importantly, our reports have guided developers on improving their APIs, including bug fixes and documentation updates in the APIs of Amadeus and YouTube. Our results show the potential of online testing of RESTful APIs as the next must-have feature in industry, but also some of the key challenges to overcome for its full adoption in practice.
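To ground what "generating and executing test cases against live APIs" can look like at its simplest, the sketch below draws parameter values from a small spec, fires a request, and checks generic oracles (no 5xx, JSON body). The endpoint, parameter spec, and oracles are placeholders, not part of RESTest or any of the 13 studied APIs.

```python
import random
import requests

# Placeholder parameter domain and endpoint.
PARAM_SPEC = {"query": ["coffee", "pizza", "books"], "limit": range(1, 6)}
BASE_URL = "https://api.example.com/v1/search"

def generate_test_case():
    """Pick one value per parameter at random, the simplest form of test generation."""
    return {name: random.choice(list(values)) for name, values in PARAM_SPEC.items()}

def run_test_case(params):
    resp = requests.get(BASE_URL, params=params, timeout=10)
    # Generic online-testing oracles: no server errors and a JSON body.
    assert resp.status_code < 500, f"server error for {params}: {resp.status_code}"
    assert "application/json" in resp.headers.get("Content-Type", ""), "non-JSON response"
    return resp

if __name__ == "__main__":
    run_test_case(generate_test_case())
```

In an online setup, a scheduler (or, as in the study, a fleet of cooperating bots) would run such generated cases against the production APIs around the clock and triage the failures.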
Citations: 9
Trace analysis based microservice architecture measurement
Pub Date: 2022-11-07 DOI: 10.1145/3540250.3558951
Xin Peng, Chenxi Zhang, Zhongyuan Zhao, Akasaka Isami, Xiaofeng Guo, Yunna Cui
Microservice architecture design relies heavily on expert experience and may often result in improper service decomposition. Moreover, a microservice architecture is likely to degrade with the continuous evolution of services. Architecture measurement is thus important for the long-term evolution of microservice architectures. Due to the independent and dynamic nature of services, approaches based on source code analysis cannot fully capture the interactions between services. In this paper, we propose a trace-analysis-based microservice architecture measurement approach. We define a trace data model for microservice architecture measurement, which enables fine-grained analysis of the execution of requests and of the interactions between interfaces and services. Based on the data model, we define 14 architectural metrics to measure the service independence and invocation chain complexity of a microservice system. We implement the approach and conduct three case studies with a student course project, an open-source microservice benchmark system, and three industrial microservice systems. The results show that our approach can effectively characterize the independence and invocation chain complexity of microservice architectures and help developers identify architecture issues caused by improper service decomposition and architecture degradation.
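As a flavour of what trace-derived architectural metrics can look like, the sketch below computes a maximum invocation-chain depth and a per-service fan-out from a handful of spans; the span format and the two metrics are illustrative stand-ins, not the paper's trace data model or its 14 metrics.

```python
from collections import defaultdict

# Each span: (span_id, parent_span_id, service). Invented trace for illustration.
spans = [
    ("s1", None, "gateway"),
    ("s2", "s1", "order"),
    ("s3", "s2", "inventory"),
    ("s4", "s2", "payment"),
]

by_id = {span_id: (parent, service) for span_id, parent, service in spans}
children = defaultdict(list)
callees = defaultdict(set)   # which services each service calls (fan-out)
for span_id, parent_id, service in spans:
    if parent_id is not None:
        children[parent_id].append(span_id)
        callees[by_id[parent_id][1]].add(service)

def depth(span_id):
    """Length of the longest invocation chain rooted at this span."""
    kids = children.get(span_id, [])
    return 1 + (max(depth(k) for k in kids) if kids else 0)

root_id = next(span_id for span_id, parent, _ in spans if parent is None)
print("max invocation chain depth:", depth(root_id))                  # 3
print("service fan-out:", {svc: len(tgts) for svc, tgts in callees.items()})
```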
Citations: 2