{"title":"通过图像和文本语义理解和特征集成增强众包测试报告优先级","authors":"Chunrong Fang;Shengcheng Yu;Quanjun Zhang;Xin Li;Yulei Liu;Zhenyu Chen","doi":"10.1109/TSE.2024.3516372","DOIUrl":null,"url":null,"abstract":"Crowdsourced testing has gained prominence in the field of software testing due to its ability to effectively address the challenges posed by the fragmentation problem in mobile app testing. The inherent openness of crowdsourced testing brings diversity to the testing outcome. However, it also presents challenges for app developers in inspecting a substantial quantity of test reports. To help app developers inspect the bugs in crowdsourced test reports as early as possible, crowdsourced test report prioritization has emerged as an effective technology by establishing a systematic optimal report inspecting sequence. Nevertheless, crowdsourced test reports consist of app screenshots and textual descriptions, but current prioritization approaches mostly rely on textual descriptions, and some may add vectorized image features at the image-as-a-whole level or widget level. They still lack precision in accurately characterizing the distinctive features of crowdsourced test reports. In terms of prioritization strategy, prevailing approaches adopt simple prioritization based on features combined merely using weighted coefficients, without adequately considering the semantics, which may result in biased and ineffective outcomes. In this paper, we propose \n<sc>EncrePrior</small>\n, an enhanced crowdsourced test report prioritization approach via image-and-text semantic understanding and feature integration. \n<sc>EncrePrior</small>\n extracts distinctive features from crowdsourced test reports. For app screenshots, \n<sc>EncrePrior</small>\n considers the structure (i.e., GUI layout) and the contents (i.e., GUI widgets), viewing the app screenshot from the macroscopic and microscopic perspectives, respectively. For textual descriptions, \n<sc>EncrePrior</small>\n considers the Bug Description and Reproduction Step as the bug context. During the prioritization, we do not directly merge the features with weights to guide the prioritization. Instead, in order to comprehensively consider the semantics, we adopt a prioritize-reprioritize strategy. This practice combines different features together by considering their individual ranks. The reports are first prioritized on four features separately. Then, the ranks on four sequences are used to lexicographically reprioritize the test reports with an integration of features from app screenshots and textual descriptions. 
Results of an empirical study show that \n<sc>EncrePrior</small>\n outperforms the representative baseline approach \n<sc>DeepPrior</small>\n by 15.61% on average, ranging from 2.99% to 63.64% on different apps, and the novelly proposed features and prioritization strategy all contribute to the excellent performance of \n<sc>EncrePrior</small>\n.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 1","pages":"283-304"},"PeriodicalIF":6.5000,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhanced Crowdsourced Test Report Prioritization via Image-and-Text Semantic Understanding and Feature Integration\",\"authors\":\"Chunrong Fang;Shengcheng Yu;Quanjun Zhang;Xin Li;Yulei Liu;Zhenyu Chen\",\"doi\":\"10.1109/TSE.2024.3516372\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Crowdsourced testing has gained prominence in the field of software testing due to its ability to effectively address the challenges posed by the fragmentation problem in mobile app testing. The inherent openness of crowdsourced testing brings diversity to the testing outcome. However, it also presents challenges for app developers in inspecting a substantial quantity of test reports. To help app developers inspect the bugs in crowdsourced test reports as early as possible, crowdsourced test report prioritization has emerged as an effective technology by establishing a systematic optimal report inspecting sequence. Nevertheless, crowdsourced test reports consist of app screenshots and textual descriptions, but current prioritization approaches mostly rely on textual descriptions, and some may add vectorized image features at the image-as-a-whole level or widget level. They still lack precision in accurately characterizing the distinctive features of crowdsourced test reports. In terms of prioritization strategy, prevailing approaches adopt simple prioritization based on features combined merely using weighted coefficients, without adequately considering the semantics, which may result in biased and ineffective outcomes. In this paper, we propose \\n<sc>EncrePrior</small>\\n, an enhanced crowdsourced test report prioritization approach via image-and-text semantic understanding and feature integration. \\n<sc>EncrePrior</small>\\n extracts distinctive features from crowdsourced test reports. For app screenshots, \\n<sc>EncrePrior</small>\\n considers the structure (i.e., GUI layout) and the contents (i.e., GUI widgets), viewing the app screenshot from the macroscopic and microscopic perspectives, respectively. For textual descriptions, \\n<sc>EncrePrior</small>\\n considers the Bug Description and Reproduction Step as the bug context. During the prioritization, we do not directly merge the features with weights to guide the prioritization. Instead, in order to comprehensively consider the semantics, we adopt a prioritize-reprioritize strategy. This practice combines different features together by considering their individual ranks. The reports are first prioritized on four features separately. Then, the ranks on four sequences are used to lexicographically reprioritize the test reports with an integration of features from app screenshots and textual descriptions. 
Results of an empirical study show that \\n<sc>EncrePrior</small>\\n outperforms the representative baseline approach \\n<sc>DeepPrior</small>\\n by 15.61% on average, ranging from 2.99% to 63.64% on different apps, and the novelly proposed features and prioritization strategy all contribute to the excellent performance of \\n<sc>EncrePrior</small>\\n.\",\"PeriodicalId\":13324,\"journal\":{\"name\":\"IEEE Transactions on Software Engineering\",\"volume\":\"51 1\",\"pages\":\"283-304\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2024-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Software Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10795658/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10795658/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Enhanced Crowdsourced Test Report Prioritization via Image-and-Text Semantic Understanding and Feature Integration
Crowdsourced testing has gained prominence in the field of software testing due to its ability to effectively address the challenges posed by the fragmentation problem in mobile app testing. The inherent openness of crowdsourced testing brings diversity to the testing outcome, but it also challenges app developers, who must inspect a substantial quantity of test reports. To help app developers inspect the bugs in crowdsourced test reports as early as possible, crowdsourced test report prioritization has emerged as an effective technology that establishes a systematic, optimal report-inspection sequence. Crowdsourced test reports consist of app screenshots and textual descriptions, yet current prioritization approaches mostly rely on the textual descriptions, and some add vectorized image features only at the image-as-a-whole level or the widget level. They therefore lack precision in characterizing the distinctive features of crowdsourced test reports. In terms of prioritization strategy, prevailing approaches combine features merely through weighted coefficients, without adequately considering their semantics, which may yield biased and ineffective outcomes.
In this paper, we propose EncrePrior, an enhanced crowdsourced test report prioritization approach via image-and-text semantic understanding and feature integration. EncrePrior extracts distinctive features from crowdsourced test reports. For app screenshots, EncrePrior considers the structure (i.e., the GUI layout) and the contents (i.e., the GUI widgets), viewing each screenshot from the macroscopic and microscopic perspectives, respectively. For textual descriptions, EncrePrior treats the Bug Description and the Reproduction Step as the bug context. During prioritization, we do not directly merge the features with weights. Instead, to comprehensively consider the semantics, we adopt a prioritize-reprioritize strategy that combines the features by considering their individual ranks: the reports are first prioritized on each of the four features separately, and the ranks from the four resulting sequences are then used to lexicographically reprioritize the test reports, integrating the features from app screenshots and textual descriptions.
Results of an empirical study show that EncrePrior outperforms the representative baseline approach DeepPrior by 15.61% on average, ranging from 2.99% to 63.64% across different apps, and that the newly proposed features and prioritization strategy all contribute to the excellent performance of EncrePrior.
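To illustrate the prioritize-reprioritize strategy described in the abstract, the following Python sketch ranks reports on four features separately and then sorts them lexicographically by their tuples of per-feature ranks. The feature names, the made-up scores, and the ordering of features inside the rank tuple are illustrative assumptions for this sketch only, not EncrePrior's actual implementation.

from typing import Dict, List

# Hypothetical feature names; EncrePrior derives its four features from the
# GUI layout, the GUI widgets, the Bug Description, and the Reproduction Step.
FEATURES = ["gui_layout", "gui_widgets", "bug_description", "reproduction_step"]

def prioritize_reprioritize(reports: List[Dict[str, float]]) -> List[int]:
    """Return report indices in the reprioritized inspection order."""
    # Step 1: prioritize the reports on each feature separately.
    # ranks[(i, f)] is the position of report i in the sequence for feature f
    # (0 means the report is inspected earliest for that feature).
    ranks = {}
    for feature in FEATURES:
        order = sorted(range(len(reports)),
                       key=lambda i: reports[i][feature],
                       reverse=True)  # assume a higher score means a more bug-revealing report
        for position, report_idx in enumerate(order):
            ranks[(report_idx, feature)] = position

    # Step 2: reprioritize lexicographically on the tuples of per-feature
    # ranks (tuple comparison in Python is lexicographic).
    return sorted(range(len(reports)),
                  key=lambda i: tuple(ranks[(i, f)] for f in FEATURES))

if __name__ == "__main__":
    # Toy input: three reports with made-up per-feature scores.
    reports = [
        {"gui_layout": 0.7, "gui_widgets": 0.2, "bug_description": 0.9, "reproduction_step": 0.4},
        {"gui_layout": 0.7, "gui_widgets": 0.8, "bug_description": 0.1, "reproduction_step": 0.6},
        {"gui_layout": 0.3, "gui_widgets": 0.5, "bug_description": 0.5, "reproduction_step": 0.5},
    ]
    print(prioritize_reprioritize(reports))  # prints [0, 1, 2] for these scores

Note that lexicographic comparison makes the feature placed first in the tuple dominant; how the four ranks are actually ordered and integrated is part of the approach's design and is only assumed here.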
Journal introduction:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.