This paper addresses the following problem: Given a process model and an event log containing trace prefixes of ongoing cases of a process, map each case to its corresponding state (i.e., marking) in the model. This state computation operation is a building block of other process mining operations, such as log animation and short-term simulation. An approach to this state computation problem is to perform a token-based replay of each trace prefix against the model. However, when a trace prefix does not strictly follow the behavior of the process model, token replay may produce a state that is not reachable from the initial state of the process. An alternative approach is to first compute an alignment between the trace prefix of each ongoing case and the model, and then replay the aligned trace prefix. However, (prefix-)alignment is computationally expensive. This paper proposes a method that, given a trace prefix of an ongoing case, computes its state in constant time using an index that represents states as n-grams. An empirical evaluation shows that the proposed approach has an accuracy comparable to that of the prefix-alignment approach, while achieving a throughput of hundreds of thousands of traces per second.
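The core idea can be illustrated with a small sketch. The following is a hypothetical Python sketch, not the authors' implementation: it assumes the n-grams of activities observable in the model, together with the marking reached after each, have been enumerated offline, and it then resolves the state of an ongoing case from the last n events of its prefix in constant time.

```python
from collections import defaultdict

N = 3  # n-gram size (assumed)

def build_index(model_runs, n=N):
    """Map each n-gram of activities observed in runs of the model to the
    marking(s) reached right after it. `model_runs` is a list of
    (activities, markings) pairs, where markings[i] is the marking reached
    after executing activities[i]."""
    index = defaultdict(set)
    for activities, markings in model_runs:
        for i in range(len(activities)):
            gram = tuple(activities[max(0, i - n + 1): i + 1])
            index[gram].add(markings[i])
    return index

def state_of(prefix, index, n=N):
    """Constant-time lookup using the last n events of the ongoing case,
    falling back to shorter suffixes for non-conforming prefixes."""
    for k in range(min(n, len(prefix)), 0, -1):
        marking = index.get(tuple(prefix[-k:]))
        if marking:
            return marking
    return None  # no suffix of the prefix matches indexed model behavior

# Hypothetical runs of a small order-to-cash model (markings named m1..m5).
runs = [
    (["create", "approve", "ship", "invoice"], ["m1", "m2", "m3", "m4"]),
    (["create", "reject"], ["m1", "m5"]),
]
index = build_index(runs)
print(state_of(["create", "approve", "ship"], index))  # {'m3'}
```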
{"title":"Efficient Online Computation of Business Process State From Trace Prefixes via N-Gram Indexing","authors":"David Chapela-Campa, Marlon Dumas","doi":"arxiv-2409.05658","DOIUrl":"https://doi.org/arxiv-2409.05658","url":null,"abstract":"This paper addresses the following problem: Given a process model and an\u0000event log containing trace prefixes of ongoing cases of a process, map each\u0000case to its corresponding state (i.e., marking) in the model. This state\u0000computation operation is a building block of other process mining operations,\u0000such as log animation and short-term simulation. An approach to this state\u0000computation problem is to perform a token-based replay of each trace prefix\u0000against the model. However, when a trace prefix does not strictly follow the\u0000behavior of the process model, token replay may produce a state that is not\u0000reachable from the initial state of the process. An alternative approach is to\u0000first compute an alignment between the trace prefix of each ongoing case and\u0000the model, and then replay the aligned trace prefix. However,\u0000(prefix-)alignment is computationally expensive. This paper proposes a method\u0000that, given a trace prefix of an ongoing case, computes its state in constant\u0000time using an index that represents states as n-grams. An empirical evaluation\u0000shows that the proposed approach has an accuracy comparable to that of the\u0000prefix-alignment approach, while achieving a throughput of hundreds of\u0000thousands of traces per second.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md Mustakim Billah, Palash Ranjan Roy, Zadia Codabux, Banani Roy
Competitive programming platforms such as LeetCode, Codeforces, and HackerRank evaluate programming skills and are often used by recruiters for screening. With the rise of advanced Large Language Models (LLMs) such as ChatGPT, Gemini, and Meta AI, their problem-solving ability on these platforms needs assessment. This study explores the ability of LLMs to tackle diverse programming challenges across platforms of varying difficulty, offering insights into their real-time and offline performance and comparing them with human programmers. We tested 98 problems from LeetCode and 126 from Codeforces, covering 15 categories. Nine online contests from Codeforces and LeetCode were conducted, along with two certification tests on HackerRank, to assess real-time performance. Prompts and feedback mechanisms were used to guide the LLMs, and correlations were explored across different scenarios. LLMs such as ChatGPT (71.43% success on LeetCode) excelled on LeetCode and in HackerRank certifications but struggled in virtual contests, particularly on Codeforces. They outperformed human users on LeetCode archive problems, excelling in time and memory efficiency, but underperformed in harder Codeforces contests. While not an immediate threat, the performance of LLMs on these platforms is concerning, and platforms will need to account for their continued improvement.
{"title":"Are Large Language Models a Threat to Programming Platforms? An Exploratory Study","authors":"Md Mustakim Billah, Palash Ranjan Roy, Zadia Codabux, Banani Roy","doi":"arxiv-2409.05824","DOIUrl":"https://doi.org/arxiv-2409.05824","url":null,"abstract":"Competitive programming platforms like LeetCode, Codeforces, and HackerRank\u0000evaluate programming skills, often used by recruiters for screening. With the\u0000rise of advanced Large Language Models (LLMs) such as ChatGPT, Gemini, and Meta\u0000AI, their problem-solving ability on these platforms needs assessment. This\u0000study explores LLMs' ability to tackle diverse programming challenges across\u0000platforms with varying difficulty, offering insights into their real-time and\u0000offline performance and comparing them with human programmers. We tested 98 problems from LeetCode, 126 from Codeforces, covering 15\u0000categories. Nine online contests from Codeforces and LeetCode were conducted,\u0000along with two certification tests on HackerRank, to assess real-time\u0000performance. Prompts and feedback mechanisms were used to guide LLMs, and\u0000correlations were explored across different scenarios. LLMs, like ChatGPT (71.43% success on LeetCode), excelled in LeetCode and\u0000HackerRank certifications but struggled in virtual contests, particularly on\u0000Codeforces. They performed better than users in LeetCode archives, excelling in\u0000time and memory efficiency but underperforming in harder Codeforces contests.\u0000While not immediately threatening, LLMs performance on these platforms is\u0000concerning, and future improvements will need addressing.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software testing is a crucial phase in the software development lifecycle (SDLC), ensuring that products meet necessary functional, performance, and quality benchmarks before release. Despite advancements in automation, traditional methods of generating and validating test cases still face significant challenges, including prolonged timelines, human error, incomplete test coverage, and high costs of manual intervention. These limitations often lead to delayed product launches and undetected defects that compromise software quality and user satisfaction. The integration of artificial intelligence (AI) into software testing presents a promising solution to these persistent challenges. AI-driven testing methods automate the creation of comprehensive test cases, dynamically adapt to changes, and leverage machine learning to identify high-risk areas in the codebase. This approach enhances regression testing efficiency while expanding overall test coverage. Furthermore, AI-powered tools enable continuous testing and self-healing test cases, significantly reducing manual oversight and accelerating feedback loops, ultimately leading to faster and more reliable software releases. This paper explores the transformative potential of AI in improving test case generation and validation, focusing on its ability to enhance efficiency, accuracy, and scalability in testing processes. It also addresses key challenges associated with adapting AI for testing, including the need for high-quality training data, model transparency, and a balance between automation and human oversight. Through case studies and examples of real-world applications, this paper illustrates how AI can significantly enhance testing efficiency across both legacy and modern software systems.
{"title":"The Future of Software Testing: AI-Powered Test Case Generation and Validation","authors":"Mohammad Baqar, Rajat Khanda","doi":"arxiv-2409.05808","DOIUrl":"https://doi.org/arxiv-2409.05808","url":null,"abstract":"Software testing is a crucial phase in the software development lifecycle\u0000(SDLC), ensuring that products meet necessary functional, performance, and\u0000quality benchmarks before release. Despite advancements in automation,\u0000traditional methods of generating and validating test cases still face\u0000significant challenges, including prolonged timelines, human error, incomplete\u0000test coverage, and high costs of manual intervention. These limitations often\u0000lead to delayed product launches and undetected defects that compromise\u0000software quality and user satisfaction. The integration of artificial\u0000intelligence (AI) into software testing presents a promising solution to these\u0000persistent challenges. AI-driven testing methods automate the creation of\u0000comprehensive test cases, dynamically adapt to changes, and leverage machine\u0000learning to identify high-risk areas in the codebase. This approach enhances\u0000regression testing efficiency while expanding overall test coverage.\u0000Furthermore, AI-powered tools enable continuous testing and self-healing test\u0000cases, significantly reducing manual oversight and accelerating feedback loops,\u0000ultimately leading to faster and more reliable software releases. This paper\u0000explores the transformative potential of AI in improving test case generation\u0000and validation, focusing on its ability to enhance efficiency, accuracy, and\u0000scalability in testing processes. It also addresses key challenges associated\u0000with adapting AI for testing, including the need for high quality training\u0000data, ensuring model transparency, and maintaining a balance between automation\u0000and human oversight. Through case studies and examples of real-world\u0000applications, this paper illustrates how AI can significantly enhance testing\u0000efficiency across both legacy and modern software systems.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large language models (LLMs) have achieved impressive performance on code generation. Although prior studies have enhanced LLMs with prompting techniques and code refinement, LLMs still struggle with complex programming problems due to rigid solution plans. In this paper, we draw on pair programming practices to propose PairCoder, a novel LLM-based framework for code generation. PairCoder incorporates two collaborative LLM agents, namely a Navigator agent for high-level planning and a Driver agent for specific implementation. The Navigator is responsible for proposing promising solution plans, selecting the current optimal plan, and directing the next iteration round based on execution feedback. The Driver follows the guidance of the Navigator to undertake initial code generation, code testing, and refinement. This interleaved and iterative workflow involves multi-plan exploration and feedback-based refinement, which mimics the collaboration of pair programmers. We evaluate PairCoder with both open-source and closed-source LLMs on various code generation benchmarks. Extensive experimental results demonstrate the superior accuracy of PairCoder, achieving relative pass@1 improvements of 12.00%-162.43% over prompting LLMs directly.
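As a rough illustration of the Navigator/Driver interplay described in the abstract, the sketch below outlines one possible control loop. The `llm` and `run_tests` helpers, the prompt wording, and the plan-switching rule are hypothetical stand-ins, not PairCoder's actual interface.

```python
def pair_coder(problem, llm, run_tests, max_rounds=5, n_plans=3):
    """Two-agent sketch: a Navigator proposes and selects solution plans,
    a Driver implements and refines code based on test feedback."""
    # Navigator: multi-plan exploration.
    plans = [llm(f"Propose solution plan #{i + 1} for:\n{problem}")
             for i in range(n_plans)]
    plan = llm("Pick the most promising plan:\n" + "\n---\n".join(plans))

    feedback, code = "", ""
    for _ in range(max_rounds):
        # Driver: initial generation, then feedback-driven refinement.
        code = llm(f"Implement this plan.\nPlan: {plan}\n"
                   f"Problem: {problem}\nPrevious feedback: {feedback}")
        passed, feedback = run_tests(code)
        if passed:
            return code
        # Navigator: keep refining the current plan or switch to another one.
        decision = llm(f"Tests failed: {feedback}\nCurrent plan: {plan}\n"
                       f"Other plans: {plans}\n"
                       "Reply 'refine' or the number of a better plan.")
        if decision.strip().isdigit():
            plan = plans[int(decision.strip()) - 1]
    return code  # best effort once the iteration budget is exhausted
```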
{"title":"A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement","authors":"Huan Zhang, Wei Cheng, Yuhan Wu, Wei Hu","doi":"arxiv-2409.05001","DOIUrl":"https://doi.org/arxiv-2409.05001","url":null,"abstract":"Large language models (LLMs) have achieved impressive performance on code\u0000generation. Although prior studies enhanced LLMs with prompting techniques and\u0000code refinement, they still struggle with complex programming problems due to\u0000rigid solution plans. In this paper, we draw on pair programming practices to\u0000propose PairCoder, a novel LLM-based framework for code generation. PairCoder\u0000incorporates two collaborative LLM agents, namely a Navigator agent for\u0000high-level planning and a Driver agent for specific implementation. The\u0000Navigator is responsible for proposing promising solution plans, selecting the\u0000current optimal plan, and directing the next iteration round based on execution\u0000feedback. The Driver follows the guidance of Navigator to undertake initial\u0000code generation, code testing, and refinement. This interleaved and iterative\u0000workflow involves multi-plan exploration and feedback-based refinement, which\u0000mimics the collaboration of pair programmers. We evaluate PairCoder with both\u0000open-source and closed-source LLMs on various code generation benchmarks.\u0000Extensive experimental results demonstrate the superior accuracy of PairCoder,\u0000achieving relative pass@1 improvements of 12.00%-162.43% compared to prompting\u0000LLMs directly.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In 2023, Sonatype reported a 200% increase in software supply chain attacks, including major build infrastructure attacks. To secure the software supply chain, practitioners can follow security framework guidance like the Supply-chain Levels for Software Artifacts (SLSA). However, recent surveys and industry summits have shown that despite growing interest, the adoption of SLSA is not widespread. To understand adoption challenges, the goal of this study is to aid framework authors and practitioners in improving the adoption and development of Supply-Chain Levels for Software Artifacts (SLSA) through a qualitative study of SLSA-related issues on GitHub. We analyzed 1,523 SLSA-related issues extracted from 233 GitHub repositories. We conducted a topic-guided thematic analysis, leveraging the Latent Dirichlet Allocation (LDA) unsupervised machine learning algorithm, to explore the challenges of adopting SLSA and the strategies for overcoming these challenges. We identified four significant challenges and five suggested adoption strategies. The two main challenges reported are complex implementation and unclear communication, highlighting the difficulties in implementing and understanding the SLSA process across diverse ecosystems. The suggested strategies include streamlining provenance generation processes, improving the SLSA verification process, and providing specific and detailed documentation. Our findings indicate that some strategies can help mitigate multiple challenges, and some challenges need future research and tool enhancement.
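For readers unfamiliar with topic-guided thematic analysis, the snippet below shows how an LDA pass over issue texts can surface candidate topics to seed manual coding. It uses scikit-learn rather than whatever tooling the authors used, and the corpus and parameters are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative stand-ins for the 1,523 SLSA-related issue titles/bodies.
issues = [
    "provenance generation fails in reusable workflow",
    "unclear how to reach SLSA level 3 with our builder",
    "verification of provenance signatures is slow and undocumented",
    "documentation does not explain the attestation format",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(issues)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Top words per topic act as seeds for the (manual) thematic analysis.
terms = vectorizer.get_feature_names_out()
for t, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {t}: {', '.join(top)}")
```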
{"title":"Unraveling Challenges with Supply-Chain Levels for Software Artifacts (SLSA) for Securing the Software Supply Chain","authors":"Mahzabin Tamanna, Sivana Hamer, Mindy Tran, Sascha Fahl, Yasemin Acar, Laurie Williams","doi":"arxiv-2409.05014","DOIUrl":"https://doi.org/arxiv-2409.05014","url":null,"abstract":"In 2023, Sonatype reported a 200% increase in software supply chain attacks,\u0000including major build infrastructure attacks. To secure the software supply\u0000chain, practitioners can follow security framework guidance like the\u0000Supply-chain Levels for Software Artifacts (SLSA). However, recent surveys and\u0000industry summits have shown that despite growing interest, the adoption of SLSA\u0000is not widespread. To understand adoption challenges, textit{the goal of this\u0000study is to aid framework authors and practitioners in improving the adoption\u0000and development of Supply-Chain Levels for Software Artifacts (SLSA) through a\u0000qualitative study of SLSA-related issues on GitHub}. We analyzed 1,523\u0000SLSA-related issues extracted from 233 GitHub repositories. We conducted a\u0000topic-guided thematic analysis, leveraging the Latent Dirichlet Allocation\u0000(LDA) unsupervised machine learning algorithm, to explore the challenges of\u0000adopting SLSA and the strategies for overcoming these challenges. We identified\u0000four significant challenges and five suggested adoption strategies. The two\u0000main challenges reported are complex implementation and unclear communication,\u0000highlighting the difficulties in implementing and understanding the SLSA\u0000process across diverse ecosystems. The suggested strategies include\u0000streamlining provenance generation processes, improving the SLSA verification\u0000process, and providing specific and detailed documentation. Our findings\u0000indicate that some strategies can help mitigate multiple challenges, and some\u0000challenges need future research and tool enhancement.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding stakeholder needs is essential for project success, as stakeholder importance varies across projects. This study proposes a framework for early stakeholder identification and continuous engagement throughout the project lifecycle. The framework addresses common organizational failures in stakeholder communication that lead to project delays and cancellations. By classifying stakeholders by influence and interest, establishing clear communication channels, and implementing regular feedback loops, the framework ensures effective stakeholder involvement. This approach allows for necessary project adjustments and builds long-term relationships, validated by a survey of IT professionals. Engaging stakeholders strategically at all stages minimizes misunderstandings and project risks, contributing to better project management and lifecycle outcomes.
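A minimal sketch of the classification step mentioned above, mapping each stakeholder's influence and interest to an engagement strategy in the spirit of a power/interest grid; the thresholds, category names, and example stakeholders are assumptions for illustration, not taken from the paper.

```python
def engagement_strategy(influence: float, interest: float, threshold: float = 0.5) -> str:
    """Place a stakeholder on a simple influence/interest grid (scores in [0, 1])."""
    if influence >= threshold and interest >= threshold:
        return "manage closely"   # key players: involve in decisions, frequent two-way channels
    if influence >= threshold:
        return "keep satisfied"   # high influence, low interest: concise periodic updates
    if interest >= threshold:
        return "keep informed"    # low influence, high interest: regular feedback loops
    return "monitor"              # minimal effort, reassess as the project evolves

stakeholders = [("regulator", 0.9, 0.3), ("end users", 0.4, 0.9), ("vendor", 0.2, 0.2)]
for name, influence, interest in stakeholders:
    print(name, "->", engagement_strategy(influence, interest))
```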
{"title":"Unified External Stakeholder Engagement and Requirements Strategy","authors":"Ahmed Abdulaziz Alnhari, Rizwan Qureshi","doi":"arxiv-2409.05019","DOIUrl":"https://doi.org/arxiv-2409.05019","url":null,"abstract":"Understanding stakeholder needs is essential for project success, as\u0000stakeholder importance varies across projects. This study proposes a framework\u0000for early stakeholder identification and continuous engagement throughout the\u0000project lifecycle. The framework addresses common organizational failures in\u0000stakeholder communication that lead to project delays and cancellations. By\u0000classifying stakeholders by influence and interest, establishing clear\u0000communication channels, and implementing regular feedback loops, the framework\u0000ensures effective stakeholder involvement. This approach allows for necessary\u0000project adjustments and builds long-term relationships, validated by a survey\u0000of IT professionals. Engaging stakeholders strategically at all stages\u0000minimizes misunderstandings and project risks, contributing to better project\u0000management and lifecycle outcomes.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dan Lin, Jiajing Wu, Yuxin Su, Ziye Zheng, Yuhong Nan, Zibin Zheng
Decentralized bridge applications are important software that connects various blockchains and facilitates cross-chain asset transfers in the decentralized finance (DeFi) ecosystem, which currently operates in a multi-chain environment. Cross-chain transaction association identifies and matches the unique transactions executed by bridge DApps, which is an important research direction for enhancing the traceability of cross-chain bridge DApps. However, existing methods rely entirely on unobservable internal ledgers or APIs, violating the open and decentralized properties of blockchain. In this paper, we analyze the challenges of this problem and then present CONNECTOR, an automated cross-chain transaction association analysis method based on bridge smart contracts. Specifically, CONNECTOR first identifies deposit transactions by extracting distinctive and generic features from the transaction traces of bridge contracts. Using the identified deposit transactions, CONNECTOR then mines the execution logs of bridge contracts to match the corresponding withdrawal transactions. We conduct real-world experiments on different types of bridges to demonstrate the effectiveness of CONNECTOR. The experiments show that CONNECTOR successfully identifies 100% of deposit transactions, associates 95.81% of withdrawal transactions, and surpasses methods designed for CeFi bridges. Based on the association results, we obtain interesting findings about cross-chain transaction behaviors in DeFi bridges and analyze how the tracing abilities of CONNECTOR can assist DeFi bridge apps.
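To make the two-step association concrete, here is a simplified, hypothetical sketch: deposit transactions are recognized from bridge-contract traces by a feature predicate, and withdrawals on the destination chain are then matched via a correlation key. The field names, predicate, and matching key are assumptions for illustration; they are not CONNECTOR's actual features.

```python
def is_deposit(trace):
    """Hypothetical feature check on a bridge-contract transaction trace:
    value is sent to the bridge contract and a cross-chain transfer event is emitted."""
    return trace["to"] == trace["bridge_contract"] and trace["emits_transfer_event"]

def associate(deposits, withdrawal_logs, time_window=3600):
    """Match each deposit to a withdrawal on the destination chain using a
    correlation key (token, amount, recipient) within a time window (seconds)."""
    pairs = []
    for dep in deposits:
        key = (dep["token"], dep["amount"], dep["recipient"])
        for wd in withdrawal_logs:
            if (wd["token"], wd["amount"], wd["to"]) == key and \
                    0 <= wd["timestamp"] - dep["timestamp"] <= time_window:
                pairs.append((dep["tx_hash"], wd["tx_hash"]))
                break
    return pairs
```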
{"title":"CONNECTOR: Enhancing the Traceability of Decentralized Bridge Applications via Automatic Cross-chain Transaction Association","authors":"Dan Lin, Jiajing Wu, Yuxin Su, Ziye Zheng, Yuhong Nan, Zibin Zheng","doi":"arxiv-2409.04937","DOIUrl":"https://doi.org/arxiv-2409.04937","url":null,"abstract":"Decentralized bridge applications are important software that connects\u0000various blockchains and facilitates cross-chain asset transfer in the\u0000decentralized finance (DeFi) ecosystem which currently operates in a\u0000multi-chain environment. Cross-chain transaction association identifies and\u0000matches unique transactions executed by bridge DApps, which is important\u0000research to enhance the traceability of cross-chain bridge DApps. However,\u0000existing methods rely entirely on unobservable internal ledgers or APIs,\u0000violating the open and decentralized properties of blockchain. In this paper,\u0000we analyze the challenges of this issue and then present CONNECTOR, an\u0000automated cross-chain transaction association analysis method based on bridge\u0000smart contracts. Specifically, CONNECTOR first identifies deposit transactions\u0000by extracting distinctive and generic features from the transaction traces of\u0000bridge contracts. With the accurate deposit transactions, CONNECTOR mines the\u0000execution logs of bridge contracts to achieve withdrawal transaction matching.\u0000We conduct real-world experiments on different types of bridges to demonstrate\u0000the effectiveness of CONNECTOR. The experiment demonstrates that CONNECTOR\u0000successfully identifies 100% deposit transactions, associates 95.81% withdrawal\u0000transactions, and surpasses methods for CeFi bridges. Based on the association\u0000results, we obtain interesting findings about cross-chain transaction behaviors\u0000in DeFi bridges and analyze the tracing abilities of CONNECTOR to assist the\u0000DeFi bridge apps.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ehsan Firouzi, Ammar Mansuri, Mohammad Ghafari, Maziar Kaveh
Cryptography misuses are prevalent in the wild. Crypto APIs are hard to use for developers, and static analysis tools do not detect every misuse. We developed SafEncrypt, an API that streamlines encryption tasks for Java developers. It is built on top of the native Java Cryptography Architecture, and it shields developers from crypto complexities and erroneous low-level details. Experiments showed that SafEncrypt is suitable for developers with varying levels of experience.
{"title":"From Struggle to Simplicity with a Usable and Secure API for Encryption in Java","authors":"Ehsan Firouzi, Ammar Mansuri, Mohammad Ghafari, Maziar Kaveh","doi":"arxiv-2409.05128","DOIUrl":"https://doi.org/arxiv-2409.05128","url":null,"abstract":"Cryptography misuses are prevalent in the wild. Crypto APIs are hard to use\u0000for developers, and static analysis tools do not detect every misuse. We\u0000developed SafEncrypt, an API that streamlines encryption tasks for Java\u0000developers. It is built on top of the native Java Cryptography Architecture,\u0000and it shields developers from crypto complexities and erroneous low-level\u0000details. Experiments showed that SafEncrypt is suitable for developers with\u0000varying levels of experience.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yakun Zhang, Chen Liu, Xiaofei Xie, Yun Lin, Jin Song Dong, Dan Hao, Lu Zhang
GUI test migration aims to produce test cases with events and assertions to test specific functionalities of a target app. Existing migration approaches typically focus on the widget-mapping paradigm that maps widgets from source apps to target apps. However, since different apps may implement the same functionality in different ways, direct mapping may result in incomplete or buggy test cases, thus significantly impacting the effectiveness of testing the target functionality and the practical applicability of these approaches. In this paper, we propose a new migration paradigm (i.e., the abstraction-concretization paradigm) that first abstracts the test logic for the target functionality and then utilizes this logic to generate the concrete GUI test case. Furthermore, we introduce MACdroid, the first approach that migrates GUI test cases based on this paradigm. Specifically, we propose an abstraction technique that utilizes source test cases from source apps targeting the same functionality to extract a general test logic for that functionality. Then, we propose a concretization technique that utilizes the general test logic to guide an LLM in generating the corresponding GUI test case (including events and assertions) for the target app. We evaluate MACdroid on two widely used datasets (including 31 apps, 34 functionalities, and 123 test cases). On the FrUITeR dataset, the test cases generated by MACdroid successfully test 64% of the target functionalities, improving over the baselines by 191%. On the Lin dataset, MACdroid successfully tests 75% of the target functionalities, outperforming the baselines by 42%. These results underscore the effectiveness of MACdroid in GUI test migration.
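The abstraction-concretization idea can be sketched roughly as follows; the `llm` call, the prompt phrasing, and the shape of the inputs are hypothetical placeholders rather than MACdroid's implementation.

```python
def abstract_test_logic(source_tests, llm):
    """Abstraction: distill source test cases that target the same functionality
    into a general, app-independent test logic (ordered steps plus checks)."""
    joined = "\n\n".join(source_tests)
    return llm("Summarize these GUI test cases into app-independent steps and "
               "assertions for the shared functionality:\n" + joined)

def concretize(test_logic, target_app_gui, llm):
    """Concretization: guide the LLM to map the general logic onto the target
    app's actual widgets, producing executable events and assertions."""
    return llm(f"Target app GUI hierarchy:\n{target_app_gui}\n"
               f"General test logic:\n{test_logic}\n"
               "Generate a concrete GUI test case (events and assertions) for this app.")

# Usage: test_case = concretize(abstract_test_logic(source_tests, llm), gui_dump, llm)
```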
{"title":"LLM-based Abstraction and Concretization for GUI Test Migration","authors":"Yakun Zhang, Chen Liu, Xiaofei Xie, Yun Lin, Jin Song Dong, Dan Hao, Lu Zhang","doi":"arxiv-2409.05028","DOIUrl":"https://doi.org/arxiv-2409.05028","url":null,"abstract":"GUI test migration aims to produce test cases with events and assertions to\u0000test specific functionalities of a target app. Existing migration approaches\u0000typically focus on the widget-mapping paradigm that maps widgets from source\u0000apps to target apps. However, since different apps may implement the same\u0000functionality in different ways, direct mapping may result in incomplete or\u0000buggy test cases, thus significantly impacting the effectiveness of testing\u0000target functionality and the practical applicability. In this paper, we propose a new migration paradigm (i.e.,\u0000abstraction-concretization paradigm) that first abstracts the test logic for\u0000the target functionality and then utilizes this logic to generate the concrete\u0000GUI test case. Furthermore, we introduce MACdroid, the first approach that\u0000migrates GUI test cases based on this paradigm. Specifically, we propose an\u0000abstraction technique that utilizes source test cases from source apps\u0000targeting the same functionality to extract a general test logic for that\u0000functionality. Then, we propose a concretization technique that utilizes the\u0000general test logic to guide an LLM in generating the corresponding GUI test\u0000case (including events and assertions) for the target app. We evaluate MACdroid\u0000on two widely-used datasets (including 31 apps, 34 functionalities, and 123\u0000test cases). On the FrUITeR dataset, the test cases generated by MACdroid\u0000successfully test 64% of the target functionalities, improving the baselines by\u0000191%. On the Lin dataset, MACdroid successfully tests 75% of the target\u0000functionalities, outperforming the baselines by 42%. These results underscore\u0000the effectiveness of MACdroid in GUI test migration.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jadson Santos, Daniel Alencar da Costa, Shane McIntosh, Uirá Kulesza
Continuous Integration (CI) encompasses a set of widely adopted practices that enhance software development. However, there are indications that developers may not adequately monitor CI practices. Hence, this paper explores developers' perceptions regarding the monitoring of CI practices. To achieve this, we first perform a Document Analysis to assess developers' expressed need for practice monitoring in pull request comments generated during the development process. After that, we conduct a survey among developers from 121 open-source projects to understand their perception of the significance of monitoring seven CI practices in their projects. Finally, we triangulate the emergent themes from our survey by performing a second Document Analysis to understand the extent of monitoring features supported by existing CI services. Our key findings indicate that: 1) the most frequently mentioned CI practice during the development process is "Test Coverage" (> 80%), while "Build Health" and "Time to Fix a Broken Build" present notable opportunities for monitoring CI practices; 2) developers do not adequately monitor all CI practices and express interest in monitoring additional practices; and 3) the most popular CI services currently offer limited native support for monitoring CI practices, requiring the use of third-party tools. Our results lead us to conclude that monitoring CI practices is often overlooked by both CI services and developers. Using third-party tools in conjunction with CI services is challenging: these tools monitor some redundant practices and still fall short of fully supporting the monitoring of CI practices. Therefore, CI services should implement support for monitoring CI practices, which would facilitate and encourage developers to monitor them.
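As a small illustration of what monitoring two of the practices named above could look like, the snippet below computes "Build Health" and "Time to Fix a Broken Build" from a chronological list of build records. The record format and metric definitions are simplified assumptions, not those of the study or of any particular CI service.

```python
from datetime import datetime

def build_health(builds):
    """Fraction of builds that passed."""
    return sum(b["passed"] for b in builds) / len(builds)

def time_to_fix_broken_builds(builds):
    """Durations between each first failing build and the next passing one."""
    fixes, broken_since = [], None
    for b in builds:  # builds assumed sorted by time
        if not b["passed"] and broken_since is None:
            broken_since = b["time"]
        elif b["passed"] and broken_since is not None:
            fixes.append(b["time"] - broken_since)
            broken_since = None
    return fixes

builds = [
    {"time": datetime(2024, 9, 8, 9, 0), "passed": True},
    {"time": datetime(2024, 9, 8, 11, 0), "passed": False},
    {"time": datetime(2024, 9, 8, 15, 30), "passed": True},
]
print(round(build_health(builds), 2))          # 0.67
print(time_to_fix_broken_builds(builds)[0])    # 4:30:00
```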
{"title":"On the Need to Monitor Continuous Integration Practices -- An Empirical Study","authors":"Jadson Santos, Daniel Alencar da Costa, Shane McIntosh, Uirá Kulesza","doi":"arxiv-2409.05101","DOIUrl":"https://doi.org/arxiv-2409.05101","url":null,"abstract":"Continuous Integration (CI) encompasses a set of widely adopted practices\u0000that enhance software development. However, there are indications that\u0000developers may not adequately monitor CI practices. Hence, this paper explores\u0000developers' perceptions regarding the monitoring CI practices. To achieve this,\u0000we first perform a Document Analysis to assess developers' expressed need for\u0000practice monitoring in pull requests comments generated by developers during\u0000the development process. After that, we conduct a survey among developers from\u0000121 open-source projects to understand perception of the significance of\u0000monitoring seven CI practices in their projects. Finally, we triangulate the\u0000emergent themes from our survey by performing a second Document Analysis to\u0000understand the extent of monitoring features supported by existing CI services.\u0000Our key findings indicate that: 1) the most frequently mentioned CI practice\u0000during the development process is ``Test Coverage'' (> 80%), while ``Build\u0000Health'' and ``Time to Fix a Broken Build'' present notable opportunities for\u0000monitoring CI practices; 2) developers do not adequately monitor all CI\u0000practices and express interest in monitoring additional practices; and 3) the\u0000most popular CI services currently offer limited native support for monitoring\u0000CI practices, requiring the use of third-party tools. Our results lead us to\u0000conclude that monitoring CI practices is often overlooked by both CI services\u0000and developers. Using third-party tools in conjunction with CI services is\u0000challenging, they monitor some redundant practices and still falls short of\u0000fully supporting CI practices monitoring. Therefore, CI services should\u0000implement CI practices monitoring, which would facilitate and encourage\u0000developers to monitor them.","PeriodicalId":501278,"journal":{"name":"arXiv - CS - Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142222683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}