Nowadays, companies are highly exposed to cyber security threats. In many industrial domains, protective measures are being deployed and actively supported by standards. However, the global process remains largely dependent on document-driven approaches or partial modelling, which impacts both the efficiency and effectiveness of the cybersecurity process from the risk analysis step onwards. In this paper, we report on our experience in applying a model-driven approach to the initial risk analysis step in connection with later security testing. Our work relies on a common metamodel, which is used to map, synchronise, and ensure information traceability across different tools. We validate our approach using different scenarios relying on domain modelling, system modelling, risk assessment, and security testing tools.
"Building a Cybersecurity Risk Metamodel for Improved Method and Tool Integration" — Christophe Ponsard. arXiv:2409.07906, arXiv - CS - Software Engineering, 2024-09-12.
UI automation tests play a crucial role in ensuring the quality of mobile applications. Despite the growing popularity of machine learning techniques to generate these tests, they still face several challenges, such as the mismatch of UI elements. The recent advances in Large Language Models (LLMs) have addressed these issues by leveraging their semantic understanding capabilities. However, a significant gap remains in applying these models to industrial-level app testing, particularly in terms of cost optimization and knowledge limitation. To address this, we introduce CAT to create cost-effective UI automation tests for industry apps by combining machine learning and LLMs with best practices. Given the task description, CAT employs Retrieval Augmented Generation (RAG) to source examples of industrial app usage as the few-shot learning context, assisting LLMs in generating the specific sequence of actions. CAT then employs machine learning techniques, with LLMs serving as a complementary optimizer, to map the target element on the UI screen. Our evaluations on the WeChat testing dataset demonstrate CAT's performance and cost-effectiveness, achieving 90% UI automation at a cost of $0.34, outperforming the state-of-the-art. We have also integrated our approach into the real-world WeChat testing platform, demonstrating its usefulness in detecting 141 bugs and enhancing the developers' testing process.
"Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat" — Sidong Feng, Haochuan Lu, Jianqin Jiang, Ting Xiong, Likun Huang, Yinglin Liang, Xiaoqin Li, Yuetang Deng, Aldeida Aleti. arXiv:2409.07829, arXiv - CS - Software Engineering, 2024-09-12.
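The retrieval step described above can be sketched in a few lines: given a task description, select the most similar past app-usage example to serve as few-shot context in an LLM prompt. The corpus, the bag-of-words similarity, and the prompt wording below are assumptions for illustration, not CAT's actual implementation.

```python
# Minimal RAG-style retrieval sketch (illustrative; not CAT's real pipeline).
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_example(task: str, corpus: list[tuple[str, str]]) -> str:
    """Return the action sequence of the corpus task most similar to `task`."""
    query = Counter(task.lower().split())
    best = max(corpus, key=lambda ex: cosine(query, Counter(ex[0].lower().split())))
    return best[1]

# Hypothetical corpus of (task description, recorded action sequence) pairs.
corpus = [
    ("send a text message to a contact", "open_chat -> type_text -> tap_send"),
    ("post a photo to moments", "open_moments -> tap_camera -> select_photo -> tap_post"),
]
actions = retrieve_example("send message to my friend", corpus)
prompt = f"Example actions: {actions}\nTask: send message to my friend\nActions:"
```

In a production setting the bag-of-words scorer would typically be replaced by learned embeddings, but the retrieve-then-prompt shape stays the same.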
Machine/deep learning models have been widely adopted for predicting the configuration performance of software systems. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherent in the configuration landscape: the influence of configuration options (features) and the distribution of data samples are highly sparse. In this paper, we propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL, based on the new paradigm of dividable learning that builds a model via "divide-and-learn". To handle sample sparsity, the samples from the configuration landscape are divided into distant divisions, for each of which we build a sparse local model, e.g., a regularized Hierarchical Interaction Neural Network, to deal with the feature sparsity. A newly given configuration would then be assigned to the right division's model for the final prediction. Further, DaL adaptively determines the optimal number of divisions required for a system and sample size without any extra training or profiling. Experiment results from 12 real-world systems and five sets of training data reveal that, compared with the state-of-the-art approaches, DaL performs no worse than the best counterpart on 44 out of 60 cases with up to 1.61x improvement on accuracy; requires fewer samples to reach the same or better accuracy; and incurs acceptable training overhead. In particular, the mechanism that adapts the parameter d reaches the optimal value for 76.43% of the individual runs. The result also confirms that the paradigm of dividable learning is more suitable than other similar paradigms such as ensemble learning for predicting configuration performance. Practically, DaL considerably improves different global models when using them as the underlying local models, which further strengthens its flexibility.
"Dividable Configuration Performance Learning" — Jingzhi Gong, Tao Chen, Rami Bahsoon. arXiv:2409.07629, arXiv - CS - Software Engineering, 2024-09-11.
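The "divide-and-learn" idea above can be sketched concretely: split the training samples into divisions, fit one simple local model per division, and route a new configuration to its nearest division for prediction. The one-dimensional split and the per-division mean predictor below are deliberately trivial stand-ins for DaL's actual clustering and neural local models.

```python
# Toy divide-and-learn sketch (illustrative stand-in for DaL's components).

def divide(samples, k=2):
    """Split (config_value, perf) samples into k contiguous divisions."""
    ordered = sorted(samples)
    size = len(ordered) // k
    return [ordered[i * size:(i + 1) * size] for i in range(k - 1)] + \
           [ordered[(k - 1) * size:]]

def fit_local(division):
    """Local model: mean performance of the division (stand-in for a NN)."""
    return sum(perf for _, perf in division) / len(division)

def predict(divisions, models, x):
    """Assign x to the division with the closest member, then predict."""
    dists = [min(abs(x - v) for v, _ in d) for d in divisions]
    return models[dists.index(min(dists))]

# Hypothetical samples: small config values are fast, large ones are slow.
samples = [(1, 10.0), (2, 11.0), (8, 50.0), (9, 52.0)]
divisions = divide(samples, k=2)
models = [fit_local(d) for d in divisions]
```

A query near the slow region (e.g. `x = 8.5`) is routed to the second division and predicted from its local model, rather than being averaged with the unrelated fast region — which is the point of dividing before learning.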
Laura Pomponio, Maximiliano Cristiá, Estanislao Ruiz Sorazábal, Maximiliano García
We show the design of the software of the microcontroller unit of a weeding robot based on the Process Control architectural style and design patterns. The design consists of 133 modules resulting from using 8 design patterns for a total of 30 problems. As a result, the design yields more reusable components and an easily modifiable and extensible program. Design documentation is also presented. Finally, the implementation (12 KLOC of C++ code) is empirically evaluated to show that the design does not produce an inefficient implementation.
"Reusability and Modifiability in Robotics Software (Extended Version)" — Laura Pomponio, Maximiliano Cristiá, Estanislao Ruiz Sorazábal, Maximiliano García. arXiv:2409.07228, arXiv - CS - Software Engineering, 2024-09-11.
Ben Bogin, Kejuan Yang, Shashank Gupta, Kyle Richardson, Erin Bransom, Peter Clark, Ashish Sabharwal, Tushar Khot
Given that Large Language Models (LLMs) have made significant progress in writing code, can they now be used to autonomously reproduce results from research repositories? Such a capability would be a boon to the research community, helping researchers validate, understand, and extend prior work. To advance towards this goal, we introduce SUPER, the first benchmark designed to evaluate the capability of LLMs in setting up and executing tasks from research repositories. SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories. Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 subproblems derived from the expert set that focus on specific challenges (e.g., configuring a trainer), and 602 automatically generated problems for larger-scale development. We introduce various evaluation measures to assess both task success and progress, utilizing gold solutions when available or approximations otherwise. We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios. This illustrates the challenge of this task, and suggests that SUPER can serve as a valuable resource for the community to make and measure progress.
"SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories" — Ben Bogin, Kejuan Yang, Shashank Gupta, Kyle Richardson, Erin Bransom, Peter Clark, Ashish Sabharwal, Tushar Khot. arXiv:2409.07440, arXiv - CS - Software Engineering, 2024-09-11.
Analyzing user reviews for sentiment towards app features can provide valuable insights into users' perceptions of app functionality and their evolving needs. Given the volume of user reviews received daily, an automated mechanism to generate feature-level sentiment summaries of user reviews is needed. Recent advances in Large Language Models (LLMs) such as ChatGPT have shown impressive performance on several new tasks without updating the model's parameters, i.e., using zero or only a few labeled examples. Despite these advancements, LLMs' capabilities to perform feature-specific sentiment analysis of user reviews remain unexplored. This study compares the performance of state-of-the-art LLMs, including GPT-4, ChatGPT, and LLama-2-chat variants, for extracting app features and associated sentiments under 0-shot, 1-shot, and 5-shot scenarios. Results indicate the best-performing GPT-4 model outperforms rule-based approaches by 23.6% in f1-score with zero-shot feature extraction, and 5-shot prompting improves this by a further 6%. GPT-4 achieves a 74% f1-score for predicting positive sentiment towards correctly predicted app features, with 5-shot prompting enhancing it by 7%. Our study suggests that LLM models are promising for generating feature-specific sentiment summaries of user reviews.
"A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study" — Faiz Ali Shah, Ahmed Sabir, Rajesh Sharma. arXiv:2409.07162, arXiv - CS - Software Engineering, 2024-09-11.
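A k-shot prompt for feature-level sentiment analysis, as in the 0/1/5-shot scenarios compared above, can be assembled mechanically from labelled examples. The example review, labels, and prompt wording below are invented for illustration; the study's actual prompts are not reproduced here.

```python
# Sketch of k-shot prompt construction for feature-level sentiment analysis
# (illustrative wording and examples, not the study's actual prompts).

def build_prompt(review: str, shots: list[tuple[str, str]]) -> str:
    """Compose a k-shot prompt: k labelled examples, then the target review."""
    parts = ["Extract app features and their sentiment from the review."]
    for text, label in shots:
        parts.append(f"Review: {text}\nFeatures: {label}")
    parts.append(f"Review: {review}\nFeatures:")
    return "\n\n".join(parts)

# One hypothetical labelled example makes this a 1-shot prompt;
# passing an empty list yields the zero-shot variant.
shots = [
    ("Love the dark mode, but sync keeps failing.",
     "dark mode: positive; sync: negative"),
]
prompt = build_prompt("The new search is fast!", shots)
```

The same function covers all three scenarios from the study simply by varying the number of labelled examples passed in.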
Due to the substantial number of enrollments in programming courses, a key challenge is delivering personalized feedback to students. The nature of this feedback varies significantly, contingent on the subject and the chosen evaluation method. However, tailoring current Automated Assessment Tools (AATs) to integrate other program analysis tools is not straightforward. Moreover, AATs usually support only specific programming languages, providing feedback exclusively through dedicated websites based on test suites. This paper introduces GitSEED, a language-agnostic automated assessment tool designed for Programming Education and Software Engineering (SE) and backed by GitLab. The students interact with GitSEED through GitLab. Using GitSEED, students in Computer Science (CS) and SE can master the fundamentals of git while receiving personalized feedback on their programming assignments and projects. Furthermore, faculty members can easily tailor GitSEED's pipeline by integrating various code evaluation tools (e.g., memory leak detection, fault localization, program repair, etc.) to offer personalized feedback that aligns with the needs of each CS/SE course. Our experiments assess GitSEED's efficacy via comprehensive user evaluation, examining the impact of feedback mechanisms and features on student learning outcomes. Findings reveal positive correlations between GitSEED usage and student engagement.
"GitSEED: A Git-backed Automated Assessment Tool for Software Engineering and Programming Education" — Pedro Orvalho, Mikoláš Janota, Vasco Manquinho. arXiv:2409.07362, arXiv - CS - Software Engineering, 2024-09-11.
Umm-e- Habiba, Markus Haug, Justus Bogner, Stefan Wagner
Artificial intelligence (AI) permeates all fields of life, which has resulted in new challenges in requirements engineering for artificial intelligence (RE4AI), e.g., the difficulty in specifying and validating requirements for AI, or considering new quality requirements due to emerging ethical implications. It is currently unclear if existing RE methods are sufficient or if new ones are needed to address these challenges. Therefore, our goal is to provide a comprehensive overview of RE4AI to researchers and practitioners. What has been achieved so far, i.e., what practices are available, and what research gaps and challenges still need to be addressed? To achieve this, we conducted a systematic mapping study combining query string search and extensive snowballing. The extracted data was aggregated, and results were synthesized using thematic analysis. Our selection process led to the inclusion of 126 primary studies. Existing RE4AI research focuses mainly on requirements analysis and elicitation, with most practices applied in these areas. Furthermore, we identified requirements specification, explainability, and the gap between machine learning engineers and end-users as the most prevalent challenges, along with a few others. Additionally, we proposed seven potential research directions to address these challenges. Practitioners can use our results to identify and select suitable RE methods for working on their AI-based systems, while researchers can build on the identified gaps and research directions to push the field forward.
"How Mature is Requirements Engineering for AI-based Systems? A Systematic Mapping Study on Practices, Challenges, and Future Research Directions" — Umm-e- Habiba, Markus Haug, Justus Bogner, Stefan Wagner. arXiv:2409.07192, arXiv - CS - Software Engineering, 2024-09-11.
Oleksandr Kosenkov, Michael Unterkalmsteiner, Daniel Mendez, Jannik Fischbach
Context: Regulations, such as the European Accessibility Act (EAA), impact the engineering of software products and services. Managing that impact while providing meaningful inputs to development teams is one of the emerging requirements engineering (RE) challenges.
Problem: Enterprises conduct Regulatory Impact Analysis (RIA) to consider the effects of regulations on software products offered and formulate requirements at an enterprise level. Despite its practical relevance, we are unaware of any studies on this large-scale regulatory RE process.
Methodology: We conducted an exploratory interview study of RIA in three large enterprises. We focused on how they conduct RIA, emphasizing cross-functional interactions, and using the EAA as an example.
Results: RIA, as a regulatory RE process, is conducted to address the needs of executive management and central functions. It involves coordination between different functions and levels of enterprise hierarchy. Enterprises use artifacts to support interpretation and communication of the results of RIA. Challenges to RIA are mainly related to the execution of such coordination and managing the knowledge involved.
Conclusion: RIA in large enterprises demands close coordination of multiple stakeholders and roles. Applying interpretation and compliance artifacts is one approach to support such coordination. However, there are no established practices for creating and managing such artifacts.
"Regulatory Requirements Engineering in Large Enterprises: An Interview Study on the European Accessibility Act" — Oleksandr Kosenkov, Michael Unterkalmsteiner, Daniel Mendez, Jannik Fischbach. arXiv:2409.07313, arXiv - CS - Software Engineering, 2024-09-11.
Selecting the appropriate communication protocol is crucial for optimizing the performance, scalability, and user experience of web applications. In the diverse ecosystem of web technologies, various protocols like RESTful APIs, gRPC, WebSockets, and others serve distinct purposes. RESTful APIs are widely favored for their simplicity and stateless nature, making them ideal for standard CRUD operations. They offer a straightforward approach to interacting with resources over HTTP/1.1, providing broad compatibility and ease of integration across different platforms. However, in scenarios where applications require high efficiency and real-time communication, gRPC and WebSockets emerge as powerful alternatives. Each protocol comes with its strengths and limitations, influencing factors such as ease of implementation, performance under load, and support for complex data structures. RESTful APIs, while easy to use and widely supported, may introduce overhead due to their stateless nature and reliance on multiple HTTP/1.1 requests. In contrast, gRPC's advanced features, while powerful, require a steeper learning curve and more sophisticated infrastructure. Similarly, WebSockets, while excellent for real-time applications, require careful management of persistent connections and attention to security. This paper explores the key considerations in choosing the right communication protocol, emphasizing the need to align technical choices with application requirements and user expectations. By understanding the unique attributes of each protocol, developers can make informed decisions that enhance the responsiveness and reliability of their web applications. The choice of protocol can significantly impact the user experience, scalability, and maintainability of the application, making it a critical decision in the web development process.
{"title":"Choosing the Right Communication Protocol for your Web Application","authors":"Mohamed Hassan","doi":"arxiv-2409.07360","url":"https://doi.org/arxiv-2409.07360","journal":"arXiv - CS - Software Engineering","publicationDate":"2024-09-11"}
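The trade-offs summarized in this abstract (REST for standard CRUD, gRPC for efficient service-to-service calls, WebSockets for real-time bidirectional traffic) can be sketched as a small selection heuristic. The requirement flags and decision order below are illustrative assumptions for demonstration only; the paper itself does not prescribe a specific algorithm.

```python
# Illustrative sketch of the protocol-selection trade-offs discussed above.
# The requirement flags and the decision order are assumptions made for
# demonstration, not an algorithm taken from the paper.
from dataclasses import dataclass

@dataclass
class AppRequirements:
    realtime_bidirectional: bool = False   # e.g. chat, live dashboards
    high_throughput_rpc: bool = False      # e.g. internal microservice calls
    broad_client_compat: bool = True       # browsers, third-party integrators

def choose_protocol(req: AppRequirements) -> str:
    """Suggest a protocol following the trade-offs in the text:
    WebSockets for real-time bidirectional traffic, gRPC for efficient
    service-to-service RPC, RESTful HTTP for standard CRUD operations."""
    if req.realtime_bidirectional:
        return "WebSockets"   # persistent connection, low latency
    if req.high_throughput_rpc and not req.broad_client_compat:
        return "gRPC"         # HTTP/2 + protobuf, steeper learning curve
    return "REST"             # simple, stateless, widely supported

print(choose_protocol(AppRequirements()))                             # REST
print(choose_protocol(AppRequirements(realtime_bidirectional=True)))  # WebSockets
```

In practice such a heuristic is only a starting point: hybrid designs are common, e.g. REST for CRUD endpoints alongside a WebSocket channel for live updates within the same application.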