Bots are software systems designed to support users by automating a specific process, task, or activity. When such systems implement a conversational component to interact with users, they are also known as conversational agents. Bots, particularly in their conversation-oriented and AI-powered versions, have seen increasing adoption for software development and engineering purposes. Despite their exciting potential, further enhanced by the advent of Generative AI and Large Language Models, bots remain difficult to develop and integrate into the development cycle: practitioners report that bots introduce additional challenges that may worsen, rather than improve, the development experience. In this work, we aim to provide a taxonomy for characterizing bots, as well as a series of challenges to their adoption in Software Engineering, together with potential mitigation strategies. To reach our objectives, we conducted a multivocal literature review, covering both research and practitioners' literature. Through such an approach, we hope to contribute to both researchers and practitioners by providing, first, a series of future research directions to follow; second, a list of strategies for improving the use of bots for software engineering purposes; and, third, by fostering technology and knowledge transfer from research to practice, which is one of the primary goals of multivocal literature reviews.
"Motivations, Challenges, Best Practices, and Benefits for Bots and Conversational Agents in Software Engineering: A Multivocal Literature Review". Stefano Lambiase, Gemma Catolino, Fabio Palomba, Filomena Ferrucci. arXiv - CS - Software Engineering, 2024-09-18. https://doi.org/arxiv-2409.11864
In the rapidly evolving field of machine learning, training models with datasets from various locations and organizations presents significant challenges due to privacy and legal concerns. The exploration of effective collaborative training settings capable of leveraging valuable knowledge from distributed and isolated datasets is increasingly crucial. This study investigates key factors that impact the effectiveness of collaborative training methods in code next-token prediction, as well as the correctness and utility of the generated code, demonstrating the promise of such methods. Additionally, we evaluate the memorization of different participant training data across various collaborative training settings, including centralized, federated, and incremental training, highlighting their potential risks in leaking data. Our findings indicate that the size and diversity of code datasets are pivotal factors influencing the success of collaboratively trained code models. We show that federated learning achieves competitive performance compared to centralized training while offering better data protection, as evidenced by lower memorization ratios in the generated code. However, federated learning can still produce verbatim code snippets from hidden training data, potentially violating privacy or copyright. Our study further explores effectiveness and memorization patterns in incremental learning, emphasizing the sequence in which individual participant datasets are introduced. We also identify cross-organizational clones as a prevalent challenge in both centralized and federated learning scenarios. Our findings highlight the persistent risk of data leakage during inference, even when training data remains unseen. We conclude with recommendations for practitioners and researchers to optimize multisource datasets, propelling cross-organizational collaboration forward.
"Promise and Peril of Collaborative Code Generation Models: Balancing Effectiveness and Memorization". Zhi Chen, Lingxiao Jiang. arXiv - CS - Software Engineering, 2024-09-18. https://doi.org/arxiv-2409.12020
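The memorization analysis described above hinges on detecting verbatim overlap between generated code and (possibly hidden) training data. As a rough illustration of the idea only (not the paper's actual metric; the function name and the 6-gram window are assumptions), one could measure the fraction of generated token n-grams that occur verbatim in a training corpus:

```python
def memorization_ratio(generated_snippets, training_corpus, n=6):
    """Fraction of token n-grams in generated code that appear verbatim
    in the training corpus (higher = more memorization)."""
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    # Index all n-grams from every training document.
    corpus_ngrams = set()
    for doc in training_corpus:
        corpus_ngrams |= ngrams(doc.split())

    total = matched = 0
    for snippet in generated_snippets:
        for gram in ngrams(snippet.split()):
            total += 1
            matched += gram in corpus_ngrams  # bool counts as 0/1
    return matched / total if total else 0.0
```

A generated snippet whose n-grams all appear in the corpus scores 1.0; fully novel output scores 0.0.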
Leadership in agile teams is a collective responsibility where team members share leadership work based on expertise and skills. However, the understanding of leadership in this context is limited. This study explores the under-researched area of prototypical leadership, aiming to understand if and how leaders who are perceived as more representative of the team are more effective leaders. Qualitative interviews were conducted with eleven members of six agile software teams in five Swedish companies from various industries and sizes. In this study, the effectiveness of leadership was perceived as higher when it emerged from within the team or when leaders aligned with the group. In addition, leaders in managerial roles that align with the team's shared values and traits were perceived as more effective, contributing to overall team success.
"Prototypical Leadership in Agile Software Development". Jina Dawood, Lucas Gren. arXiv - CS - Software Engineering, 2024-09-18. https://doi.org/arxiv-2409.11685
App reviews in mobile app stores contain useful information that is used to improve applications and promote software evolution. This information is processed by automatic tools that prioritize reviews. To carry out this prioritization, reviews are decomposed into features such as category and sentiment; a weighted function then assigns a weight to each feature, and a review ranking is calculated. Unfortunately, extracting category and sentiment from reviews requires at least a classifier trained on an annotated corpus, making the task computationally demanding. In this work, we therefore propose Shannon entropy as a simple feature that can replace the standard features. Our results show that a Shannon-entropy-based ranking outperforms a standard ranking according to the NDCG metric. This result remains promising even when fairness with respect to algorithmic bias is required. Finally, we highlight a computational limit that appears in the search for the best ranking.
"Shannon Entropy is better Feature than Category and Sentiment in User Feedback Processing". Andres Rojas Paredes, Brenda Mareco. arXiv - CS - Software Engineering, 2024-09-18. https://doi.org/arxiv-2409.12012
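The appeal of Shannon entropy as a feature is that it needs no trained classifier: it is computed directly from the text. A minimal sketch of entropy-based ranking (the character-level granularity and the function names are illustrative assumptions, not the authors' implementation):

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Character-level Shannon entropy of a string, in bits."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def rank_by_entropy(reviews):
    """Rank reviews by entropy, highest (most information-dense) first."""
    return sorted(reviews, key=shannon_entropy, reverse=True)
```

A repetitive review like "aaaa" has entropy 0 and sinks to the bottom, while varied text rises; no annotated corpus or training step is involved.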
From 2019 to 2022, Volvo Cars successfully translated our research discoveries regarding group dynamics within agile teams into widespread industrial practice. We wish to illuminate the insights gained through the process of garnering support, providing training, executing implementation, and sustaining a tool embraced by approximately 700 teams and 9,000 employees. This tool was designed to empower agile teams and propel their internal development. Our experiences underscore the necessity of comprehensive team training, the cultivation of a cadre of trainers across the organization, and the creation of a novel software solution. In essence, we deduce that an automated concise survey tool, coupled with a repository of actionable strategies, holds remarkable potential in fostering the maturation of agile teams, but we also share many of the challenges we encountered during the implementation.
"From Group Psychology to Software Engineering Research to Automotive R&D: Measuring Team Development at Volvo Cars". Lucas Gren, Christian Jacobsson. arXiv - CS - Software Engineering, 2024-09-18. https://doi.org/arxiv-2409.11778
Privacy policies define the terms under which personal data may be collected and processed by data controllers. The General Data Protection Regulation (GDPR) imposes requirements on these policies that are often difficult to implement, in particular due to the heterogeneity of existing systems (e.g., the Internet of Things (IoT), web technology, etc.). In this paper, we propose a method to refine high-level GDPR privacy requirements for informed consent into low-level computational models. The method is aimed at software developers implementing systems that require consent management. We mechanize our models in TLA+, which has been used by software engineers at companies such as Microsoft and Amazon, and use model checking to prove that the low-level computational models implement the high-level privacy requirements. We demonstrate our method in two real-world scenarios: an implementation of cookie banners and an IoT system communicating via Bluetooth Low Energy.
"Model-Checking the Implementation of Consent". Raúl Pardo, Daniel Le Métayer. arXiv - CS - Software Engineering, 2024-09-18. https://doi.org/arxiv-2409.11803
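The paper mechanizes its models in TLA+; purely as a language-agnostic illustration, the kind of consent invariant one would model-check ("personal data may be processed only while consent is granted and not revoked") can be sketched as a small state machine with an exhaustive check over bounded action sequences. All names here are hypothetical, not from the authors' specification:

```python
from enum import Enum, auto
from itertools import product

class Consent(Enum):
    UNSET = auto()
    GRANTED = auto()
    REVOKED = auto()

class ConsentManager:
    """Toy model: personal data may be processed only while consent is GRANTED."""
    def __init__(self):
        self.state = Consent.UNSET

    def grant(self):
        self.state = Consent.GRANTED

    def revoke(self):
        self.state = Consent.REVOKED

    def may_process(self):
        return self.state is Consent.GRANTED

def check_invariant(depth=4):
    """Exhaustively explore all bounded action sequences (a toy stand-in for
    model checking): processing is allowed iff the latest action was 'grant'."""
    for seq in product(["grant", "revoke"], repeat=depth):
        m = ConsentManager()
        for action in seq:
            getattr(m, action)()
            assert m.may_process() == (action == "grant")
    return True
```

A real TLA+ specification would state this as a temporal invariant and let the TLC model checker explore the state space; the bounded enumeration above only conveys the flavor of that verification.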
Federica Pepe, Fiorella Zampetti, Antonio Mastropaolo, Gabriele Bavota, Massimiliano Di Penta
The development of Machine Learning (ML)- and, more recently, Deep Learning (DL)-intensive systems requires suitable choices, e.g., in terms of technology, algorithms, and hyper-parameters. Such choices depend on developers' experience, as well as on proper experimentation. Due to limited time availability, developers may adopt suboptimal, sometimes temporary choices, leading to technical debt (TD) specifically related to the ML code. This paper empirically analyzes the presence of Self-Admitted Technical Debt (SATD) in DL systems. After selecting 100 open-source Python projects using popular DL frameworks, we identified SATD from their source comments and created a stratified sample of 443 SATD comments to analyze manually. We derived a taxonomy of DL-specific SATD through open coding, featuring seven categories and 41 leaves. The identified SATD categories pertain to different aspects of DL models, some of which are technological (e.g., due to hardware or libraries) and some related to suboptimal choices in the DL process, model usage, or configuration. Our findings indicate that DL-specific SATD differs from DL bugs found in previous studies, as it typically pertains to suboptimal solutions rather than functional (e.g., blocking) problems. Last but not least, we found that state-of-the-art static analysis tools do not help developers avoid such problems; therefore, specific support is needed to cope with DL-specific SATD.
"A Taxonomy of Self-Admitted Technical Debt in Deep Learning Systems". Federica Pepe, Fiorella Zampetti, Antonio Mastropaolo, Gabriele Bavota, Massimiliano Di Penta. arXiv - CS - Software Engineering, 2024-09-18. https://doi.org/arxiv-2409.11826
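SATD is typically identified from source comments, as the study above does before its manual open coding. A simplistic keyword-based sketch of that identification step (the pattern list is an illustrative assumption; the paper's procedure and resulting taxonomy are far richer):

```python
import re

# Illustrative keyword list; the paper derives a richer taxonomy via open coding.
SATD_PATTERN = re.compile(
    r"#.*\b(TODO|FIXME|HACK|XXX|workaround|temporary)\b", re.IGNORECASE
)

def find_satd_comments(source):
    """Return (line_number, line) pairs whose comments hint at self-admitted debt."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if SATD_PATTERN.search(line)
    ]
```

Running such a scan over a project's `.py` files yields candidate SATD comments, which researchers then classify manually, e.g., into the hardware-, library-, or configuration-related categories the taxonomy describes.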