数据密集型系统中的数据访问性能反模式

IF 3.5 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Empirical Software Engineering Pub Date : 2024-08-29 DOI:10.1007/s10664-024-10535-8

Biruk Asmare Muse, Kawser Wazed Nafi, Foutse Khomh, Giuliano Antoniol

{"title":"数据密集型系统中的数据访问性能反模式","authors":"Biruk Asmare Muse, Kawser Wazed Nafi, Foutse Khomh, Giuliano Antoniol","doi":"10.1007/s10664-024-10535-8","DOIUrl":null,"url":null,"abstract":"Data-intensive systems handle variable, high-volume, and high-velocity data generated by human and digital devices. Like traditional software, data-intensive systems are prone to technical debts introduced to cope-up with the pressure of time and resource constraints on developers. Data-access is a critical component of data-intensive systems, as it determines their overall performance and functionality. While data access technical debts are getting attention from the research community, technical debts that affect performance are not well investigated. This study aims to identify, categorize, and validate data-access performance anti-patterns. We collected issues from NoSQL-based and polyglot persistence open-source data-intensive systems, implemented in Java programing language, and identified 14 new data access-performance anti-patterns categorized under seven high-level categories. We conducted a developer survey to evaluate the perceived relevance and criticality of the newly identified anti-patterns and found that Improper Handling of Node Failures, Using Synchronous Connection, and Inefficient Driver API performance anti-patterns are the most critical data-access performance anti-patterns. The study findings can help improve the quality of data-intensive software systems by raising awareness of practitioners about the impact of the data-access performance anti-patterns. At the same time, the findings will help quality assurance teams to prioritize the correction of performance anti-patterns based on their criticality.","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"19 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data-access performance anti-patterns in data-intensive systems\",\"authors\":\"Biruk Asmare Muse, Kawser Wazed Nafi, Foutse Khomh, Giuliano Antoniol\",\"doi\":\"10.1007/s10664-024-10535-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data-intensive systems handle variable, high-volume, and high-velocity data generated by human and digital devices. Like traditional software, data-intensive systems are prone to technical debts introduced to cope-up with the pressure of time and resource constraints on developers. Data-access is a critical component of data-intensive systems, as it determines their overall performance and functionality. While data access technical debts are getting attention from the research community, technical debts that affect performance are not well investigated. This study aims to identify, categorize, and validate data-access performance anti-patterns. We collected issues from NoSQL-based and polyglot persistence open-source data-intensive systems, implemented in Java programing language, and identified 14 new data access-performance anti-patterns categorized under seven high-level categories. We conducted a developer survey to evaluate the perceived relevance and criticality of the newly identified anti-patterns and found that Improper Handling of Node Failures, Using Synchronous Connection, and Inefficient Driver API performance anti-patterns are the most critical data-access performance anti-patterns. The study findings can help improve the quality of data-intensive software systems by raising awareness of practitioners about the impact of the data-access performance anti-patterns. At the same time, the findings will help quality assurance teams to prioritize the correction of performance anti-patterns based on their criticality.\",\"PeriodicalId\":11525,\"journal\":{\"name\":\"Empirical Software Engineering\",\"volume\":\"19 1\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2024-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Empirical Software Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10664-024-10535-8\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Empirical Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10664-024-10535-8","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

数据密集型系统处理人类和数字设备产生的可变、大量和高速数据。与传统软件一样，数据密集型系统也容易出现技术欠账，以应对开发人员在时间和资源方面的压力。数据访问是数据密集型系统的关键组成部分，因为它决定了系统的整体性能和功能。虽然数据访问技术债务受到研究界的关注，但影响性能的技术债务却没有得到很好的研究。本研究旨在识别、分类和验证数据访问性能反模式。我们从使用 Java 编程语言实现的基于 NoSQL 和多重持久性的开源数据密集型系统中收集问题，并确定了 14 种新的数据访问性能反模式，分为 7 个高级类别。我们对开发人员进行了调查，以评估新发现的反模式的相关性和关键性，并发现节点故障处理不当、使用同步连接和低效驱动程序 API 性能反模式是最关键的数据访问性能反模式。研究结果有助于提高从业人员对数据访问性能反模式影响的认识，从而提高数据密集型软件系统的质量。同时，研究结果还有助于质量保证团队根据性能反模式的关键程度，确定纠正性能反模式的优先次序。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Data-access performance anti-patterns in data-intensive systems

Data-intensive systems handle variable, high-volume, and high-velocity data generated by human and digital devices. Like traditional software, data-intensive systems are prone to technical debts introduced to cope-up with the pressure of time and resource constraints on developers. Data-access is a critical component of data-intensive systems, as it determines their overall performance and functionality. While data access technical debts are getting attention from the research community, technical debts that affect performance are not well investigated. This study aims to identify, categorize, and validate data-access performance anti-patterns. We collected issues from NoSQL-based and polyglot persistence open-source data-intensive systems, implemented in Java programing language, and identified 14 new data access-performance anti-patterns categorized under seven high-level categories. We conducted a developer survey to evaluate the perceived relevance and criticality of the newly identified anti-patterns and found that Improper Handling of Node Failures, Using Synchronous Connection, and Inefficient Driver API performance anti-patterns are the most critical data-access performance anti-patterns. The study findings can help improve the quality of data-intensive software systems by raising awareness of practitioners about the impact of the data-access performance anti-patterns. At the same time, the findings will help quality assurance teams to prioritize the correction of performance anti-patterns based on their criticality.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Empirical Software Engineering 工程技术-计算机：软件工程

CiteScore

8.50

自引率

12.20%

发文量

169

审稿时长

>12 weeks

期刊介绍： Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.