On the use of big data frameworks in big service management

IF 1.8 · Zone 4 (Computer Science) · JCR Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING · Journal of Software-Evolution and Process · Pub. date: 2023-12-01 · DOI: 10.1002/smr.2642

Fedia Ghedass, Faouzi Ben Charrada

Vol. 36, No. 7 · https://onlinelibrary.wiley.com/doi/10.1002/smr.2642

Citations: 0



Abstract

Over the last few years, big data has emerged as a paradigm for processing and analyzing large volumes of data. Coupled with other paradigms such as cloud computing, service computing, and the Internet of Things, big data processing takes advantage of the underlying cloud infrastructure, which can host and manage massive amounts of data, while service computing makes it possible to process and deliver various data sources as on-demand services. This synergy between multiple paradigms has led to the emergence of big services: a cross-domain, large-scale, big data-centric service model. Beyond the adaptation issues inherited from other service models (e.g., the need to react quickly to changes), the massiveness and heterogeneity of big services make it harder to manage such a large-scale service ecosystem when its execution deviates from the expected behavior. Big services are frequently subject to deviations at two levels: functional (e.g., service failure, QoS degradation, and IoT resource unavailability) and data (e.g., data source unavailability or access restrictions). Handling these execution problems is beyond the capacity of traditional web/cloud service management tools, and most existing big service approaches target specific management operations, such as selection and composition. To keep their cross-domain execution stable and of high quality, big services should be continuously monitored and managed in a scalable, autonomous way. To address the absence of self-management frameworks for large-scale services, the goal of this work is to design an autonomic management solution that takes full control of big services through an autonomous, distributed lifecycle process. We combine the autonomic computing and big data processing paradigms to give big services self-* and parallel processing capabilities.
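
The abstract sorts deviations into a functional level and a data level and relies on autonomic (self-*) management to handle them. As a minimal illustration, the sketch below assumes the classic MAPE-K control loop from autonomic computing (Monitor, Analyze, Plan, Execute over shared Knowledge), which the abstract does not name explicitly. The deviation kinds come from the abstract's own examples; the `Event` type, the policy names such as `substitute_service`, and all function names are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    FUNCTIONAL = "functional"
    DATA = "data"


# Deviation kinds taken from the abstract's examples, grouped by level.
DEVIATION_LEVELS = {
    "service_failure": Level.FUNCTIONAL,
    "qos_degradation": Level.FUNCTIONAL,
    "iot_resource_unavailable": Level.FUNCTIONAL,
    "data_source_unavailable": Level.DATA,
    "access_restricted": Level.DATA,
}

# Hypothetical repair policies; the abstract does not list the actual ones.
POLICIES = {
    "service_failure": "substitute_service",
    "qos_degradation": "rebind_to_better_qos",
    "iot_resource_unavailable": "switch_iot_resource",
    "data_source_unavailable": "switch_data_source",
    "access_restricted": "request_alternative_access",
}


@dataclass(frozen=True)
class Event:
    component: str
    kind: Optional[str]  # None means no deviation was observed


def monitor(events):
    """Monitor: keep only events that report a deviation."""
    return [e for e in events if e.kind is not None]


def analyze(deviations):
    """Analyze: label each deviation as functional or data level."""
    return [(e, DEVIATION_LEVELS[e.kind]) for e in deviations]


def plan(symptoms):
    """Plan: pick one (hypothetical) policy per deviating component."""
    return [(e.component, level, POLICIES[e.kind]) for e, level in symptoms]


def execute(actions, knowledge):
    """Execute: a real manager would enact each policy; here we only
    record it in the shared knowledge base."""
    for component, level, policy in actions:
        knowledge.setdefault(component, []).append((level.value, policy))
    return knowledge


def mape_k_iteration(events, knowledge):
    """One pass of the Monitor-Analyze-Plan-Execute loop over shared knowledge."""
    return execute(plan(analyze(monitor(events))), knowledge)


kb = mape_k_iteration(
    [
        Event("weather-api", "qos_degradation"),
        Event("sensor-feed", None),
        Event("census-db", "access_restricted"),
    ],
    {},
)
print(kb)
# {'weather-api': [('functional', 'rebind_to_better_qos')], 'census-db': [('data', 'request_alternative_access')]}
```

Each phase is a plain function over a list, so in a distributed setting each one could in principle run as a separate job over partitioned events; this single-process version only shows the data flow between the phases.
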
The proposed management framework builds on the well-known MapReduce programming model and Apache Spark, and manages big service-related data using knowledge graph technology. We also define a scalable embedding model that processes and learns latent big service knowledge in a distributed manner. Finally, a cooperative decision mechanism triggers non-conflicting management policies in response to the deviations captured in the running big service. The management tasks (monitoring, embedding, and decision) and the core modules (autonomic managers' controller, embedding module, and coordinator) are implemented on top of Apache Spark as MapReduce jobs, and the processed data are represented as resilient distributed datasets (RDDs). To exploit the information shared between the workers and the master node (coordinator), and to further resolve conflicts between management policies, the framework includes a lightweight communication mechanism that transfers useful knowledge between running map-reduce tasks and filters out inappropriate intermediate data (e.g., conflicting actions). The experimental results showed that, thanks to this shared knowledge, the quality of the embeddings improved and the autonomic managers achieved high performance in a parallel, cooperative setting.
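
The coordinator's filtering of conflicting actions can be illustrated with a single-process simulation of the map, shuffle, and reduce phases. This is a sketch under stated assumptions, not the authors' Spark code: it assumes each autonomic manager proposes (resource, action) pairs, that two proposals for the same resource conflict, and that a hypothetical `PRIORITY` table decides which proposal survives. The abstract does not specify how conflicts are detected or ranked, and all names below are illustrative.

```python
from collections import defaultdict
from collections.abc import Iterable
from functools import reduce

# Hypothetical policy ranking; the abstract does not say how the
# coordinator decides between conflicting actions.
PRIORITY = {"substitute_service": 3, "switch_data_source": 2, "rebind_to_better_qos": 1}


def map_phase(manager_id: str, proposals: Iterable[tuple[str, str]]):
    """Map: an autonomic manager emits (resource, (action, manager_id)) pairs."""
    for resource, action in proposals:
        yield resource, (action, manager_id)


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as MapReduce does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def resolve(a, b):
    """Reduce: keep one action per resource using a total order
    (priority, then action name, then manager id), so the result does not
    depend on the order in which values arrive."""
    return max(a, b, key=lambda v: (PRIORITY.get(v[0], 0), v[0], v[1]))


def coordinator(proposals_by_manager):
    """Return one non-conflicting action per resource plus the filtered-out proposals."""
    intermediate = (
        pair
        for manager_id, proposals in proposals_by_manager.items()
        for pair in map_phase(manager_id, proposals)
    )
    chosen, dropped = {}, []
    for resource, values in shuffle(intermediate).items():
        winner = reduce(resolve, values)
        chosen[resource] = winner[0]
        dropped.extend(
            (resource, action, manager)
            for action, manager in values
            if (action, manager) != winner
        )
    return chosen, dropped


proposals = {
    "am-1": [("weather-api", "rebind_to_better_qos")],
    "am-2": [("weather-api", "substitute_service"), ("census-db", "switch_data_source")],
}
chosen, dropped = coordinator(proposals)
print(chosen)   # {'weather-api': 'substitute_service', 'census-db': 'switch_data_source'}
print(dropped)  # [('weather-api', 'rebind_to_better_qos', 'am-1')]
```

On a Spark cluster, the same shape would roughly correspond to `RDD.flatMap` over the managers' proposals followed by `RDD.reduceByKey(resolve)`. Spark expects the function passed to `reduceByKey` to be associative and commutative, which is why `resolve` takes the maximum under a total order instead of keeping whichever value arrives first.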


Source journal: Journal of Software-Evolution and Process (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

Self-citation rate: 10.00%

Articles published: 109