Pub Date : 2025-11-06 DOI: 10.1109/tsc.2025.3629534
Sandeep Singh Sikarwar, Rakesh Kumar, Benay Kumar Ray
Title: Energy Efficient Resource Sharing in Trustworthy Federated Cloud Environment: A Bayesian Game and Double Auction Based Approach
Authors: Sandeep Singh Sikarwar, Rakesh Kumar, Benay Kumar Ray
Journal: IEEE Transactions on Services Computing, volume 30 1
Publication date: 2025-11-06; DOI: 10.1109/tsc.2025.3629534 (https://doi.org/10.1109/tsc.2025.3629534); type: Journal Article; impact factor: 8.1; category: Computer Science (region 2)
Large language model (LLM)-driven root cause analysis (RCA) in cloud-native systems has become a key topic in modern software operations and maintenance. However, existing LLM-based approaches face three key challenges: multi-modality input constraints, context window limitations, and dynamic dependency graphs. To address these issues, we propose TAMO, a tool-assisted LLM agent that uses multi-modality observation data for fine-grained RCA. TAMO comprises a multi-modality alignment tool, a root cause localization tool, and a fault type classification tool. Specifically, TAMO unifies multi-modal observation data into time-aligned representations for cross-modal feature consistency. Based on the unified representations, TAMO then invokes its specialized root cause localization and fault type classification tools to further identify the root cause and fault type underlying the system context. This approach overcomes the limitations of LLMs in processing real-time raw observational data and dynamic service dependencies, and guides the model through structured prompt design to generate repair strategies that align with the system context. Experiments on two benchmark datasets demonstrate that TAMO outperforms state-of-the-art (SOTA) approaches with comparable performance.
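The idea of unifying multi-modal observation data into time-aligned representations can be illustrated with a minimal sketch. This is not TAMO's alignment tool: the function name `align_to_windows`, the fixed-width windowing, and the per-window summary features (mean metric value, log count, maximum span duration) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def align_to_windows(metrics, log_events, trace_spans, window_s=60):
    """Bucket three modalities into shared fixed-width time windows.

    Hypothetical input shapes (assumptions, not from the paper):
      metrics:     list of (timestamp_s, value)
      log_events:  list of (timestamp_s, message)
      trace_spans: list of (timestamp_s, duration_ms)
    Returns {window_start: {"metric_mean", "log_count", "max_span_ms"}}.
    """
    buckets = defaultdict(lambda: {"metric": [], "logs": 0, "spans": []})

    # Every record is assigned to the window containing its timestamp,
    # so all modalities end up indexed by the same time axis.
    for ts, value in metrics:
        buckets[ts // window_s * window_s]["metric"].append(value)
    for ts, _message in log_events:
        buckets[ts // window_s * window_s]["logs"] += 1
    for ts, duration_ms in trace_spans:
        buckets[ts // window_s * window_s]["spans"].append(duration_ms)

    aligned = {}
    for start in sorted(buckets):
        b = buckets[start]
        aligned[start] = {
            "metric_mean": sum(b["metric"]) / len(b["metric"]) if b["metric"] else None,
            "log_count": b["logs"],
            "max_span_ms": max(b["spans"]) if b["spans"] else None,
        }
    return aligned


if __name__ == "__main__":
    rows = align_to_windows(
        metrics=[(0, 1.0), (30, 3.0), (65, 5.0)],
        log_events=[(10, "timeout"), (70, "retry"), (80, "retry")],
        trace_spans=[(5, 120), (61, 300)],
    )
    for start, features in rows.items():
        print(start, features)
```

A compact per-window summary like this is one way raw telemetry could be kept small enough to pass to downstream tools or a prompt, which relates to the context window limitation named in the abstract.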
Title: TAMO: Fine-Grained Root Cause Analysis via Tool-Assisted LLM Agent With Multi-Modality Observation Data in Cloud-Native Systems
Authors: Xiao Zhang;Qi Wang;Mingyi Li;Yuan Yuan;Mengbai Xiao;Fuzhen Zhuang;Dongxiao Yu
Journal: IEEE Transactions on Services Computing, volume 18 6, pages 4221-4233
Publication date: 2025-11-05; DOI: 10.1109/TSC.2025.3629066; type: Journal Article; impact factor: 5.8; category: Computer Science (region 2)
Pub Date : 2025-10-31 DOI: 10.1109/TSC.2025.3627934
Nicolas Buitrago;Hector Camacho;Miguel Jimeno;Cesar Viloria-Nuñez;Jairo A. Cardona;Augusto Salazar
Serverless computing has revolutionized application development but faces a significant challenge: cold starts, which introduce delays when a function is called after a period of inactivity. Addressing these delays is crucial because they affect efficiency, performance, cost, and scalability. Existing mitigation strategies come with trade-offs, such as increased resource overhead and the need for precise resource management predictions. In addition, optimizing the function startup process requires detailed knowledge of the runtime characteristics and the isolation technique used, such as a container-based or a micro virtual machine setup. This work presents PARSEC, a comprehensive solution for cold start issues in serverless computing. By focusing on reducing initialization latency in idle containers, this research seeks to preserve the scalability and ease of deployment of serverless computing while overcoming cold start limitations. The proposed architecture improves cold start behavior by streamlining the initialization of containers to reduce overhead. This involves minimizing unnecessary operations and customizing launches for serverless needs, aiming for a faster and more efficient setup. It also enhances the provisioning of Zygotes to speed up sandbox launches. The results show better performance for PARSEC when compared with other architectures, particularly at shorter wait times, suggesting effective cold start management. The strategic management of Zygotes and the scaling of their provisioning play a critical role in managing large numbers of packages and instances, thereby enhancing the performance of package management. The cache system also evolves to become more selective, reducing overhead by focusing on essential packages.
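To make the role of Zygotes more concrete, the sketch below shows one common way a pre-initialized process can be matched to a function: pick the Zygote whose preloaded packages cover as many of the function's imports as possible, then load only what is missing. This is an illustration under stated assumptions and not PARSEC's provisioning logic. The name `pick_zygote`, the data layout, and the rule that a Zygote may only hold packages the function actually needs are all hypothetical.

```python
def pick_zygote(zygotes, required):
    """Choose the best-matching Zygote for a function.

    zygotes:  dict mapping a Zygote name to a frozenset of preloaded packages
              (hypothetical layout, assumed for this sketch).
    required: iterable of package names the function imports.
    Returns (zygote_name, packages_still_to_load), or (None, all_required)
    when no Zygote qualifies and a full cold start is needed.
    """
    required = set(required)

    # Assumed rule: only Zygotes whose preloaded set is a subset of the
    # function's requirements are eligible, so no unused package is inherited.
    candidates = [
        (len(pkgs), name) for name, pkgs in zygotes.items() if pkgs <= required
    ]
    if not candidates:
        return None, required

    # Prefer the Zygote that already has the most packages loaded;
    # ties are broken by name so the choice is deterministic.
    _, best = max(candidates)
    return best, required - zygotes[best]


if __name__ == "__main__":
    pool = {
        "base": frozenset(),
        "np": frozenset({"numpy"}),
        "np-pd": frozenset({"numpy", "pandas"}),
        "torch": frozenset({"torch"}),
    }
    print(pick_zygote(pool, ["numpy", "pandas", "requests"]))
```

The fewer packages left to load after the match, the less work remains at launch time, which is the intuition behind keeping a selective set of frequently needed packages warm.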
Title: PARSEC: An Adaptive and Efficient Platform for Reducing Cold Start in Serverless Computing
Authors: Nicolas Buitrago;Hector Camacho;Miguel Jimeno;Cesar Viloria-Nuñez;Jairo A. Cardona;Augusto Salazar
Journal: IEEE Transactions on Services Computing, volume 18 6, pages 4082-4095
Publication date: 2025-10-31; DOI: 10.1109/TSC.2025.3627934; type: Journal Article; impact factor: 5.8; category: Computer Science (region 2)
Pub Date : 2025-10-28 DOI: 10.1109/TSC.2025.3625955
Jing Liu;Quan Zhou;Xingyu Chen;Jiantao Zhou;Keqin Li
Docker has evolved into a core container platform in cloud-native environments. However, it is susceptible to software aging after long periods of running, which seriously impairs overall system reliability and performance. Existing aging-related research mainly focuses on verifying the presence of software aging phenomena and predicting the resulting changes in resource consumption, with insufficient research on how to implement targeted and effective rejuvenation strategies on the Docker platform. This paper proposes an integrated framework, called TiAD-DQR, to comprehensively and effectively mitigate the platform aging challenge. It combines a Trend Decomposition Dense Encoder (TDDE) and Gaussian Mixture Aging Detection (GMAD) to accurately determine aging states, which then assist in generating intelligent rejuvenation decisions based on Double Q-Learning (DQL). Experimental results show that TiAD-DQR can effectively delay the software aging process, maximize system availability, and significantly improve the service quality and system stability of the Docker platform.
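For readers unfamiliar with Double Q-Learning, a single tabular update step can be sketched as follows. This is the generic textbook update, not the TiAD-DQR implementation. The state names ("healthy", "aging"), the action names ("wait", "restart"), the reward value, and the hyperparameters are hypothetical placeholders.

```python
import random


def double_q_update(qa, qb, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """Apply one tabular Double Q-learning update in place.

    qa, qb are dicts mapping (state, action) to a value estimate.
    Returns True if table qa was updated, False if qb was updated.
    """
    # A coin flip decides which table is updated on this step.
    if rng.random() < 0.5:
        update, evaluate = qa, qb
    else:
        update, evaluate = qb, qa

    # The updated table selects the greedy next action...
    best = max(actions, key=lambda a: update.get((next_state, a), 0.0))
    # ...while the other table evaluates it, which reduces the
    # overestimation bias of single-table Q-learning.
    target = reward + gamma * evaluate.get((next_state, best), 0.0)

    old = update.get((state, action), 0.0)
    update[(state, action)] = old + alpha * (target - old)
    return update is qa


if __name__ == "__main__":
    qa, qb = {}, {}
    actions = ["wait", "restart"]  # hypothetical rejuvenation actions
    double_q_update(qa, qb, "aging", "restart", 1.0, "healthy", actions)
    print(qa, qb)
```

In a rejuvenation setting, the states would come from an aging detector and the reward would encode availability or service quality. How TiAD-DQR defines either is not stated in the abstract.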
Title: TiAD-DQR: Software Aging States Determination and Rejuvenation Decision Generation for Docker Platform
Authors: Jing Liu;Quan Zhou;Xingyu Chen;Jiantao Zhou;Keqin Li
Journal: IEEE Transactions on Services Computing, volume 18 6, pages 4248-4260
Publication date: 2025-10-28; DOI: 10.1109/TSC.2025.3625955; type: Journal Article; impact factor: 5.8; category: Computer Science (region 2)