Pub Date: 2024-03-02 DOI: 10.1109/TCC.2024.3393735
Yuchao Zhang;Haoqiang Huang;Ahmed M. Abdelmoniem;Gaoxiong Zeng;Chenyue Zheng;Xirong Que;Wendong Wang;Ke Xu
Driven by the rapid development of 5G and IoT technologies, Inter-Datacenter (Inter-DC) networks face unprecedented pressure to replicate large volumes of geographically distributed user data in real time. Meanwhile, as Inter-DC networks scale up, link and node failures become increasingly frequent and degrade data transmission efficiency, which makes link failure recovery methods critically important. Many works have investigated fast failure recovery, yet none of them considers the deployment overhead of such recovery schemes. In this article, we find that the side effects of deploying recovery strategies and the future availability of the recovered transmissions are also crucial for fast recovery. We therefore propose a fast and low-redundancy failure recovery framework, FLAIR, which consists of a fast recovery strategy, FRAVaR, and a redundancy removal algorithm, ROSE. FRAVaR accounts for deployment overhead by minimizing shuffle traffic. Building on it, ROSE regularly eliminates the accumulated rerouting redundancy by removing unnecessary routing updates. Experimental results on 4 realistic network topologies show that FLAIR reduces deployment overhead by up to 48.2% compared with state-of-the-art solutions, and thereby improves recovery speed by up to 70.2% and network utilization by up to 36%.
Title: FLAIR: A Fast and Low-Redundancy Failure Recovery Framework for Inter Data Center Network. Journal: IEEE Transactions on Cloud Computing, vol. 12, no. 2, pp. 737-749.
Pub Date: 2024-03-01 DOI: 10.1109/TCC.2024.3372370
Bishakh Chandra Ghosh;Sandip Chakraborty
Multi-cloud environments such as OnApp and Cloudflare have moved the cloud marketplace towards a new horizon, where end-users can host applications transparently across different cloud service providers (CSPs) simultaneously, taking the best from each. Existing cloud federations are typically driven by a broker service that provides a trusted interface through which the participating CSPs and end-users coordinate. However, such a broker has the limitations of any centralized trusted authority, such as the risk of manipulation, bias, censorship, and a single point of failure. In this paper, we propose a decentralized trustless cloud federation architecture called CollabCloud