IEEE Transactions on Cloud Computing最新文献_第10页

Locality-Aware and Fault-Tolerant Batching for Machine Learning on Distributed Datasets 分布式数据集上机器学习的本地感知和容错批处理

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2024-01-09 DOI: 10.1109/TCC.2024.3351716

Liu Liu;Zhijun Ding;Dazhao Cheng;Xiaobo Zhou

The performance of distributed ML training is largely determined by workers that generate gradients in the slowest pace, i.e., stragglers. The state-of-the-art load balancing approaches consider that each worker stores a complete dataset locally and the data fetching time can be ignored. They only consider the computation capacity of workers in equalizing the gradient computation time. However, we find that in scenarios of ML on distributed datasets, whether in edge computing or distributed data cache systems, the data fetching time is non-negligible and often becomes the primary cause of stragglers. In this paper, we present LOFT, an adaptive load balancing approach for ML upon distributed datasets at the edge. It aims to balance the time to generate gradients at each worker while ensuring the model accuracy. Specifically, LOFT features a locality-aware batching. It builds performance and optimization models upon data fetching and gradient computation time. Leveraging the models, it develops an adaptive scheme based on grid search. Furthermore, it offers Byzantine gradient aggregation upon Ring All-Reduce, which makes itself fault-tolerant under Byzantine gradients brought by a small batch size. Experiments with twelve public DNN models and four open datasets show that LOFT reduces the training time by up to 46%, while reducing the training loss by up to 67% compared to LB-BSP.

分布式人工智能训练的性能在很大程度上取决于以最慢的速度生成梯度的工作者，即落伍者。最先进的负载均衡方法认为，每个工作者都在本地存储一个完整的数据集，数据获取时间可以忽略不计。在均衡梯度计算时间时，它们只考虑工作者的计算能力。然而，我们发现，在分布式数据集上的 ML 场景中，无论是在边缘计算还是分布式数据缓存系统中，数据获取时间都是不可忽略的，往往成为造成滞后的主要原因。在本文中，我们介绍了 LOFT，一种用于边缘分布式数据集上的 ML 的自适应负载平衡方法。它旨在平衡每个工作者生成梯度的时间，同时确保模型的准确性。具体来说，LOFT 具有局部感知批处理功能。它根据数据获取和梯度计算时间建立性能和优化模型。利用这些模型，它开发了一种基于网格搜索的自适应方案。此外，它还在环形全还原（Ring All-Reduce）基础上提供拜占庭梯度聚合，从而在批量较小带来拜占庭梯度的情况下实现容错。用 12 个公共 DNN 模型和 4 个开放数据集进行的实验表明，与 LB-BSP 相比，LOFT 最多缩短了 46% 的训练时间，同时最多减少了 67% 的训练损失。

{"title":"Locality-Aware and Fault-Tolerant Batching for Machine Learning on Distributed Datasets","authors":"Liu Liu;Zhijun Ding;Dazhao Cheng;Xiaobo Zhou","doi":"10.1109/TCC.2024.3351716","DOIUrl":"10.1109/TCC.2024.3351716","url":null,"abstract":"The performance of distributed ML training is largely determined by workers that generate gradients in the slowest pace, i.e., stragglers. The state-of-the-art load balancing approaches consider that each worker stores a complete dataset locally and the data fetching time can be ignored. They only consider the computation capacity of workers in equalizing the gradient computation time. However, we find that in scenarios of ML on distributed datasets, whether in edge computing or distributed data cache systems, the data fetching time is non-negligible and often becomes the primary cause of stragglers. In this paper, we present LOFT, an adaptive load balancing approach for ML upon distributed datasets at the edge. It aims to balance the time to generate gradients at each worker while ensuring the model accuracy. Specifically, LOFT features a locality-aware batching. It builds performance and optimization models upon data fetching and gradient computation time. Leveraging the models, it develops an adaptive scheme based on grid search. Furthermore, it offers Byzantine gradient aggregation upon Ring All-Reduce, which makes itself fault-tolerant under Byzantine gradients brought by a small batch size. Experiments with twelve public DNN models and four open datasets show that LOFT reduces the training time by up to 46%, while reducing the training loss by up to 67% compared to LB-BSP.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"370-387"},"PeriodicalIF":6.5,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139951172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

BatOpt: Optimizing GPU-Based Deep Learning Inference Using Dynamic Batch Processing BatOpt：使用动态批处理优化基于 GPU 的深度学习推理

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2024-01-08 DOI: 10.1109/TCC.2024.3350561

Deyu Zhang;Yunzhen Luo;Yaobo Wang;Xiaoyan Kui;Ju Ren

Deep learning (DL) has been applied in billions of mobile devices due to its astonishing performance in image, text, and audio processing. However, limited by the computing capability of mobile devices, a large amount of DL inference tasks need to be offloaded to edge or cloud servers, which makes powerful GPU servers are struggling to ensure the quality of service(QoS). To better utilize the highly parallel computing architecture of GPU to improve the QoS, we propose BatOpt, a framework that uses dynamic batch processing to strike a good balance between service latency and GPU memory usage in DL inference services. Specifically, BatOpt innovatively models the DL inference service as a

$M/G(a,b)/1/N$

queue, with the consideration of stochastic task arrivals, which enables it to predict the service latency accurately in different system states. Furthermore, we propose an optimization algorithm to trade off the service latency and GPU memory usage in different system states by analyzing the queueing model. We have implemented BatOpt on Pytorch and evaluated it on an RTX 2080 GPU using real DL models. BatOpt brings up to 31x and 4.3x times performance boost in terms of service latency, compared to single-input and fixed-batch-size strategies, respectively. And BatOpt's maximum GPU memory usage is only 0.3x that of greedy-dynamic-batch-size strategy on the premise of the same service latency.

深度学习（DL）因其在图像、文本和音频处理方面的惊人性能，已被应用于数十亿台移动设备。然而，受限于移动设备的计算能力，大量的深度学习推理任务需要卸载到边缘或云服务器上，这使得功能强大的 GPU 服务器难以保证服务质量（QoS）。为了更好地利用GPU的高度并行计算架构来提高服务质量，我们提出了BatOpt，一个利用动态批处理在DL推理服务中实现服务延迟和GPU内存使用之间良好平衡的框架。具体来说，BatOpt 创新性地将 DL 推理服务建模为 $M/G(a,b)/1/N$ 队列，并考虑到随机任务到达，从而能够准确预测不同系统状态下的服务延迟。此外，我们还提出了一种优化算法，通过分析队列模型来权衡不同系统状态下的服务延迟和 GPU 内存使用量。我们在 Pytorch 上实现了 BatOpt，并使用真实的 DL 模型在 RTX 2080 GPU 上进行了评估。与单输入和固定批量大小策略相比，BatOpt 在服务延迟方面的性能分别提高了 31 倍和 4.3 倍。在服务延迟相同的前提下，BatOpt 的最大 GPU 内存使用量仅为贪婪动态批量大小策略的 0.3 倍。

{"title":"BatOpt: Optimizing GPU-Based Deep Learning Inference Using Dynamic Batch Processing","authors":"Deyu Zhang;Yunzhen Luo;Yaobo Wang;Xiaoyan Kui;Ju Ren","doi":"10.1109/TCC.2024.3350561","DOIUrl":"10.1109/TCC.2024.3350561","url":null,"abstract":"Deep learning (DL) has been applied in billions of mobile devices due to its astonishing performance in image, text, and audio processing. However, limited by the computing capability of mobile devices, a large amount of DL inference tasks need to be offloaded to edge or cloud servers, which makes powerful GPU servers are struggling to ensure the quality of service(QoS). To better utilize the highly parallel computing architecture of GPU to improve the QoS, we propose BatOpt, a framework that uses dynamic batch processing to strike a good balance between service latency and GPU memory usage in DL inference services. Specifically, BatOpt innovatively models the DL inference service as a \u0000<inline-formula><tex-math>$M/G(a,b)/1/N$</tex-math></inline-formula>\u0000 queue, with the consideration of stochastic task arrivals, which enables it to predict the service latency accurately in different system states. Furthermore, we propose an optimization algorithm to trade off the service latency and GPU memory usage in different system states by analyzing the queueing model. We have implemented BatOpt on Pytorch and evaluated it on an RTX 2080 GPU using real DL models. BatOpt brings up to 31x and 4.3x times performance boost in terms of service latency, compared to single-input and fixed-batch-size strategies, respectively. And BatOpt's maximum GPU memory usage is only 0.3x that of greedy-dynamic-batch-size strategy on the premise of the same service latency.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"174-185"},"PeriodicalIF":6.5,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139951519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

psvCNN: A Zero-Knowledge CNN Prediction Integrity Verification Strategy psvCNN：零知识 CNN 预测完整性验证策略

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2024-01-05 DOI: 10.1109/TCC.2024.3350233

Yongkai Fan;Binyuan Xu;Linlin Zhang;Gang Tan;Shui Yu;Kuan-Ching Li;Albert Zomaya

Model prediction based on machine learning is provided as a service in cloud environments, but how to verify that the model prediction service is entirely conducted becomes a critical challenge. Although zero-knowledge proof techniques potentially solve the integrity verification problem, when applied to the prediction integrity of massive privacy-preserving Convolutional Neural Networks (CNNs), the significant proof burden results in low practicality. In this research, we present psvCNN (parallel splitting zero-knowledge technique for integrity verification). The psvCNN scheme effectively improves the utilization of computational resources in CNN prediction integrity proving by an independent splitting design. Through a convolutional kernel-based model splitting design and an underlying zero-knowledge succinct non-interactive knowledge argument, our psvCNN develops parallelizable zero-knowledge proof circuits for CNN prediction. Furthermore, psvCNN presents an updated Freivalds algorithm for a faster integrity verification process. In terms of proof time and storage, experiments show that psvCNN is practical and efficient. psvCNN generates a prediction integrity proof with a proof size of 1.2MB in 7.65s for the structurally complicated CNN model VGG16. psvCNN is 3765 times quicker than the latest zk-SNARK-based non-interactive method vCNN and 12 times faster than the latest sumcheck-based interactive technique zkCNN in terms of proving time.

在云环境中，基于机器学习的模型预测被作为一种服务提供，但如何验证模型预测服务是否完全进行成为一个关键挑战。尽管零知识证明技术有可能解决完整性验证问题，但当应用于大规模隐私保护卷积神经网络（CNN）的预测完整性时，证明负担过重导致实用性较低。在这项研究中，我们提出了 psvCNN（用于完整性验证的并行分裂零知识技术）。psvCNN 方案通过独立拆分设计，有效提高了 CNN 预测完整性证明中计算资源的利用率。通过基于卷积核的模型拆分设计和底层零知识简洁非交互知识论证，我们的 psvCNN 开发出了用于 CNN 预测的可并行零知识证明电路。此外，psvCNN 还提出了一种更新的 Freivalds 算法，以加快完整性验证过程。在证明时间和存储方面，实验表明 psvCNN 是实用而高效的。对于结构复杂的 CNN 模型 VGG16，psvCNN 在 7.65s 内生成了证明大小为 1.2MB 的预测完整性证明。

{"title":"psvCNN: A Zero-Knowledge CNN Prediction Integrity Verification Strategy","authors":"Yongkai Fan;Binyuan Xu;Linlin Zhang;Gang Tan;Shui Yu;Kuan-Ching Li;Albert Zomaya","doi":"10.1109/TCC.2024.3350233","DOIUrl":"10.1109/TCC.2024.3350233","url":null,"abstract":"Model prediction based on machine learning is provided as a service in cloud environments, but how to verify that the model prediction service is entirely conducted becomes a critical challenge. Although zero-knowledge proof techniques potentially solve the integrity verification problem, when applied to the prediction integrity of massive privacy-preserving Convolutional Neural Networks (CNNs), the significant proof burden results in low practicality. In this research, we present psvCNN (parallel splitting zero-knowledge technique for integrity verification). The psvCNN scheme effectively improves the utilization of computational resources in CNN prediction integrity proving by an independent splitting design. Through a convolutional kernel-based model splitting design and an underlying zero-knowledge succinct non-interactive knowledge argument, our psvCNN develops parallelizable zero-knowledge proof circuits for CNN prediction. Furthermore, psvCNN presents an updated Freivalds algorithm for a faster integrity verification process. In terms of proof time and storage, experiments show that psvCNN is practical and efficient. psvCNN generates a prediction integrity proof with a proof size of 1.2MB in 7.65s for the structurally complicated CNN model VGG16. psvCNN is 3765 times quicker than the latest zk-SNARK-based non-interactive method vCNN and 12 times faster than the latest sumcheck-based interactive technique zkCNN in terms of proving time.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 2","pages":"359-369"},"PeriodicalIF":6.5,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139951405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

VSA-SD: A Service Discovery Method Based on Vector Symbol Architecture for Low-Cost IoT System Development VSA-SD：基于矢量符号架构的服务发现方法，适用于低成本物联网系统开发

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2023-12-19 DOI: 10.1109/TCC.2023.3344512

Haiming Chen;Lei Wang;Wei Qin;Xinyan Zhou;Li Cui

In recent years, with the widening applications of the Internet of Things (IoT), more and more perception services (e.g., air quality indicator services, road traffic congestion monitoring services, etc) with different arguments (e.g., data type, source location, creator, etc) will be deployed by dedicated IT infrastructure service providers for constructing customized IoT systems with low cost by subscription. So it is an indispensable step to check whether the required perception services with specified arguments have been available for the constructing IoT through discovery method to reduce the redundancy of service deployment. However, it is a challenging problem to design efficient (i.e., achieving high accuracy and low response delay with low overhead), highly robust, and trustworthy mechanisms for discovering perception services on resource-constrained IoT devices. To solve this problem, we proposed a distributed service discovery method, named VSA-SD, based on the Vector Symbolic Architecture (VSA). This method employs hyperdimensional vectors to describe services in a distributed manner, and measures the degree of service matching by calculating the Hamming distance, thereby achieving service discovery. We implemented VSA-SD in NBUFlow, which is an IoT task construction and offloading test platform, and evaluated its performance through comprehensive experiments. Results show that VSA-SD outperforms the centralized, hybrid, and other distributed service discovery mechanisms in terms of accuracy, response delay, overhead, robustness, trustability, interoperability, and mobility.

近年来，随着物联网（IoT）应用领域的不断扩大，越来越多的感知服务（如空气质量指标服务、道路交通拥堵监测服务等）将由专门的 IT 基础设施服务提供商部署，这些服务具有不同的参数（如数据类型、来源位置、创建者等），通过订阅的方式以低成本构建定制化的物联网系统。因此，通过发现方法检查是否已为构建物联网提供了所需的具有指定参数的感知服务，以减少服务部署的冗余，是一个不可或缺的步骤。然而，如何在资源受限的物联网设备上设计高效（即实现高准确性、低响应延迟和低开销）、高鲁棒性和可信的感知服务发现机制是一个具有挑战性的问题。为了解决这个问题，我们提出了一种基于矢量符号架构（VSA）的分布式服务发现方法，命名为 VSA-SD。该方法采用超维向量以分布式方式描述服务，并通过计算汉明距离来衡量服务匹配程度，从而实现服务发现。我们在物联网任务构建与卸载测试平台NBUFlow中实现了VSA-SD，并通过综合实验评估了其性能。结果表明，VSA-SD 在准确性、响应延迟、开销、鲁棒性、可信任度、互操作性和移动性等方面都优于集中式、混合式和其他分布式服务发现机制。

{"title":"VSA-SD: A Service Discovery Method Based on Vector Symbol Architecture for Low-Cost IoT System Development","authors":"Haiming Chen;Lei Wang;Wei Qin;Xinyan Zhou;Li Cui","doi":"10.1109/TCC.2023.3344512","DOIUrl":"https://doi.org/10.1109/TCC.2023.3344512","url":null,"abstract":"In recent years, with the widening applications of the Internet of Things (IoT), more and more perception services (e.g., air quality indicator services, road traffic congestion monitoring services, etc) with different arguments (e.g., data type, source location, creator, etc) will be deployed by dedicated IT infrastructure service providers for constructing customized IoT systems with low cost by subscription. So it is an indispensable step to check whether the required perception services with specified arguments have been available for the constructing IoT through discovery method to reduce the redundancy of service deployment. However, it is a challenging problem to design efficient (i.e., achieving high accuracy and low response delay with low overhead), highly robust, and trustworthy mechanisms for discovering perception services on resource-constrained IoT devices. To solve this problem, we proposed a distributed service discovery method, named VSA-SD, based on the Vector Symbolic Architecture (VSA). This method employs hyperdimensional vectors to describe services in a distributed manner, and measures the degree of service matching by calculating the Hamming distance, thereby achieving service discovery. We implemented VSA-SD in NBUFlow, which is an IoT task construction and offloading test platform, and evaluated its performance through comprehensive experiments. Results show that VSA-SD outperforms the centralized, hybrid, and other distributed service discovery mechanisms in terms of accuracy, response delay, overhead, robustness, trustability, interoperability, and mobility.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"145-158"},"PeriodicalIF":6.5,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Multi-Mode Instance-Intensive Workflow Task Batch Scheduling in Containerized Hybrid Cloud 容器化混合云中的多模式实例密集型工作流任务批量调度

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2023-12-19 DOI: 10.1109/TCC.2023.3344194

An Liu;Ming Gao;Jiafu Tang

The migration of containerized microservices from virtual machines (VMs) to cloud data centers has become the most advanced deployment technique for large software applications in the cloud. This study investigates the scheduling of instance-intensive workflow (IWF) tasks to be executed in containers on a hybrid cloud when computational resources are limited. The process of scheduling these IWF tasks becomes complicated when considering the deployment time of containers, inter-task communication time, and their dependencies simultaneously, particularly when the task can choose multi-mode executions due to the flexible computational resource allocation of the container. We propose a batch scheduling strategy (BSS) for the IWF task scheduling problem. The BSS prioritizes the execution of IWF tasks with high repetition rates with a certain probability and records the virtual machines and modes selected for task execution, which can reduce the data transfer time and the randomness of computation. Based on this, we use an improved hybrid algorithm combined with BSS to solve the multi-mode IWF task scheduling problem. The experimental results demonstrate that employing the BSS can reduce the scheduling time by 6% when the number of workflows increases to 80. Additionally, we tested the effectiveness of all operators in the algorithm, and the results show that each step of the algorithm yields good performance. Compared to similar algorithms in related studies, the overall algorithm can achieve a maximum reduction of approximately 18% in the target value.

将容器化微服务从虚拟机（VM）迁移到云数据中心已成为云中大型软件应用程序最先进的部署技术。本研究探讨了在计算资源有限的情况下，混合云上容器执行的实例密集型工作流（IWF）任务的调度问题。当同时考虑容器的部署时间、任务间的通信时间以及它们之间的依赖关系时，这些 IWF 任务的调度过程就变得复杂了，尤其是当任务由于容器灵活的计算资源分配而可以选择多模式执行时。我们针对 IWF 任务调度问题提出了一种批量调度策略（BSS）。BSS 以一定概率优先执行重复率高的 IWF 任务，并记录任务执行所选择的虚拟机和模式，从而减少数据传输时间和计算的随机性。在此基础上，我们采用改进的混合算法结合 BSS 解决多模式 IWF 任务调度问题。实验结果表明，当工作流数量增加到 80 个时，采用 BSS 可以减少 6% 的调度时间。此外，我们还测试了算法中所有算子的有效性，结果表明算法的每个步骤都能产生良好的性能。与相关研究中的类似算法相比，整个算法的目标值最多可减少约 18%。

{"title":"Multi-Mode Instance-Intensive Workflow Task Batch Scheduling in Containerized Hybrid Cloud","authors":"An Liu;Ming Gao;Jiafu Tang","doi":"10.1109/TCC.2023.3344194","DOIUrl":"https://doi.org/10.1109/TCC.2023.3344194","url":null,"abstract":"The migration of containerized microservices from virtual machines (VMs) to cloud data centers has become the most advanced deployment technique for large software applications in the cloud. This study investigates the scheduling of instance-intensive workflow (IWF) tasks to be executed in containers on a hybrid cloud when computational resources are limited. The process of scheduling these IWF tasks becomes complicated when considering the deployment time of containers, inter-task communication time, and their dependencies simultaneously, particularly when the task can choose multi-mode executions due to the flexible computational resource allocation of the container. We propose a batch scheduling strategy (BSS) for the IWF task scheduling problem. The BSS prioritizes the execution of IWF tasks with high repetition rates with a certain probability and records the virtual machines and modes selected for task execution, which can reduce the data transfer time and the randomness of computation. Based on this, we use an improved hybrid algorithm combined with BSS to solve the multi-mode IWF task scheduling problem. The experimental results demonstrate that employing the BSS can reduce the scheduling time by 6% when the number of workflows increases to 80. Additionally, we tested the effectiveness of all operators in the algorithm, and the results show that each step of the algorithm yields good performance. Compared to similar algorithms in related studies, the overall algorithm can achieve a maximum reduction of approximately 18% in the target value.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"159-173"},"PeriodicalIF":6.5,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Pay-Per-Proof: Decentralized Outsourced Multi-User PoR for Cloud Storage Payment Using Blockchain 按验证付费：使用区块链支付云存储的去中心化外包多用户 PoR

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2023-12-18 DOI: 10.1109/TCC.2023.3343710

Hui Cui;Zhiguo Wan;Tianyu Zhaolu;Huaqun Wang;Atsuko Miyaji

Cloud computing has been widely applied in data storage, but cloud computing is not armed with an efficient integrity check mechanism for users to learn whether their large volumes of data have been kept intact by the cloud. The concept of proofs of retrievability (PoR) was introduced to address such an issue by enabling users to check the integrity of their data stored by the cloud. But PoR requires users to regularly send queries to the cloud, and its integrity check method cannot be extended to share the verification responsibility in the multi-user setting where different users store the same data to the cloud. With such concerns in mind, we put forth a notion called outsourced multi-user proofs of retrievability (

$mathtt {OMTPoR}$

) which allows users with the same data stored by the cloud to share the information for the integrity check, and a third party is required to regularly check data integrity on behalf of users using the shared information. We give a concrete construction of

$mathtt {OMTPoR}$

based on the homomorphic property of an existing property and analyze its security. To enforce honest integrity checks, we build the concrete

$mathtt {OMTPoR}$

construction over the blockchain using smart contracts to guarantee the honesty of participants, yielding a decentralized outsourced multi-user PoR solution that utilizes the blockchain miners as the third parties. Furthermore, our solution enables the cloud server to obtain payment for the storage service if the PoR is verified by the miners. We fully implement the

$mathtt {OMTPoR}$

scheme over the blockchain to evaluate its performance, which demonstrates obvious superiority over traditional PoR schemes without the detection of data duplication.

云计算已被广泛应用于数据存储领域，但云计算并不具备有效的完整性检查机制，用户无法了解云计算是否完整地保存了他们的大量数据。为了解决这个问题，人们提出了可检索性证明（PoR）的概念，使用户能够检查云存储数据的完整性。但是，PoR 要求用户定期向云发送查询，而且其完整性检查方法无法扩展到在多用户设置（不同用户向云存储相同数据）中分担验证责任。有鉴于此，我们提出了一种名为外包多用户可检索性证明（$mathtt {OMTPoR}$）的概念，它允许云存储相同数据的用户共享完整性检查信息，并要求第三方定期使用共享信息代表用户检查数据的完整性。我们基于现有属性的同态属性给出了$mathtt {OMTPoR}$的具体构造，并分析了其安全性。为了执行诚实完整性检查，我们在区块链上使用智能合约构建了具体的 $mathtt {OMTPoR}$ 结构，以保证参与者的诚实性，从而产生了一种利用区块链矿工作为第三方的去中心化外包多用户 PoR 解决方案。此外，如果 PoR 得到矿工的验证，我们的解决方案还能让云服务器获得存储服务的付款。我们在区块链上完全实现了$mathtt {OMTPoR}$方案，以评估其性能，该方案在不检测数据重复的情况下明显优于传统的PoR方案。

{"title":"Pay-Per-Proof: Decentralized Outsourced Multi-User PoR for Cloud Storage Payment Using Blockchain","authors":"Hui Cui;Zhiguo Wan;Tianyu Zhaolu;Huaqun Wang;Atsuko Miyaji","doi":"10.1109/TCC.2023.3343710","DOIUrl":"https://doi.org/10.1109/TCC.2023.3343710","url":null,"abstract":"Cloud computing has been widely applied in data storage, but cloud computing is not armed with an efficient integrity check mechanism for users to learn whether their large volumes of data have been kept intact by the cloud. The concept of proofs of retrievability (PoR) was introduced to address such an issue by enabling users to check the integrity of their data stored by the cloud. But PoR requires users to regularly send queries to the cloud, and its integrity check method cannot be extended to share the verification responsibility in the multi-user setting where different users store the same data to the cloud. With such concerns in mind, we put forth a notion called outsourced multi-user proofs of retrievability (\u0000<inline-formula><tex-math>$mathtt {OMTPoR}$</tex-math></inline-formula>\u0000) which allows users with the same data stored by the cloud to share the information for the integrity check, and a third party is required to regularly check data integrity on behalf of users using the shared information. We give a concrete construction of \u0000<inline-formula><tex-math>$mathtt {OMTPoR}$</tex-math></inline-formula>\u0000 based on the homomorphic property of an existing property and analyze its security. To enforce honest integrity checks, we build the concrete \u0000<inline-formula><tex-math>$mathtt {OMTPoR}$</tex-math></inline-formula>\u0000 construction over the blockchain using smart contracts to guarantee the honesty of participants, yielding a decentralized outsourced multi-user PoR solution that utilizes the blockchain miners as the third parties. Furthermore, our solution enables the cloud server to obtain payment for the storage service if the PoR is verified by the miners. We fully implement the \u0000<inline-formula><tex-math>$mathtt {OMTPoR}$</tex-math></inline-formula>\u0000 scheme over the blockchain to evaluate its performance, which demonstrates obvious superiority over traditional PoR schemes without the detection of data duplication.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"130-144"},"PeriodicalIF":6.5,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Efficient Approximation Algorithms for Scheduling Coflows With Total Weighted Completion Time in Identical Parallel Networks 在完全相同的并行网络中调度具有总加权完成时间的共同流的高效近似算法

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2023-12-08 DOI: 10.1109/TCC.2023.3340729

Chi-Yeh Chen

This article addresses the scheduling problem of coflows in identical parallel networks, a well-known

$mathcal {NP}$

-hard problem. We consider both flow-level scheduling and coflow-level scheduling problems. In the flow-level scheduling problem, flows within a coflow can be transmitted through different network cores, while in the coflow-level scheduling problem, flows within a coflow must be transmitted through the same network core. The key difference between these two problems lies in their scheduling granularity. Previous approaches relied on linear programming to solve the scheduling order. In this article, we enhance the efficiency of solving by utilizing the primal-dual method. For the flow-level scheduling problem, we propose an approximation algorithm that achieves approximation ratios of

$6-frac{2}{m}$

and

$5-frac{2}{m}$

for arbitrary and zero release times, respectively, where

$m$

represents the number of network cores. Additionally, for the coflow-level scheduling problem, we introduce an approximation algorithm that achieves approximation ratios of

$4m+1$

and

$text{4}m$

for arbitrary and zero release times, respectively. The algorithm presented in this article has practical applications in data centers, such as those operated by Google or Facebook. The simulated results demonstrate the superior performance of our algorithms compared to previous approach, emphasizing their practical utility.

本文探讨了相同并行网络中的同流调度问题，这是一个著名的 $mathcal {NP}$ 难问题。我们同时考虑了流级调度和同流级调度问题。在流级调度问题中，共流中的流可以通过不同的网络核心传输，而在共流级调度问题中，共流中的流必须通过同一网络核心传输。这两个问题的关键区别在于它们的调度粒度。以往的方法依赖线性规划来解决调度顺序问题。在本文中，我们利用基元二元方法提高了求解效率。对于流级调度问题，我们提出了一种近似算法，对于任意释放时间和零释放时间，其近似率分别达到 $6-frac{2}{m}$ 和 $5-frac{2}{m}$，其中 $m$ 代表网络核心数。此外，对于共流级调度问题，我们引入了一种近似算法，该算法在任意释放时间和零释放时间下的近似率分别达到了 $4m+1$ 和 $text{4}m$。本文介绍的算法可实际应用于数据中心，如谷歌或 Facebook 运营的数据中心。仿真结果表明，与之前的方法相比，我们的算法性能更优越，从而强调了其实用性。

{"title":"Efficient Approximation Algorithms for Scheduling Coflows With Total Weighted Completion Time in Identical Parallel Networks","authors":"Chi-Yeh Chen","doi":"10.1109/TCC.2023.3340729","DOIUrl":"https://doi.org/10.1109/TCC.2023.3340729","url":null,"abstract":"This article addresses the scheduling problem of coflows in identical parallel networks, a well-known \u0000<inline-formula><tex-math>$mathcal {NP}$</tex-math></inline-formula>\u0000-hard problem. We consider both flow-level scheduling and coflow-level scheduling problems. In the flow-level scheduling problem, flows within a coflow can be transmitted through different network cores, while in the coflow-level scheduling problem, flows within a coflow must be transmitted through the same network core. The key difference between these two problems lies in their scheduling granularity. Previous approaches relied on linear programming to solve the scheduling order. In this article, we enhance the efficiency of solving by utilizing the primal-dual method. For the flow-level scheduling problem, we propose an approximation algorithm that achieves approximation ratios of \u0000<inline-formula><tex-math>$6-frac{2}{m}$</tex-math></inline-formula>\u0000 and \u0000<inline-formula><tex-math>$5-frac{2}{m}$</tex-math></inline-formula>\u0000 for arbitrary and zero release times, respectively, where \u0000<inline-formula><tex-math>$m$</tex-math></inline-formula>\u0000 represents the number of network cores. Additionally, for the coflow-level scheduling problem, we introduce an approximation algorithm that achieves approximation ratios of \u0000<inline-formula><tex-math>$4m+1$</tex-math></inline-formula>\u0000 and \u0000<inline-formula><tex-math>$text{4}m$</tex-math></inline-formula>\u0000 for arbitrary and zero release times, respectively. The algorithm presented in this article has practical applications in data centers, such as those operated by Google or Facebook. The simulated results demonstrate the superior performance of our algorithms compared to previous approach, emphasizing their practical utility.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"116-129"},"PeriodicalIF":6.5,"publicationDate":"2023-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10349921","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Optimizing Cloud Data Lake Queries With a Balanced Coverage Plan 利用平衡覆盖计划优化云数据湖查询

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2023-12-05 DOI: 10.1109/TCC.2023.3339208

Grisha Weintraub;Ehud Gudes;Shlomi Dolev;Jeffrey D. Ullman

Cloud data lakes emerge as an inexpensive solution for storing very large amounts of data. The main idea is the separation of compute and storage layers. Thus, cheap cloud storage is used for storing the data, while compute engines are used for running analytics on this data in “on-demand” mode. However, to perform any computation on the data in this architecture, the data should be moved from the storage layer to the compute layer over the network for each calculation. Obviously, that hurts calculation performance and requires huge network bandwidth. In this paper, we study different approaches to improve query performance in a data lake architecture. We define an optimization problem that can provably speed up data lake queries. We prove that the problem is NP-hard and suggest heuristic approaches. Then, we demonstrate through the experiments that our approach is feasible and efficient (up to ×30 query execution time improvement based on the TPC-H benchmark).

云数据湖是存储海量数据的一种廉价解决方案。其主要理念是将计算层和存储层分离。因此，廉价的云存储用于存储数据，而计算引擎则用于以 "按需 "模式在这些数据上运行分析。然而，在这种架构中，要对数据进行任何计算，每次计算都要通过网络将数据从存储层移动到计算层。这显然会损害计算性能，而且需要巨大的网络带宽。本文研究了在数据湖架构中提高查询性能的不同方法。我们定义了一个优化问题，可以证明它能加快数据湖查询的速度。我们证明了该问题的 NP 难度，并提出了启发式方法。然后，我们通过实验证明了我们的方法是可行且高效的（根据 TPC-H 基准，查询执行时间最多可提高 ×30）。

引用次数: 0

Integrated Computation Offloading, UAV Trajectory Control, Edge-Cloud and Radio Resource Allocation in SAGIN SAGIN 中的集成计算卸载、无人机轨迹控制、边缘云和无线电资源分配

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2023-12-05 DOI: 10.1109/TCC.2023.3339394

Minh Dat Nguyen;Long Bao Le;André Girard

In this article, we study the computation offloading problem in hybrid edge-cloud based space-air-ground integrated networks (SAGIN), where joint optimization of partial computation offloading, unmanned aerial vehicle (UAV) trajectory control, user scheduling, edge-cloud computation, radio resource allocation, and admission control is performed. Specifically, the considered SAGIN employs multiple UAV-mounted edge servers with controllable UAV trajectory and a cloud sever which can be reached by ground users (GUs) via multi-hop low-earth-orbit (LEO) satellite communications. This design aims to minimize the weighted energy consumption of the GUs and UAVs while satisfying the maximum delay constraints of underlying computation tasks. To tackle the underlying non-convex mixed integer non-linear optimization problem, we use the alternating optimization approach where we iteratively solve four sub-problems, namely user scheduling, partial offloading control and bit allocation over time slots, computation resource and bandwidth allocation, and multi-UAV trajectory control until convergence. Moreover, feasibility verification and admission control strategies are proposed to handle overloaded network scenarios. Furthermore, the successive convex approximation (SCA) method is employed to convexify and solve the non-convex computation resource and bandwidth allocation and UAV trajectory control sub-problems. Via extensive numerical studies, we illustrate the effectiveness of our proposed design compared to baselines.

本文研究了基于边缘-云的空-空-地混合集成网络（SAGIN）中的计算卸载问题，对部分计算卸载、无人机（UAV）轨迹控制、用户调度、边缘-云计算、无线电资源分配和准入控制进行了联合优化。具体来说，所考虑的 SAGIN 采用了多个安装有可控无人飞行器轨迹的无人飞行器边缘服务器，以及地面用户（GU）可通过多跳低地球轨道（LEO）卫星通信到达的云服务器。该设计旨在最大限度地减少地面用户和无人机的加权能耗，同时满足底层计算任务的最大延迟约束。为了解决基本的非凸混合整数非线性优化问题，我们采用了交替优化方法，反复求解四个子问题，即用户调度、部分卸载控制和时隙比特分配、计算资源和带宽分配以及多无人机轨迹控制，直至收敛。此外，还提出了可行性验证和接纳控制策略，以处理网络过载情况。此外，我们还采用了连续凸近似（SCA）方法来凸化和求解非凸计算资源和带宽分配以及无人机轨迹控制子问题。通过大量的数值研究，我们说明了与基线相比，我们提出的设计方案的有效性。

{"title":"Integrated Computation Offloading, UAV Trajectory Control, Edge-Cloud and Radio Resource Allocation in SAGIN","authors":"Minh Dat Nguyen;Long Bao Le;André Girard","doi":"10.1109/TCC.2023.3339394","DOIUrl":"https://doi.org/10.1109/TCC.2023.3339394","url":null,"abstract":"In this article, we study the computation offloading problem in hybrid edge-cloud based space-air-ground integrated networks (SAGIN), where joint optimization of partial computation offloading, unmanned aerial vehicle (UAV) trajectory control, user scheduling, edge-cloud computation, radio resource allocation, and admission control is performed. Specifically, the considered SAGIN employs multiple UAV-mounted edge servers with controllable UAV trajectory and a cloud sever which can be reached by ground users (GUs) via multi-hop low-earth-orbit (LEO) satellite communications. This design aims to minimize the weighted energy consumption of the GUs and UAVs while satisfying the maximum delay constraints of underlying computation tasks. To tackle the underlying non-convex mixed integer non-linear optimization problem, we use the alternating optimization approach where we iteratively solve four sub-problems, namely user scheduling, partial offloading control and bit allocation over time slots, computation resource and bandwidth allocation, and multi-UAV trajectory control until convergence. Moreover, feasibility verification and admission control strategies are proposed to handle overloaded network scenarios. Furthermore, the successive convex approximation (SCA) method is employed to convexify and solve the non-convex computation resource and bandwidth allocation and UAV trajectory control sub-problems. Via extensive numerical studies, we illustrate the effectiveness of our proposed design compared to baselines.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"100-115"},"PeriodicalIF":6.5,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Publicly Verifiable Outsourcing Matrix Computation Scheme Based on Smart Contracts 基于智能合约的可公开验证的外包矩阵计算方案

IF 6.5 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing

Pub Date : 2023-11-30 DOI: 10.1109/TCC.2023.3337848

Hao Wang;Chunpeng Ge;Lu Zhou;Zhe Liu;Dongwan Lan;Xiaozhen Lu;Danni Jiang

Matrix computation is a crucial mathematical tool in scientific fields such as Artificial Intelligence and Cryptographic computation. However, it is difficult for resource-limited devices to execute large-scale matrix computations independently. Outsourcing matrix computation (OMC) is a promising solution that engages a cloud server to process complicated matrix computations for resource-limited devices. However, existing OMC schemes lack public verifiability, and thus resource-limited devices cannot verdict the correctness of the computing results. In this paper, for the first time, we propose a smart contract-based OMC scheme that publicly verifies the outsourcing matrix computation results. In our scheme, a smart contract running over the blockchain serves as a decentralized trusted third party to ensure the correctness of the matrix computation results. To overcome the Verifier's Dilemma in the blockchain, we present a blockchain-compatible matrix verification method that decreases the time complexity from

$O(n^{3})$

to

$O(n^{2})$

by utilizing a blinding method with the check digit and padding matrices. We make the verification become the form of comparing whether two results are identical rather than naive re-computing. Finally, we perform experiments on Ethereum and ARM Cortex-M4 and give in-depth analysis and performance evaluation, demonstrating our scheme's practicability and effectiveness.

矩阵计算是人工智能和密码计算等科学领域的重要数学工具。然而，资源有限的设备很难独立执行大规模矩阵计算。矩阵计算外包（OMC）是一种很有前景的解决方案，它利用云服务器为资源有限的设备处理复杂的矩阵计算。然而，现有的 OMC 方案缺乏公开可验证性，因此资源有限的设备无法确定计算结果的正确性。在本文中，我们首次提出了一种基于智能合约的 OMC 方案，可公开验证外包矩阵计算结果。在我们的方案中，运行在区块链上的智能合约作为去中心化的可信第三方，确保矩阵计算结果的正确性。为了克服区块链中的验证者困境，我们提出了一种与区块链兼容的矩阵验证方法，通过利用校验码和填充矩阵的盲法，将时间复杂度从$O(n^{3})$降低到$O(n^{2})$。我们将验证变成了比较两个结果是否相同的形式，而不是天真的重新计算。最后，我们在以太坊和 ARM Cortex-M4 上进行了实验，并给出了深入分析和性能评估，证明了我们方案的实用性和有效性。

{"title":"A Publicly Verifiable Outsourcing Matrix Computation Scheme Based on Smart Contracts","authors":"Hao Wang;Chunpeng Ge;Lu Zhou;Zhe Liu;Dongwan Lan;Xiaozhen Lu;Danni Jiang","doi":"10.1109/TCC.2023.3337848","DOIUrl":"https://doi.org/10.1109/TCC.2023.3337848","url":null,"abstract":"Matrix computation is a crucial mathematical tool in scientific fields such as Artificial Intelligence and Cryptographic computation. However, it is difficult for resource-limited devices to execute large-scale matrix computations independently. Outsourcing matrix computation (OMC) is a promising solution that engages a cloud server to process complicated matrix computations for resource-limited devices. However, existing OMC schemes lack public verifiability, and thus resource-limited devices cannot verdict the correctness of the computing results. In this paper, for the first time, we propose a smart contract-based OMC scheme that publicly verifies the outsourcing matrix computation results. In our scheme, a smart contract running over the blockchain serves as a decentralized trusted third party to ensure the correctness of the matrix computation results. To overcome the Verifier's Dilemma in the blockchain, we present a blockchain-compatible matrix verification method that decreases the time complexity from \u0000<inline-formula><tex-math>$O(n^{3})$</tex-math></inline-formula>\u0000 to \u0000<inline-formula><tex-math>$O(n^{2})$</tex-math></inline-formula>\u0000 by utilizing a blinding method with the check digit and padding matrices. We make the verification become the form of comparing whether two results are identical rather than naive re-computing. Finally, we perform experiments on Ethereum and ARM Cortex-M4 and give in-depth analysis and performance evaluation, demonstrating our scheme's practicability and effectiveness.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 1","pages":"70-83"},"PeriodicalIF":6.5,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0