Andrew Prout, Albert Reuther, Michael Houle, Michael Jones, Peter Michaleas, LaToya Anderson, William Arcand, Bill Bergeron, David Bestor, Alex Bonn, Daniel Burrill, Chansup Byun, Vijay Gadepally, Matthew Hubbell, Hayden Jananthan, Piotr Luszczek, Lauren Milechin, Guillermo Morales, Julie Mullen, Antonio Rosa, Charles Yee, Jeremy Kepner
HPC systems used for research run a wide variety of software and workflows. This software is often written or modified by users to meet the needs of their research projects, and it is rarely built with security in mind. In this paper we explore several of the key techniques that the MIT Lincoln Laboratory Supercomputing Center has deployed on its systems to manage the security implications of these workflows by providing enforced separation of processes, filesystem access, network traffic, and accelerators, making every user feel as if they are running on a personal HPC.
"HPC with Enhanced User Separation." arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-16. https://doi.org/arxiv-2409.10770
John Augustine, Antonio Cruciani, Iqra Altaf Gillani
We study robust and efficient distributed algorithms for building and maintaining distributed data structures in dynamic Peer-to-Peer (P2P) networks. P2P networks are characterized by a high level of dynamicity with abrupt, heavy node churn (nodes that join and leave the network continuously over time). We present a novel algorithm that builds and maintains, with high probability, a skip list for $\mathrm{poly}(n)$ rounds despite $\mathcal{O}(n/\log n)$ churn per round ($n$ is the stable network size). We assume that the churn is controlled by an oblivious adversary that has complete knowledge and control of which nodes join and leave and at what time, and has unlimited computational power, but is oblivious to the random choices made by the algorithm. Moreover, the maintenance overhead is proportional to the churn rate. Furthermore, the algorithm is scalable in that the messages are small (i.e., at most $\mathrm{polylog}(n)$ bits) and every node sends and receives at most $\mathrm{polylog}(n)$ messages per round. Our algorithm crucially relies on novel distributed and parallel algorithms to merge two $n$-element skip lists and to delete a large subset of items, both in $\mathcal{O}(\log n)$ rounds with high probability. These procedures may be of independent interest due to their elegance and potential applicability in other contexts in distributed data structures. To the best of our knowledge, our work provides the first known fully distributed data structure that provably works under highly dynamic settings (i.e., high churn rates). Furthermore, it is localized (i.e., it does not require any global topological knowledge). Finally, we believe that our framework can be generalized to other distributed and dynamic data structures, including graphs, potentially leading to stable distributed computation despite heavy churn.
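For reference, the data structure being maintained can be illustrated with a minimal sequential skip list. This sketch shows only the probabilistic structure itself, not the paper's distributed construction or its merge/delete procedures:

```python
import random

class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] = next node at level i

class SkipList:
    """Minimal sequential skip list: expected O(log n) search and insert."""
    MAX_LEVEL = 16

    def __init__(self):
        self.head = _Node(None, self.MAX_LEVEL)  # sentinel head node

    def _random_level(self):
        # Geometric level distribution with p = 1/2.
        lvl = 1
        while lvl < self.MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in reversed(range(self.MAX_LEVEL)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node  # last node before `key` at level i
        new = _Node(key, self._random_level())
        for i in range(len(new.forward)):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self.head
        for i in reversed(range(self.MAX_LEVEL)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key
```

The multi-level forward pointers are what make the distributed variant attractive: searches and merges can descend levels in parallel, which is the intuition behind the paper's $\mathcal{O}(\log n)$-round merge procedure.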
"Maintaining Distributed Data Structures in Dynamic Peer-to-Peer Networks." arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-16. https://doi.org/arxiv-2409.10235
Hao Jian Huang, Bekzod Iskandarov, Mizanur Rahman, Hakan T. Otal, M. Abdullah Canbaz
This paper presents the design and implementation of a Federated Learning (FL) testbed, focusing on its application in cybersecurity and evaluating its resilience against poisoning attacks. Federated Learning allows multiple clients to collaboratively train a global model while keeping their data decentralized, addressing critical needs for data privacy and security, particularly in sensitive fields like cybersecurity. Our testbed, built using the Flower framework, facilitates experimentation with various FL frameworks, assessing their performance, scalability, and ease of integration. Through a case study on federated intrusion detection systems, we demonstrate the testbed's capabilities in detecting anomalies and securing critical infrastructure without exposing sensitive network data. Comprehensive poisoning tests, targeting both model and data integrity, evaluate the system's robustness under adversarial conditions. Our results show that while federated learning enhances data privacy and distributed learning, it remains vulnerable to poisoning attacks, which must be mitigated to ensure its reliability in real-world applications.
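The aggregation step such a testbed exercises can be illustrated with a plain FedAvg weighted average (a generic sketch, not the paper's actual Flower configuration):

```python
def fedavg(client_weights, client_sizes):
    """Size-weighted average of client parameter vectors (flat lists
    of floats) -- the standard FedAvg aggregation step."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]
```

Model-poisoning attacks work precisely by shifting this average with manipulated client updates, which is why robust aggregation rules are a common mitigation.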
"Federated Learning in Adversarial Environments: Testbed Design and Poisoning Resilience in Cybersecurity." arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-15. https://doi.org/arxiv-2409.09794
This paper presents an approach to authoring a textbook titled Interactive OpenMP Programming with the assistance of Large Language Models (LLMs). The writing process utilized state-of-the-art LLMs, including Gemini Pro 1.5, Claude 3, and ChatGPT-4, to generate the initial structure and outline of the book, as well as the initial content for specific chapters. This content included detailed descriptions of individual OpenMP constructs and practical programming examples. The outline and content then underwent extensive manual revision to meet our goals for the book. In this paper, we report our findings about the capabilities and limitations of these LLMs. We address critical questions concerning the necessity of textbook resources and the effectiveness of LLMs in creating fundamental and practical programming content. Our findings suggest that while LLMs offer significant advantages in generating textbook content, they require careful integration with traditional educational methodologies to ensure depth, accuracy, and pedagogical effectiveness. The Interactive OpenMP Programming book is developed with the Jupyter Book framework, enabling code in the book to be executed from the web browser, providing instant feedback and a dynamic learning experience that stands in contrast to traditional educational resources. The book represents a significant step towards modernizing programming education, offering insights into practical strategies for generating a textbook with advanced AI tools.
"Developing an Interactive OpenMP Programming Book with Large Language Models." Xinyao Yi, Anjia Wang, Yonghong Yan, Chunhua Liao. arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-14. https://doi.org/arxiv-2409.09296
The increasing complexity of deep learning models and the demand for processing vast amounts of data make the utilization of large-scale distributed systems for efficient training essential. These systems, however, face significant challenges such as communication overhead, hardware limitations, and node failure. This paper investigates various optimization techniques in distributed deep learning, including Elastic Averaging SGD (EASGD) and the second-order method AdaHessian. We propose a dynamic weighting strategy to mitigate the problem of straggler nodes due to failure, enhancing the performance and efficiency of the overall training process. We conduct experiments with different numbers of workers and communication periods to demonstrate improved convergence rates and test performance using our strategy.
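The EASGD update investigated here can be sketched on a toy scalar problem. Hyperparameter names `eta` and `alpha` follow the usual EASGD formulation (gradient step size and elastic coupling strength); the paper's dynamic weighting strategy itself is not shown:

```python
import random

def easgd_toy(num_workers=4, steps=500, eta=0.05, alpha=0.05, seed=0):
    """Elastic Averaging SGD on the toy objective f(x) = (x - 3)^2.
    Each worker takes a gradient step plus an elastic pull toward the
    shared center variable; the center drifts toward the workers."""
    rng = random.Random(seed)
    center = 0.0
    workers = [rng.uniform(-1.0, 1.0) for _ in range(num_workers)]
    for _ in range(steps):
        updated = []
        for x in workers:
            grad = 2.0 * (x - 3.0)  # f'(x) for f(x) = (x - 3)^2
            # Gradient step plus elastic pull toward the center.
            updated.append(x - eta * grad - alpha * (x - center))
        # Center moves toward the (unweighted) average of the workers.
        center += alpha * sum(x - center for x in workers)
        workers = updated
    return center
```

The elastic coupling is what makes stragglers tolerable: workers can drift apart temporarily, and a dynamic weighting of their contributions (as proposed in the paper) can further discount failed or slow nodes.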
"A Dynamic Weighting Strategy to Mitigate Worker Node Failure in Distributed Deep Learning." Yuesheng Xu, Arielle Carr. arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-14. https://doi.org/arxiv-2409.09242
As global climate change intensifies, accurate weather forecasting is increasingly crucial for sectors such as agriculture, energy management, and environmental protection. Traditional methods, which rely on physical and statistical models, often struggle with complex, nonlinear, and time-varying data, underscoring the need for more advanced techniques. This study explores a hybrid CNN-LSTM model to enhance temperature forecasting accuracy for the Delhi region, using historical meteorological data from 1996 to 2017. We employed both direct and indirect methods, including comprehensive data preprocessing and exploratory analysis, to construct and train our model. The CNN component effectively extracts spatial features, while the LSTM captures temporal dependencies, leading to improved prediction accuracy. Experimental results indicate that the CNN-LSTM model significantly outperforms traditional forecasting methods in terms of both accuracy and stability, with a mean square error (MSE) of 3.26217 and a root mean square error (RMSE) of 1.80615. The hybrid model demonstrates its potential as a robust tool for temperature prediction, offering valuable insights for meteorological forecasting and related fields. Future research should focus on optimizing model architecture, exploring additional feature extraction techniques, and addressing challenges such as overfitting and computational complexity. This approach not only advances temperature forecasting but also provides a foundation for applying deep learning to other time series forecasting tasks.
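For reference, the two reported error metrics are directly related: RMSE is the square root of MSE, and indeed sqrt(3.26217) ≈ 1.80615, so the reported figures are mutually consistent. A minimal sketch of both metrics:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over paired observations and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE, in the
    same units as the target variable (degrees, for temperature)."""
    return math.sqrt(mse(y_true, y_pred))
```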
"Weather Prediction Using CNN-LSTM for Time Series Analysis: A Case Study on Delhi Temperature Data." Bangyu Li, Yang Qian. arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-14. https://doi.org/arxiv-2409.09414
S. Kawa Atapour, S. Jamal SeyedMohammadi, S. Mohammad Sheikholeslami, Jamshid Abouei, Konstantinos N. Plataniotis, Arash Mohammadi
Recently, pre-trained Foundation Models (FMs) have been combined with Federated Learning (FL) to improve the training of downstream tasks while preserving privacy. However, deploying FMs over edge networks with resource-constrained Internet of Things (IoT) devices is under-explored. This paper proposes a novel framework, Federated Distilling knowledge to Prompt (FedD2P), for leveraging the robust representation abilities of a vision-language FM without deploying it locally on edge devices. The framework distills the aggregated knowledge of IoT devices to a prompt generator to efficiently adapt the frozen FM for downstream tasks. To eliminate the dependency on a public dataset, our framework leverages per-class local knowledge from IoT devices and linguistic descriptions of classes to train the prompt generator. Our experiments on the diverse image classification datasets CIFAR, OxfordPets, SVHN, EuroSAT, and DTD show that FedD2P outperforms the baselines in terms of model performance.
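One plausible reading of "per-class local knowledge" is a per-class summary of each device's outputs, aggregated across devices. The sketch below simply averages per-class logit vectors and is purely illustrative; the abstract does not specify FedD2P's actual distillation objective at this level of detail:

```python
def aggregate_class_knowledge(device_knowledge):
    """Average per-class logit vectors across devices.

    device_knowledge: list of dicts, one per device, each mapping a
    class name to that device's logit vector for the class."""
    classes = device_knowledge[0].keys()
    n = len(device_knowledge)
    return {
        c: [
            sum(dev[c][i] for dev in device_knowledge) / n
            for i in range(len(device_knowledge[0][c]))
        ]
        for c in classes
    }
```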
"Leveraging Foundation Models for Efficient Federated Learning in Resource-restricted Edge Networks." arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-14. https://doi.org/arxiv-2409.09273
Federated learning is a distributed learning paradigm in which multiple mobile clients train a global model while keeping data local. These mobile clients can have varying amounts of available memory and network bandwidth. However, how to make maximal use of this available memory and bandwidth to achieve the best global model performance remains an open challenge. In this paper, we propose assigning each client a subset of the global model, with different layers and different channels on each layer. To realize this, we design a constrained model search process with early stopping to efficiently find models in such a very large space, and a data-free knowledge distillation mechanism to improve global model performance when aggregating models of such different structures. For a fair and reproducible comparison between different solutions, we develop a new system that can directly allocate different memory and bandwidth to each client according to memory and bandwidth logs collected on mobile devices. The evaluation shows that, compared to existing state-of-the-art system-heterogeneous federated learning methods, our solution increases accuracy by 2.43% to 15.81% and utilizes 5% to 40% more memory and bandwidth with negligible extra running time, under different amounts of available memory and bandwidth, non-i.i.d. datasets, and image and text tasks.
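The idea of assigning each client a capacity-dependent subset of the global model can be illustrated with a simple width-scaling heuristic. This is a hypothetical stand-in for the paper's constrained model search, which explores a far richer space of per-layer structures:

```python
import math

def assign_submodel(layer_channels, capacity_fraction):
    """Keep the first ceil(p * c) channels of each layer for a client
    whose device can hold roughly a fraction p of the full model.
    Always keep at least one channel so every layer stays functional."""
    return [max(1, math.ceil(capacity_fraction * c)) for c in layer_channels]
```

Aggregating submodels of different widths is exactly why the paper needs a data-free distillation step: a plain coordinate-wise average is ill-defined when clients hold different slices of the parameter tensor.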
"Exploring System-Heterogeneous Federated Learning with Dynamic Model Selection." Dixi Yao. arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-13. https://doi.org/arxiv-2409.08858
Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg
Bessel functions are critical in scientific computing for applications such as machine learning, protein structure modeling, and robotics. However, currently available routines lack precision or fail for certain input ranges, such as when the order $v$ is large, and GPU-specific implementations are limited. We address the precision limitations of current numerical implementations while dramatically improving the runtime. We propose two novel algorithms for computing the logarithm of modified Bessel functions of the first and second kinds by computing intermediate values on a logarithmic scale. Our algorithms are robust and never suffer from underflow or overflow, while achieving relative errors on the order of machine precision, even for inputs where existing libraries fail. In C++/CUDA, our algorithms have median and maximum speedups of 45x and 6150x on GPU and 17x and 3403x on CPU, respectively, over the ranges of inputs and third-party libraries tested. Compared to SciPy, the algorithms have median and maximum speedups of 77x and 300x on GPU and 35x and 98x on CPU, respectively, over the tested inputs. The ability to robustly compute a solution and the low relative errors allow us to fit von Mises-Fisher (vMF) distributions to high-dimensional neural network features. This is relevant, e.g., for uncertainty quantification in metric learning. We obtain image feature data by processing CIFAR10 training images with the convolutional layers of a pre-trained ResNet50. We successfully fit vMF distributions to 2048-, 8192-, and 32768-dimensional image feature data using our algorithms. Our approach provides fast and accurate results, while existing implementations in SciPy and mpmath fail to fit successfully. Our approach is readily implementable on GPUs, and we provide a fast open-source implementation alongside this paper.
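The core trick, evaluating intermediate values on a logarithmic scale, can be sketched in a few lines for $\log I_v(x)$ using the power series of the modified Bessel function of the first kind and a log-sum-exp reduction. This is a scalar reference sketch adequate for moderate inputs, not the paper's GPU algorithms:

```python
import math

def log_bessel_i(v, x, terms=200):
    """log(I_v(x)) for order v >= 0 and x > 0, via the power series
        I_v(x) = sum_k (x/2)^(2k+v) / (k! * Gamma(k+v+1)).
    Every term is computed on a log scale with lgamma, and the terms
    are combined with a numerically stable log-sum-exp, so large v
    never underflows to zero before the log is taken. A fixed number
    of terms suffices for moderate x; very large x would need more
    terms or asymptotic expansions."""
    log_half_x = math.log(x / 2.0)
    logs = [
        (2 * k + v) * log_half_x - math.lgamma(k + 1) - math.lgamma(k + v + 1)
        for k in range(terms)
    ]
    m = max(logs)  # log-sum-exp pivot for stability
    return m + math.log(sum(math.exp(t - m) for t in logs))
```

For example, `I_100(1)` is far below the smallest double (its log is around -433), so computing it directly would underflow, while the log-scale version returns a finite, accurate value.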
{"title":"Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs","authors":"Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg","doi":"arxiv-2409.08729","DOIUrl":"https://doi.org/arxiv-2409.08729","url":null,"abstract":"Bessel functions are critical in scientific computing for applications such as machine learning, protein structure modeling, and robotics. However, currently available routines lack precision or fail for certain input ranges, such as when the order $v$ is large, and GPU-specific implementations are limited. We address the precision limitations of current numerical implementations while dramatically improving the runtime. We propose two novel algorithms for computing the logarithm of modified Bessel functions of the first and second kinds by computing intermediate values on a logarithmic scale. Our algorithms are robust and never have issues with underflows or overflows while having relative errors on the order of machine precision, even for inputs where existing libraries fail. In C++/CUDA, our algorithms have median and maximum speedups of 45x and 6150x for GPU and 17x and 3403x for CPU, respectively, over the ranges of inputs and third-party libraries tested. Compared to SciPy, the algorithms have median and maximum speedups of 77x and 300x for GPU and 35x and 98x for CPU, respectively, over the tested inputs. The ability to robustly compute a solution and the low relative errors allow us to fit von Mises-Fisher, vMF, distributions to high-dimensional neural network features. This is, e.g., relevant for uncertainty quantification in metric learning. We obtain image feature data by processing CIFAR10 training images with the convolutional layers of a pre-trained ResNet50. We successfully fit vMF distributions to 2048-, 8192-, and 32768-dimensional image feature data using our algorithms. Our approach provides fast and accurate results while existing implementations in SciPy and mpmath fail to fit successfully. Our approach is readily implementable on GPUs, and we provide a fast open-source implementation alongside this paper.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
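The log-scale technique the abstract describes can be sketched in a few lines: keep every term of the power series of $I_v(x)$ in log space and combine them with log-sum-exp, so that no intermediate value ever underflows or overflows. This is a simplified illustration of the general idea, not the paper's algorithm; the function name `log_iv_series` and the fixed term count are our own illustrative choices.

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def log_iv_series(v, x, n_terms=200):
    """Log of the modified Bessel function I_v(x) via its power series,
    with every term kept on a log scale.  Illustrative sketch only;
    assumes v >= 0, x > 0, and enough terms for convergence."""
    k = np.arange(n_terms)
    # log of term k: (v + 2k) * log(x/2) - log(k!) - log(Gamma(v + k + 1))
    log_terms = (v + 2 * k) * np.log(x / 2.0) - gammaln(k + 1) - gammaln(v + k + 1)
    # log-sum-exp combines the terms without ever leaving log space
    return logsumexp(log_terms)
```

For moderate inputs this agrees with `np.log(scipy.special.iv(v, x))`, and for a large order such as `v = 500, x = 10`, where `scipy.special.iv` underflows to zero, the sketch still returns a finite (large negative) log value.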
This work presents WarmSwap, a novel provider-side cold-start optimization for serverless computing. The optimization reduces cold-start time when booting and loading dependencies at runtime inside a function container. Previous approaches to cold-start optimization tend to fall into two categories: optimizing the serverless computing infrastructure to benefit all serverless functions, or applying function-specific tuning to individual serverless functions. In contrast, WarmSwap offers a broad middle ground that optimizes entire categories of serverless functions. WarmSwap eliminates the need to initialize middleware or software dependencies when launching a new serverless container by migrating a pre-initialized live dependency image to the new function instance. WarmSwap respects the provider's cache constraints, as a single pre-warmed dependency image in the cache is shared among all serverless functions requiring that software dependency image. WarmSwap has been tested on seven representative functions from FunctionBench, chosen to enable comparison with previous work. In those tests, WarmSwap accelerates cold-start executions of serverless functions with large dependency requirements by factors ranging from 1.2 to 2.2.
{"title":"WarmSwap: Sharing Dependencies for Accelerating Cold Starts in Serverless Functions","authors":"Rui Li, Devesh Tiwari, Gene Cooperman","doi":"arxiv-2409.09202","DOIUrl":"https://doi.org/arxiv-2409.09202","url":null,"abstract":"This work presents WarmSwap, a novel provider-side cold-start optimization for serverless computing. This optimization reduces cold-start time when booting and loading dependencies at runtime inside a function container. Previous approaches to the optimization of cold starts tend to fall into two categories: optimizing the infrastructure of serverless computing to benefit all serverless functions; or function-specific tuning for individual serverless functions. In contrast, WarmSwap offers a broad middle ground, which optimizes entire categories of serverless functions. WarmSwap eliminates the need to initialize middleware or software dependencies when launching a new serverless container, by migrating a pre-initialized live dependency image to the new function instance. WarmSwap respects the provider's cache constraints, as a single pre-warmed dependency image in the cache is shared among all serverless functions requiring that software dependency image. WarmSwap has been tested on seven representative functions from FunctionBench. The functions are chosen to compare with previous work. In those tests, WarmSwap accelerates cold-start executions for those serverless functions with large dependency requirements by a factor ranging from 1.2 to 2.2.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
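The sharing scheme the WarmSwap abstract describes can be illustrated with a minimal provider-side cache sketch: one pre-initialized dependency image per dependency set, reused by every function that needs that set. This is a hypothetical Python illustration of the sharing idea only; `DependencyImageCache`, `launch_function`, and the string key scheme are our own names, and the real system migrates live container images rather than Python objects.

```python
class DependencyImageCache:
    """Provider-side cache holding one pre-initialized dependency image
    per dependency set, shared by all functions needing that set."""

    def __init__(self):
        self._images = {}  # dependency-set key -> pre-initialized image

    def get_or_build(self, dep_key, build_image):
        if dep_key not in self._images:
            # Cache miss: pay full dependency initialization once per set.
            self._images[dep_key] = build_image()
        # Cache hit: reuse the single shared pre-warmed image.
        return self._images[dep_key]


def launch_function(cache, dep_key, build_image, handler):
    """Cold start: obtain the pre-initialized dependency image from the
    shared cache (skipping dependency initialization) and run the handler."""
    image = cache.get_or_build(dep_key, build_image)
    return handler(image)
```

A second launch with the same dependency key reuses the cached image, so initialization cost is paid once per dependency set rather than once per function, which mirrors the cache-constraint argument in the abstract.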