
arXiv - CS - Performance: Latest Publications

Comment on paper: Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems
Pub Date : 2024-06-11 DOI: arxiv-2406.09441
Yimeng Min
We identify two major issues in the SoftDist paper (Xia et al.): (1) the failure to run all steps of different baselines on the same hardware environment, and (2) the use of inconsistent time measurements when comparing to other baselines. These issues lead to flawed conclusions. When all steps are executed in the same hardware environment, the primary claim made in SoftDist is no longer supported.
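The critique's remedy is methodological: every step of every baseline must be timed on the same hardware with the same clock. A minimal sketch of such a harness (the stage names and workloads below are placeholders, not the SoftDist pipelines):

```python
import time

def time_pipeline(stages, runs=5):
    """Time every stage of a pipeline with one monotonic clock.

    `stages` maps stage names to zero-argument callables; all stages run
    in the same process on the same hardware, so timings are comparable.
    """
    results = {name: [] for name in stages}
    for _ in range(runs):
        for name, fn in stages.items():
            t0 = time.perf_counter()
            fn()
            results[name].append(time.perf_counter() - t0)
    # report the best of several runs to reduce scheduling noise
    return {name: min(ts) for name, ts in results.items()}

# Hypothetical baselines: a fair comparison must include *all* steps
# (e.g. preprocessing and search), not just the neural forward pass.
timings = time_pipeline({
    "baseline_a": lambda: sum(i * i for i in range(100_000)),
    "baseline_b": lambda: sorted(range(100_000), reverse=True),
})
```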
Citations: 0
Comments on "Federated Learning with Differential Privacy: Algorithms and Performance Analysis"
Pub Date : 2024-06-09 DOI: arxiv-2406.05858
Mahtab Talaei, Iman Izadi
In the paper by Wei et al. ("Federated Learning with Differential Privacy: Algorithms and Performance Analysis"), the convergence performance of the proposed differential privacy algorithm in federated learning (FL), known as Noising before Model Aggregation FL (NbAFL), was studied. However, the presented convergence upper bound of NbAFL (Theorem 2) is incorrect. This comment aims to present the correct form of the convergence upper bound for NbAFL.
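For context, the NbAFL mechanism under discussion has each client perturb its local model with Gaussian noise before the server averages. The sketch below illustrates only that mechanism with an arbitrary noise scale; the actual calibration of sigma from the privacy budget — the thing Theorem 2's bound depends on — is not reproduced here:

```python
import numpy as np

def noised_aggregate(local_models, sigma, rng):
    """Average client models after each adds Gaussian noise
    ('noising before model aggregation').  In NbAFL, sigma would be
    calibrated from the (epsilon, delta) privacy budget; here it is
    just an illustrative constant."""
    noised = [w + rng.normal(0.0, sigma, size=w.shape) for w in local_models]
    return np.mean(noised, axis=0)

rng = np.random.default_rng(0)
clients = [np.ones(4) * c for c in (1.0, 2.0, 3.0)]
w_global = noised_aggregate(clients, sigma=0.01, rng=rng)
# with small sigma the aggregate stays near the plain average [2, 2, 2, 2]
```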
Citations: 0
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
Pub Date : 2024-06-05 DOI: arxiv-2406.03482
Amir Zandieh, Majid Daliri, Insu Han
Serving LLMs requires substantial memory due to the storage requirements of Key-Value (KV) embeddings in the KV cache, which grows with sequence length. An effective approach to compress the KV cache is quantization. However, traditional quantization methods face significant memory overhead due to the need to store quantization constants (at least a zero point and a scale) in full precision per data block. Depending on the block size, this overhead can add 1 or 2 bits per quantized number. We introduce QJL, a new quantization approach that consists of a Johnson-Lindenstrauss (JL) transform followed by sign-bit quantization. In contrast to existing methods, QJL eliminates memory overheads by removing the need for storing quantization constants. We propose an asymmetric estimator for the inner product of two vectors and demonstrate that applying QJL to one vector and a standard JL transform without quantization to the other provides an unbiased estimator with minimal distortion. We have developed an efficient implementation of the QJL sketch and its corresponding inner product estimator, incorporating a lightweight CUDA kernel for optimized computation. When applied across various LLMs and NLP tasks to quantize the KV cache to only 3 bits, QJL demonstrates a more than fivefold reduction in KV cache memory usage without compromising accuracy, all while achieving faster runtime. Code is available at https://github.com/amirzandieh/QJL.
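The asymmetric estimator described above can be sketched in a few lines: sign-quantize the JL sketch of one vector, keep the other's sketch in full precision, and rescale by the known constant sqrt(pi/2)·||x|| (for Gaussian g, E[sign(g·x)(g·y)] = sqrt(2/pi)·<x,y>/||x||). This is an illustrative NumPy version under those standard assumptions, not the paper's optimized CUDA implementation:

```python
import numpy as np

def qjl_inner_product(x, y, m, rng):
    """Estimate <x, y> from a 1-bit sketch of x and a full-precision
    sketch of y.  Only the signs of the m projections of x need to be
    stored (1 bit each), and no per-block quantization constants are
    required -- the rescaling constant is known in closed form."""
    S = rng.normal(size=(m, x.size))   # shared Gaussian JL matrix
    sx = np.sign(S @ x)                # 1-bit sketch of x
    sy = S @ y                         # unquantized sketch of y
    return np.linalg.norm(x) * np.sqrt(np.pi / 2) * np.mean(sx * sy)

rng = np.random.default_rng(0)
x = rng.normal(size=16)
y = x + 0.5 * rng.normal(size=16)      # correlated pair, <x,y> well away from 0
est = qjl_inner_product(x, y, m=50_000, rng=rng)
exact = float(x @ y)
```

With m large the estimate concentrates around the exact inner product; the sketch dimension trades accuracy for memory exactly as in other JL-based methods.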
Citations: 0
Impact of Generative AI (Large Language Models) on the PRA model construction and maintenance, observations
Pub Date : 2024-06-03 DOI: arxiv-2406.01133
Valentin Rychkov (EDF R&D), Claudia Picoco (EDF R&D), Emilie Caleca (EDF R&D)
The rapid development of Large Language Models (LLMs) and Generative Pre-Trained Transformers (GPTs) in the field of Generative Artificial Intelligence (AI) can significantly impact task automation in the modern economy. We anticipate that the PRA field will inevitably be affected by this technology. Thus, the main goal of this paper is to engage the risk assessment community in a discussion of the benefits and drawbacks of this technology for PRA. We make a preliminary analysis of possible applications of LLMs in the Probabilistic Risk Assessment (PRA) modeling context, referring to ongoing experience in the software engineering field. We explore potential application scenarios and the necessary conditions for controlled LLM usage in PRA modeling (whether static or dynamic). Additionally, we consider the potential impact of this technology on PRA modeling tools.
Citations: 0
Ranking with Ties based on Noisy Performance Data
Pub Date : 2024-05-28 DOI: arxiv-2405.18259
Aravind Sankaran, Lars Karlsson, Paolo Bientinesi
We consider the problem of ranking a set of objects based on their performance when the measurement of said performance is subject to noise. In this scenario, the performance is measured repeatedly, resulting in a range of measurements for each object. If the ranges of two objects do not overlap, then we consider one object as 'better' than the other, and we expect it to receive a higher rank; if, however, the ranges overlap, then the objects are incomparable, and we wish them to be assigned the same rank. Unfortunately, the incomparability relation of ranges is in general not transitive; as a consequence, in general the two requirements cannot be satisfied simultaneously, i.e., it is not possible to guarantee both distinct ranks for objects with separated ranges, and the same rank for objects with overlapping ranges. This conflict leads to more than one reasonable way to rank a set of objects. In this paper, we explore the ambiguities that arise when ranking with ties, and define a set of reasonable rankings, which we call partial rankings. We develop and analyse three different methodologies to compute a partial ranking. Finally, we show how performance differences among objects can be investigated with the help of partial ranking.
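One reasonable (and, per the abstract, deliberately non-unique) way to compute such a ranking is to sort by lower bound and open a new rank only when an object's range lies strictly above the current tie group. The object names and this particular grouping rule are illustrative, not necessarily one of the paper's three methodologies:

```python
def partial_rank(ranges):
    """Rank objects whose repeated measurements give a (lo, hi) range
    each: separated ranges get distinct ranks, overlapping ranges are
    tied.  Because overlap is not transitive, chaining overlaps into
    one tie group is only one of several defensible choices."""
    order = sorted(ranges, key=lambda name: ranges[name][0])
    ranks, rank, group_hi = {}, 0, float("-inf")
    for name in order:
        lo, hi = ranges[name]
        if lo > group_hi:              # strictly above the group: new rank
            rank += 1
            group_hi = hi
        else:                          # overlaps the group: same rank
            group_hi = max(group_hi, hi)
        ranks[name] = rank
    return ranks

ranks = partial_rank({"A": (1.0, 2.0), "B": (1.5, 3.0), "C": (4.0, 5.0)})
# → {"A": 1, "B": 1, "C": 2}: A and B overlap (tied), C is separated
```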
Citations: 0
An Analysis of Performance Bottlenecks in MRI Pre-Processing
Pub Date : 2024-05-27 DOI: arxiv-2405.17650
Mathieu Dugré, Yohan Chatelain, Tristan Glatard
Magnetic Resonance Image (MRI) pre-processing is a critical step for neuroimaging analysis. However, the computational cost of MRI pre-processing pipelines is a major bottleneck for large cohort studies and some clinical applications. While High-Performance Computing (HPC) and, more recently, Deep Learning have been adopted to accelerate the computations, these techniques require costly hardware and are not accessible to all researchers. Therefore, it is important to understand the performance bottlenecks of MRI pre-processing pipelines to improve their performance. Using the Intel VTune profiler, we characterized the bottlenecks of several commonly used MRI pre-processing pipelines from the ANTs, FSL, and FreeSurfer toolboxes. We found that a few functions contributed most of the CPU time, and that linear interpolation was the largest contributor. Data access was also a substantial bottleneck. We identified a bug in the ITK library that impacts the performance of the ANTs pipeline in single precision, and a potential issue with the OpenMP scaling in FreeSurfer recon-all. Our results provide a reference for future efforts to optimize MRI pre-processing pipelines.
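The same hotspot-profiling methodology can be reproduced at toy scale with Python's built-in cProfile standing in for VTune; the interpolation function here is a stand-in for the kind of routine the paper found dominant, not the pipelines' actual code:

```python
import cProfile
import io
import pstats

def interpolate(values, t):
    """Toy stand-in for the linear interpolation the profiles blamed."""
    i = int(t)
    frac = t - i
    return values[i] * (1.0 - frac) + values[i + 1] * frac

def pipeline():
    values = list(range(1000))
    return sum(interpolate(values, i * 0.01) for i in range(90_000))

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

stats = pstats.Stats(profiler, stream=io.StringIO()).sort_stats("cumulative")
# stats.print_stats(5) would show `interpolate` dominating cumulative time,
# mirroring how the paper attributes most CPU time to a few functions.
```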
Citations: 0
Evaluation of Resource-Efficient Crater Detectors on Embedded Systems
Pub Date : 2024-05-27 DOI: arxiv-2405.16953
Simon Vellas, Bill Psomas, Kalliopi Karadima, Dimitrios Danopoulos, Alexandros Paterakis, George Lentaris, Dimitrios Soudris, Konstantinos Karantzalos
Real-time analysis of Martian craters is crucial for mission-critical operations, including safe landings and geological exploration. This work leverages the latest breakthroughs for on-the-edge crater detection aboard spacecraft. We rigorously benchmark several YOLO networks using a Mars craters dataset, analyzing their performance on embedded systems with a focus on optimization for low-power devices. We optimize this process for a new wave of cost-effective, commercial-off-the-shelf-based smaller satellites. Implementations on diverse platforms, including Google Coral Edge TPU, AMD Versal SoC VCK190, Nvidia Jetson Nano, and Jetson AGX Orin, undergo a detailed trade-off analysis. Our findings identify optimal network-device pairings, enhancing the feasibility of crater detection on resource-constrained hardware and setting a new precedent for efficient and resilient extraterrestrial imaging. Code at: https://github.com/billpsomas/mars_crater_detection.
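Edge-inference benchmarks of this kind typically discard warm-up runs (cache and runtime-initialization effects) before reporting mean and tail latency per image. A minimal sketch of such a harness, with a dummy workload standing in for a detector's forward pass:

```python
import statistics
import time

def benchmark_latency(infer, n_warmup=10, n_runs=100):
    """Measure per-inference latency: run `infer` a few times untimed
    to warm caches/JITs, then time n_runs calls and report mean and
    p95 in milliseconds."""
    for _ in range(n_warmup):
        infer()
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        infer()
        times.append((time.perf_counter() - t0) * 1000.0)  # ms
    times.sort()
    return {"mean_ms": statistics.fmean(times),
            "p95_ms": times[int(0.95 * len(times)) - 1]}

# dummy workload in place of a YOLO forward pass on one image
report = benchmark_latency(lambda: sum(i * i for i in range(20_000)))
```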
Citations: 0
AmBC-NOMA-Aided Short-Packet Communication for High Mobility V2X Transmissions
Pub Date : 2024-05-26 DOI: arxiv-2405.16502
Xinyue Pei, Xingwei Wang, Yingyang Chen, Tingrui Pei, Miaowen Wen
In this paper, we investigate the performance of ambient backscatter communication non-orthogonal multiple access (AmBC-NOMA)-assisted short-packet communication for high-mobility vehicle-to-everything transmissions. In the proposed system, a roadside unit (RSU) transmits a superimposed signal to a typical NOMA user pair. Simultaneously, the backscatter device (BD) transmits its own signal towards the user pair by reflecting and modulating the RSU's superimposed signals. Due to vehicles' mobility, we consider realistic assumptions of time-selective fading and channel estimation errors. Theoretical expressions for the average block error rates (BLERs) of both users are derived. Furthermore, analysis and insights on transmit signal-to-noise ratio, vehicles' mobility, imperfect channel estimation, the reflection efficiency at the BD, and blocklength are provided. Numerical results validate the theoretical findings and reveal that the AmBC-NOMA system outperforms its orthogonal multiple access counterpart in terms of BLER performance.
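Short-packet BLER analyses build on finite-blocklength information theory, where the error rate at blocklength n with k information bits is approximated through the channel dispersion (the normal approximation of Polyanskiy et al.). The sketch below shows that generic AWGN approximation only, not the paper's AmBC-NOMA-specific fading expressions:

```python
import math

def bler_normal_approx(snr, n, k):
    """Normal (finite-blocklength) approximation of block error rate
    for a packet of blocklength n carrying k bits at instantaneous SNR
    `snr`: Q(sqrt(n/V) * (C - k/n)) with capacity C and dispersion V."""
    c = math.log2(1.0 + snr)                                       # capacity
    v = (1.0 - 1.0 / (1.0 + snr) ** 2) * math.log2(math.e) ** 2   # dispersion
    arg = math.sqrt(n / v) * (c - k / n)
    return 0.5 * math.erfc(arg / math.sqrt(2.0))                   # Q-function

# BLER falls as SNR grows for a fixed short packet (n = 128, k = 64)
errs = [bler_normal_approx(snr, n=128, k=64) for snr in (1.0, 4.0, 16.0)]
```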
Citations: 0
Graph neural networks with configuration cross-attention for tensor compilers
Pub Date : 2024-05-26 DOI: arxiv-2405.16623
Dmitrii Khizbullin, Eduardo Rocha de Andrade, Thanh Hau Nguyen, Matheus Pedroza Ferreira, David R. Pugh
With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph with nodes as operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, some configurations leading to accelerated inference. We propose TGraph, a neural graph architecture that allows screening for fast configurations of the target computational graph, thus representing an artificial intelligence (AI) tensor compiler in contrast to the traditional heuristics-based compilers. The proposed solution improves mean Kendall's $\tau$ across layout collections of TpuGraphs from 29.8% of the reliable baseline to 67.4% of TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.
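Kendall's $\tau$, the metric reported above, measures how well one ranking (e.g. predicted configuration runtimes) orders against another (measured runtimes): the fraction of concordant minus discordant pairs. A small O(n²) reference implementation:

```python
def kendall_tau(a, b):
    """Kendall rank correlation between two equal-length score lists:
    (#concordant - #discordant) / #pairs, ignoring tied pairs.
    O(n^2) reference version; fine for small n."""
    assert len(a) == len(b)
    concordant = discordant = 0
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# perfectly aligned rankings → tau = 1; fully reversed → tau = -1
assert kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]) == 1.0
tau = kendall_tau([1, 2, 3, 4], [40, 30, 20, 10])  # → -1.0
```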
Citations: 0
An Experimental Study of Different Aggregation Schemes in Semi-Asynchronous Federated Learning
Pub Date : 2024-05-25 DOI: arxiv-2405.16086
Yunbo Li, Jiaping Gui, Yue Wu
Federated learning is highly valued due to its high-performance computing in distributed environments while safeguarding data privacy. To address resource heterogeneity, researchers have proposed a semi-asynchronous federated learning (SAFL) architecture. However, the performance gap between different aggregation targets in SAFL remains unexplored. In this paper, we systematically compare the performance of two algorithm modes, FedSGD and FedAvg, which correspond to aggregating gradients and models, respectively. Our results across various task scenarios indicate these two modes exhibit a substantial performance gap. Specifically, FedSGD achieves higher accuracy and faster convergence but experiences more severe fluctuations in accuracy, whereas FedAvg excels in handling straggler issues but converges slower with reduced accuracy.
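The two modes differ in what the server averages: gradients (FedSGD) versus locally trained models (FedAvg). A toy convex sketch of the mechanics, with quadratic clients and illustrative learning rates; the accuracy and stability gap the paper measures only materializes on real non-IID tasks:

```python
import numpy as np

def client_grad(w, target):
    """Gradient of the client objective 0.5 * ||w - target||^2."""
    return w - target

def fed_sgd_round(w, targets, lr):
    """FedSGD: server averages client *gradients* and takes one step."""
    g = np.mean([client_grad(w, t) for t in targets], axis=0)
    return w - lr * g

def fed_avg_round(w, targets, lr, local_steps):
    """FedAvg: each client runs several local SGD steps, then the
    server averages the resulting *models*."""
    models = []
    for t in targets:
        wi = w.copy()
        for _ in range(local_steps):
            wi -= lr * client_grad(wi, t)
        models.append(wi)
    return np.mean(models, axis=0)

targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # two toy clients
w_sgd = np.zeros(2)
w_avg = np.zeros(2)
for _ in range(200):
    w_sgd = fed_sgd_round(w_sgd, targets, lr=0.1)
    w_avg = fed_avg_round(w_avg, targets, lr=0.1, local_steps=5)
# on this convex toy both converge to the average optimum [0.5, 0.5]
```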
Citations: 0