
Latest publications in International Journal of Parallel Programming

ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation
IF 1.5, CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-02-26. DOI: 10.1007/s10766-024-00761-4
Alessandro Ottaviano, Robert Balas, Giovanni Bambini, Antonio Del Vecchio, Maicol Ciani, Davide Rossi, Luca Benini, Andrea Bartolini

High-performance computing (HPC) processors are nowadays integrated cyber-physical systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output (MIMO) optimal power requirements, high-end processors integrate an on-die power controller system (PCS). While traditional PCSs are based on a simple microcontroller (MCU)-class core, more scalable and flexible PCS architectures are required to support advanced MIMO control algorithms for managing the ever-increasing number of cores, power states, and process, voltage, and temperature variability. This paper presents ControlPULP, an open-source HW/SW RISC-V parallel PCS platform consisting of a single-core MCU with fast interrupt handling coupled with a scalable multi-core programmable cluster accelerator and a specialized DMA engine for the parallel acceleration of real-time power management policies. ControlPULP relies on FreeRTOS to schedule a reactive power control firmware (PCF) application layer. We demonstrate ControlPULP in a power management use case targeting a next-generation 72-core HPC processor. We first show that the multi-core cluster accelerates the PCF, achieving a 4.9x speedup over single-core execution and enabling more advanced power management algorithms within the control hyper-period at a shallow area overhead, about 0.1% of the area of a modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based, closed-loop emulation framework that leverages the heterogeneous SoC paradigm, achieving DVFS tracking with a mean deviation within 3% of the plant's thermal design power (TDP) against a software-equivalent model-in-the-loop approach. Finally, we show that the proposed PCF compares favorably with an industry-grade control algorithm under computationally intensive workloads.
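The closed-loop power and thermal control the abstract describes can be illustrated with a minimal per-hyper-period control step. All gains, limits, and the single proportional update below are illustrative assumptions, not ControlPULP's actual PCF algorithm:

```python
# Minimal sketch of one closed-loop power/thermal control step. Gains,
# power/thermal limits, and frequency bounds are hypothetical values.

def control_step(measured_power_w, measured_temp_c, freq_ghz,
                 tdp_w=250.0, temp_limit_c=85.0,
                 kp_power=0.002, kp_temp=0.01,
                 freq_min=0.8, freq_max=3.0):
    """One control hyper-period: nudge the operating frequency toward the
    power and thermal limits, letting the tighter constraint dominate."""
    power_err = tdp_w - measured_power_w   # positive means power headroom
    temp_err = temp_limit_c - measured_temp_c
    # Proportional update; the smaller (more violated) error wins.
    delta = min(kp_power * power_err, kp_temp * temp_err)
    return max(freq_min, min(freq_max, freq_ghz + delta))

# Over budget on both power and temperature: the controller throttles.
f = control_step(measured_power_w=280.0, measured_temp_c=90.0, freq_ghz=2.5)
assert f < 2.5
```

A real MIMO policy would control per-core frequencies and voltages jointly; this scalar loop only shows the structure of the feedback step.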

Citations: 0
Investigating Methods for ASPmT-Based Design Space Exploration in Evolutionary Product Design
IF 1.5, CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-02-24. DOI: 10.1007/s10766-024-00763-2
Luise Müller, Philipp Wanko, Christian Haubelt, Torsten Schaub

Nowadays, product development is challenged by increasing system complexity and stringent time-to-market. To handle the demanding market requirements, knowledge from prior product generations is used to derive new, but partially similar, product versions. The concept of product generation engineering hence allows manufacturers to release high-quality products within short development times. Therefore, in this paper, we propose a novel approach to evaluate the similarity of two product implementations based on the concept of the Hamming distance. This allows the use of similarity information in various heuristics as well as in strategies and thus improves the product design process. In a wide set of cases, we investigate the quality and similarity of design points. In the experiments, the use of strategies leads to significantly shorter search times, but also tends to be too restrictive in certain cases. Simultaneously, the quality of the solutions found in the heuristic design space exploration has been shown to be as good as or better than that of a search from scratch, and considerably closer solutions on the non-dominated front have been found.
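The Hamming-distance similarity measure at the core of the abstract can be sketched directly. Encoding a product implementation as an equal-length vector of binary design decisions is an illustrative assumption:

```python
# Minimal sketch of implementation similarity via Hamming distance.
# The binary-vector encoding of design decisions is a hypothetical example.

def hamming_distance(impl_a, impl_b):
    """Number of differing design decisions between two implementations,
    each encoded as an equal-length sequence of binary choices."""
    if len(impl_a) != len(impl_b):
        raise ValueError("implementations must share the same encoding length")
    return sum(a != b for a, b in zip(impl_a, impl_b))

def similarity(impl_a, impl_b):
    """Normalized similarity in [0, 1]; 1.0 means identical implementations."""
    return 1.0 - hamming_distance(impl_a, impl_b) / len(impl_a)

prev_gen = [1, 0, 1, 1, 0, 0, 1, 0]   # prior product generation
new_gen  = [1, 0, 1, 0, 0, 0, 1, 1]   # candidate design point
assert hamming_distance(prev_gen, new_gen) == 2
assert similarity(prev_gen, new_gen) == 0.75
```

A heuristic could then bias the exploration toward candidates whose similarity to the previous generation exceeds some threshold.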

Citations: 0
Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks
IF 1.5, CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-02-22. DOI: 10.1007/s10766-024-00760-5
Christian Heidorn, Muhammad Sabih, Nicolai Meyerhöfer, Christian Schinabeck, Jürgen Teich, Frank Hannig

Filter pruning of convolutional neural networks (CNNs) is a common technique to effectively reduce the memory footprint, the number of arithmetic operations, and, consequently, inference time. Recent pruning approaches also consider the targeted device (i.e., graphics processing units) for CNN deployment to reduce the actual inference time. However, simple metrics, such as the ℓ¹-norm, are used for deciding which filters to prune. In this work, we propose a hardware-aware technique to explore the vast multi-objective design space of possible filter pruning configurations. Our approach incorporates not only the targeted device but also techniques from explainable artificial intelligence for ranking and deciding which filters to prune. For each layer, the number of filters to be pruned is optimized with the objective of minimizing the inference time and the error rate of the CNN. Experimental results show that our approach can speed up inference time by 1.40× and 1.30× for VGG-16 on the CIFAR-10 dataset and ResNet-18 on the ILSVRC-2012 dataset, respectively, compared to the state-of-the-art ABCPruner.
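The ℓ¹-norm criterion the abstract contrasts with its hardware-aware approach can be sketched as follows; the filter shapes in the example are illustrative assumptions:

```python
# Minimal sketch of the l1-norm filter-ranking criterion mentioned above:
# filters with the smallest l1-norm are the first candidates to prune.
# Filter shapes here are hypothetical toy values.

def l1_norm(filt):
    """Sum of absolute weights of one filter (nested lists of floats)."""
    if isinstance(filt, (int, float)):
        return abs(filt)
    return sum(l1_norm(x) for x in filt)

def rank_filters_l1(conv_weights):
    """conv_weights: list of filters. Returns filter indices sorted by
    ascending l1-norm, i.e. the front of the list is pruned first."""
    norms = [l1_norm(f) for f in conv_weights]
    return sorted(range(len(norms)), key=norms.__getitem__)

filters = [[[0.5, -0.5]],   # norm 1.0
           [[0.0, 0.0]],    # norm 0.0 -> pruned first
           [[2.0, 1.0]]]    # norm 3.0
assert rank_filters_l1(filters) == [1, 0, 2]
```

The paper's approach replaces this purely magnitude-based ranking with explainability-derived scores and device-measured latency, which this sketch does not model.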

Citations: 0
A Practical Approach for Employing Tensor Train Decomposition in Edge Devices
IF 1.5, CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-02-16. DOI: 10.1007/s10766-024-00762-3
Milad Kokhazadeh, Georgios Keramidas, Vasilios Kelefouras, Iakovos Stamoulis

Deep Neural Networks (DNNs) have made significant advances in various fields including speech recognition and image processing. Typically, modern DNNs are both compute and memory intensive, therefore their deployment in low-end devices is a challenging task. A well-known technique to address this problem is Low-Rank Factorization (LRF), where a weight tensor is approximated by one or more lower-rank tensors, reducing both the memory size and the number of executed tensor operations. However, the employment of LRF is a multi-parametric optimization process involving a huge design space where different design points represent different solutions trading off the number of FLOPs, the memory size, and the prediction accuracy of the DNN models. As a result, extracting an efficient solution is a complex and time-consuming process. In this work, a new methodology is presented that formulates the LRF problem as a (FLOPs vs. memory vs. prediction accuracy) Design Space Exploration (DSE) problem. Then, the DSE space is drastically pruned by removing inefficient solutions. Our experimental results prove that the design space can be efficiently pruned, thereby extracting only a limited set of solutions with improved accuracy, memory footprint, and FLOPs compared to the original (non-factorized) model. Our methodology has been developed as a stand-alone, parameterized module integrated into the T3F library of TensorFlow 2.X.
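The memory trade-off behind LRF can be made concrete with a small sketch. The matrix shape and rank below are hypothetical, and the two-factor decomposition stands in for the tensor-train structure the paper actually targets:

```python
# Minimal sketch of the parameter saving behind low-rank factorization:
# an m x n weight matrix W is approximated by an (m x r) and an (r x n)
# factor. The shape 4096 x 4096 and rank 128 are illustrative assumptions.

def lrf_params(m, n, r):
    """Return (original, factorized) parameter counts for an m x n
    weight matrix approximated by rank-r factors."""
    return m * n, r * (m + n)

full, low = lrf_params(4096, 4096, 128)
assert low < full   # 1,048,576 vs 16,777,216 parameters: a 16x reduction
```

Each rank r is one point in the DSE space: lowering r shrinks memory and FLOPs but degrades accuracy, which is exactly the three-way trade-off the methodology explores.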

Citations: 0
Access Interval Prediction by Partial Matching for Tightly Coupled Memory Systems
IF 1.5, CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2024-02-13. DOI: 10.1007/s10766-024-00764-1
Viktor Razilov, Robert Wittig, Emil Matúš, Gerhard Fettweis

In embedded systems, tightly coupled memories (TCMs) are usually shared between multiple masters for the purpose of hardware efficiency and software flexibility. On the one hand, memory sharing improves area utilization, but on the other hand, it can lead to performance degradation due to an increase in access conflicts. To mitigate the associated performance penalty, access interval prediction (AIP) has been proposed. In a similar fashion to branch prediction, AIP exploits program flow regularity to predict the cycle of the next memory access. We show that this structural similarity allows for adaptation of state-of-the-art branch predictors, such as Prediction by Partial Matching (PPM) and the TAgged GEometric history length (TAGE) branch predictor. Our analysis of memory access traces reveals that PPM predicts 99 percent of memory accesses. As PPM does not lend itself to hardware implementation, we also present the PPM-based TAGE access interval predictor, which attains an accuracy of over 97 percent, outperforming all previously presented implementable AIP schemes.
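The PPM idea applied to access intervals can be sketched as a longest-matching-context table over the recent interval history. The context order and the toy access pattern are illustrative assumptions, not the paper's evaluated configuration:

```python
# Minimal sketch of Prediction by Partial Matching (PPM) over access
# intervals: the longest previously seen context of recent intervals
# votes for the next one. Order and trace values are hypothetical.
from collections import defaultdict

class PPMIntervalPredictor:
    def __init__(self, max_order=3):
        self.max_order = max_order
        self.history = []
        # context tuple of recent intervals -> {next_interval: count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self):
        """Longest matching context wins; None if nothing matches yet."""
        for order in range(self.max_order, 0, -1):
            ctx = tuple(self.history[-order:])
            if len(ctx) == order and self.counts[ctx]:
                nxt = self.counts[ctx]
                return max(nxt, key=nxt.get)
        return None

    def update(self, interval):
        """Record the observed interval under every context length."""
        for order in range(1, self.max_order + 1):
            if len(self.history) >= order:
                ctx = tuple(self.history[-order:])
                self.counts[ctx][interval] += 1
        self.history.append(interval)

p = PPMIntervalPredictor()
for iv in [4, 2, 4, 2, 4, 2]:   # a strictly alternating access pattern
    p.update(iv)
assert p.predict() == 4         # after ..., 4, 2 the model expects 4
```

A hardware-friendly variant such as the paper's TAGE-based predictor would replace this unbounded table with tagged, geometrically spaced history lengths.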

Citations: 0
Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method
CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2023-11-13. DOI: 10.1007/s10766-023-00759-4
Yingpeng Wen, Zhilin Qiu, Dongyu Zhang, Dan Huang, Nong Xiao, Liang Lin
Citations: 0
A Hybrid Machine Learning Model for Code Optimization
CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2023-09-22. DOI: 10.1007/s10766-023-00758-5
Yacine Hakimi, Riyadh Baghdadi, Yacine Challal
Citations: 0
GPU-Based Algorithms for Processing the k Nearest-Neighbor Query on Spatial Data Using Partitioning and Concurrent Kernel Execution
IF 1.5, CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2023-07-21. DOI: 10.1007/s10766-023-00755-8
Polychronis Velentzas, M. Vassilakopoulos, A. Corral, C. Antonopoulos
Citations: 0
Calculation of Distributed-Order Fractional Derivative on Tensor Cores-Enabled GPU
IF 1.5, CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2023-07-10. DOI: 10.1007/s10766-023-00754-9
Vsevolod Bohaienko
Citations: 0
Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks
IF 1.5, CAS Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, THEORY & METHODS. Pub Date: 2023-05-05. DOI: 10.1007/s10766-023-00753-w
Daniel Presser, Frank Siqueira
Citations: 0