
Proceedings of the 17th ACM International Conference on Computing Frontiers: Latest Publications

Contention-aware application performance prediction for disaggregated memory systems
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392625
F. V. Zacarias, Rajiv Nishtala, P. Carpenter
Disaggregated memory has recently been proposed as a way to allow flexible and fine-grained allocation of memory capacity to compute jobs. This paper makes an important step towards effective resource allocation on disaggregated memory systems. Specifically, we propose a generic approach to predict the performance degradation due to sharing of disaggregated memory. In contrast to prior work, cache capacity is not shared among multiple applications, which removes a major contributor to application performance. For this reason, our analysis is driven by the demand for memory bandwidth, which has been shown to have an important effect on application performance. We show that profiling the application slowdown often involves significant experimental error and noise, and to this end, we improve the accuracy by linear smoothing of the sensitivity curves. We also show that contention is sensitive to the ratio between read and write memory accesses, and we address this sensitivity by building a family of sensitivity curves according to the read/write ratios. Our results show that the methodology predicts the slowdown in application performance subject to memory contention with an average error of 1.19% and max error of 14.6%. Compared with state-of-the-art, the relative improvements are almost 24% on average and 33% for the worst case.
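The two techniques named in the abstract, smoothing a noisy sensitivity curve and keeping a separate curve per read/write ratio, might be sketched roughly as follows (toy Python with invented numbers and names, not the paper's actual model):

```python
# Toy sketch of the abstract's two ideas: (1) linear (moving-average)
# smoothing of a noisy bandwidth-sensitivity curve, and (2) interpolating
# between curves profiled at two read/write ratios. All values are invented.

def smooth(curve, window=3):
    """Moving-average smoothing to damp profiling noise."""
    half = window // 2
    out = []
    for i in range(len(curve)):
        lo, hi = max(0, i - half), min(len(curve), i + half + 1)
        out.append(sum(curve[lo:hi]) / (hi - lo))
    return out

def predict_slowdown(bw_index, curve_lo, curve_hi, ratio,
                     ratio_lo=0.25, ratio_hi=0.75):
    """Slowdown at a bandwidth-demand index for an intermediate r/w ratio."""
    w = (ratio - ratio_lo) / (ratio_hi - ratio_lo)
    return (1 - w) * curve_lo[bw_index] + w * curve_hi[bw_index]

noisy = [1.0, 1.4, 1.1, 1.9, 1.6, 2.4]
print(smooth(noisy))
```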
Citations: 5
Design of an open-source bridge between non-coherent burst-based and coherent cache-line-based memory systems
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392631
Matheus A. Cavalcante, Andreas Kurth, Fabian Schuiki, L. Benini
In heterogeneous computer architectures, the serial part of an application is coupled with domain-specific accelerators that promise high computing throughput and efficiency across a wide range of applications. In such systems, the serial part of a program is executed on a Central Processing Unit (CPU) core optimized for single-thread performance, while parallel sections are offloaded to Programmable Manycore Accelerators (PMCAs). This heterogeneity requires CPU cores and PMCAs to share data in memory efficiently, although CPUs rely on a coherent memory system where data is transferred in cache lines, while PMCAs are based on non-coherent scratchpad memories where data is transferred in bursts by DMA engines. In this paper, we tackle the challenges and hardware complexity of bridging the gap from a non-coherent, burst-based memory hierarchy to a coherent, cache-line-based one. We design and implement an open-source hardware module that reaches 97% peak throughput over a wide range of realistic linear algebra kernels and is suited for a wide spectrum of memory architectures. Implemented in a state-of-the-art 22 nm FD-SOI technology, our module bridges up to 650 Gbps at 130 fJ/bit and has a complexity of less than 1 kGE/Gbps.
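The address arithmetic at the heart of such a bridge can be illustrated with a small sketch; the constant and function names here are invented, and the real hardware module handles far more (coherence, transaction IDs, backpressure):

```python
# Hypothetical sketch of the core address math in a burst-to-cache-line
# bridge: a DMA burst of arbitrary address/length is decomposed into the
# aligned cache-line transactions a coherent interconnect expects.

CACHE_LINE = 64  # bytes per cache line (assumed)

def burst_to_lines(addr, length):
    """Return (line_address, offset, nbytes) transactions covering the burst."""
    out = []
    end = addr + length
    while addr < end:
        line = addr & ~(CACHE_LINE - 1)          # align down to line boundary
        off = addr - line
        nbytes = min(CACHE_LINE - off, end - addr)
        out.append((line, off, nbytes))
        addr += nbytes
    return out

print(burst_to_lines(0x1030, 100))  # a burst straddling three cache lines
```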
Citations: 5
StoneCutter
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394029
J. Leidel, D. Donofrio, Frank Conlon
As the density and capability of reconfigurable computing using FPGAs and access to large-scale ASIC integration continue to increase, research activities associated with high-level synthesis flows have expanded at a similar rate. The goal of these research efforts is to reduce the time and effort required to construct and deploy application-specific architectures. However, these synthesis techniques often force users to consider the entire circuit design space in order to develop a successful implementation. This lack of design specificity often results in hardware design implementations that are difficult to program, difficult to reuse in future designs, and make sub-optimal use of hardware resources. In this work we introduce the StoneCutter instruction set design language and tool infrastructure. StoneCutter provides a familiar, C-like language in which to develop the implementation of individual, programmable instructions. The LLVM-based StoneCutter compiler performs per-instruction and whole-ISA optimizations in order to generate a high-performance Chisel HDL representation of the target design. Utilizing the existing Chisel tools, users can also generate C++ cycle-accurate simulation models as well as Verilog representations of the target design. As a result, StoneCutter provides a very rapid design environment for development and experimentation.
Citations: 0
Time-sliced quantum circuit partitioning for modular architectures
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392617
Jonathan M. Baker, Casey Duckering, Alexander P. Hoover, F. Chong
Current quantum computer designs will not scale. To scale beyond small prototypes, quantum architectures will likely adopt a modular approach with clusters of tightly connected quantum bits and sparser connections between clusters. We exploit this clustering and the statically-known control flow of quantum programs to create tractable partitioning heuristics which map quantum circuits to modular physical machines one time slice at a time. Specifically, we create optimized mappings for each time slice, accounting for the cost to move data from the previous time slice and using a tunable lookahead scheme to reduce the cost to move to future time slices. We compare our approach to a traditional statically-mapped, owner-computes model. Our results show strict improvement over the static mapping baseline. We reduce the non-local communication overhead by 89.8% in the best case and by 60.9% on average. Our techniques, unlike many exact solver methods, are computationally tractable.
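A toy greedy version of per-slice mapping with a decaying lookahead, purely illustrative and not the paper's algorithm, might look like:

```python
# Hypothetical sketch of time-sliced mapping with a tunable lookahead:
# each candidate qubit-to-cluster assignment for the current slice is
# scored by the cost of moving qubits from the previous slice plus a
# geometrically discounted estimate of moves into future slices.

def move_cost(prev_map, cand_map):
    """Count qubits whose cluster changes between two mappings."""
    return sum(1 for q in cand_map if prev_map.get(q) != cand_map[q])

def choose_mapping(prev_map, candidates, future_slices, decay=0.5):
    """Pick the candidate with the lowest present + discounted future cost."""
    def score(cand):
        cost = move_cost(prev_map, cand)
        cur, weight = cand, decay
        for future in future_slices:
            cost += weight * move_cost(cur, future)
            cur, weight = future, weight * decay
        return cost
    return min(candidates, key=score)

prev = {"q0": 0, "q1": 0, "q2": 1}
cands = [{"q0": 0, "q1": 1, "q2": 1}, {"q0": 0, "q1": 0, "q2": 0}]
future = [{"q0": 0, "q1": 1, "q2": 1}]
print(choose_mapping(prev, cands, future))
```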
Citations: 27
Quantum splines for non-linear approximations
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394032
A. Macaluso, L. Clissa, Stefano Lodi, Claudio Sartori
Quantum Computing offers a new paradigm for efficient computing and many AI applications could benefit from its potential boost in performance. However, the main limitation is the constraint to linear operations that hampers the representation of complex relationships in data. In this work, we propose an efficient implementation of quantum splines for non-linear approximation. In particular, we first discuss possible parametrisations, and select the most convenient for exploiting the HHL algorithm to obtain the estimates of spline coefficients. Then, we investigate QSpline performance as an evaluation routine for some of the most popular activation functions adopted in ML. Finally, a detailed comparison with classical alternatives to the HHL is also presented.
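The spline-fitting step reduces to solving small linear systems, which is the part the paper offloads to the HHL algorithm; a classical stand-in that fits a sigmoid with a piecewise-linear spline (illustrative only, all parameters assumed) is:

```python
# Classical analogue of the QSpline idea: spline coefficients come from
# solving linear systems S b = y. Here each interval's 2x2 system is
# solved with Cramer's rule; HHL would play this role on a quantum device.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve2x2(a11, a12, a21, a22, b1, b2):
    """Cramer's rule for the 2x2 system each spline interval produces."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)

knots = [i - 4.0 for i in range(9)]   # unit-spaced knots at -4..4 (assumed)
coeffs = []
for x0, x1 in zip(knots[:-1], knots[1:]):
    y0, y1 = sigmoid(x0), sigmoid(x1)
    # interval system: a*x0 + b = y0 ; a*x1 + b = y1
    coeffs.append(solve2x2(x0, 1.0, x1, 1.0, y0, y1))

def spline_eval(x):
    i = min(max(int(x - knots[0]), 0), len(coeffs) - 1)
    a, b = coeffs[i]
    return a * x + b

print(abs(spline_eval(0.5) - sigmoid(0.5)))  # small approximation error
```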
Citations: 3
Freeway
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394028
Yifan Shen, Ke Liu, Ziting Guo, Wenli Zhang, Guanghui Zhang, V. Aggarwal, Mingyu Chen
Citations: 1
Enabling mixed-precision quantized neural networks in extreme-edge devices
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394038
Nazareno Bruschi, Angelo Garofalo, Francesco Conti, Giuseppe Tagliavini, D. Rossi
The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized software to exploit digital signal processing (DSP) extensions of modern instruction set architectures (ISA). As such, recent research proposed optimized libraries for QNNs (from 8-bit to 2-bit) such as CMSIS-NN and PULP-NN. This work presents an extension to the PULP-NN library targeting the acceleration of mixed-precision Deep Neural Networks, an emerging paradigm able to significantly shrink the memory footprint of deep neural networks with negligible accuracy loss. The library, composed of 27 kernels, one for each permutation of input feature map, weight, and output feature map precision (considering 8-bit, 4-bit and 2-bit), enables efficient inference of QNN on parallel ultra-low-power (PULP) clusters of RISC-V based processors featuring the RV32IMCXpulpV2 ISA. The proposed solution, benchmarked on an 8-core GAP-8 PULP cluster, reaches peak performance of 16 MACs/cycle on 8 cores, performing 21× to 25× faster than an STM32H7 (powered by an ARM Cortex M7 processor) with 15× to 21× better energy efficiency.
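The storage trick that mixed-precision kernels rely on, packing several sub-byte weights into each byte, can be sketched in plain Python (illustrative only; the real PULP-NN kernels do this with RISC-V SIMD instructions, and the function names here are invented):

```python
# Hypothetical sketch of sub-byte weight handling: four 2-bit (or two
# 4-bit) unsigned values are packed into one byte for storage and
# unpacked before the multiply-accumulate.

def pack(values, bits):
    """Pack small unsigned ints of width `bits` into a bytearray."""
    per_byte = 8 // bits
    out = bytearray()
    for i in range(0, len(values), per_byte):
        byte = 0
        for j, v in enumerate(values[i:i + per_byte]):
            byte |= (v & ((1 << bits) - 1)) << (j * bits)
        out.append(byte)
    return out

def unpack(packed, bits, count):
    """Recover `count` values of width `bits` from a packed bytearray."""
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    vals = []
    for byte in packed:
        for j in range(per_byte):
            vals.append((byte >> (j * bits)) & mask)
    return vals[:count]

w = [3, 1, 0, 2, 1, 1]
print(unpack(pack(w, 2), 2, len(w)))  # round-trips back to w
```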
Citations: 15
HiLSM
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392621
Wenjie Li, Dejun Jiang, Jin Xiong, Yungang Bao
In order to ensure data durability and crash consistency, the LSM-tree based key-value stores suffer from high WAL synchronization overhead. Fortunately, the advent of NVM offers an opportunity to address this issue. However, NVM is currently too expensive to meet the demand of massive storage systems. Therefore, the hybrid NVM and SSD storage system provides a more cost-efficient solution. This paper proposes HiLSM, a key-value store for hybrid NVM-SSD storage systems. According to the characteristics of hybrid storage mediums, HiLSM adopts hybrid data structures consisting of the log-structured memory and the LSM-tree. Aiming at the issue of write stalls in write intensive scenario, a fine-grained data migration strategy is proposed to make the data migration start as early as possible. Aiming at the performance gap between NVM and SSD, a multi-threaded data migration strategy is proposed to make the data migration complete as soon as possible. Aiming at the LSM-tree's inherent issue of write amplification, a data filtering strategy is proposed to make data updates be absorbed in NVM as much as possible. We compare HiLSM with the state-of-the-art key-value stores via extensive experiments and the results show that HiLSM achieves 1.3x higher throughput for write, 10x higher throughput for read and 79% less write traffic under the skewed workload.
{"title":"HiLSM","authors":"Wenjie Li, Dejun Jiang, Jin Xiong, Yungang Bao","doi":"10.1145/3387902.3392621","DOIUrl":"https://doi.org/10.1145/3387902.3392621","url":null,"abstract":"In order to ensure data durability and crash consistency, the LSM-tree based key-value stores suffer from high WAL synchronization overhead. Fortunately, the advent of NVM offers an opportunity to address this issue. However, NVM is currently too expensive to meet the demand of massive storage systems. Therefore, the hybrid NVM and SSD storage system provides a more cost-efficient solution. This paper proposes HiLSM, a key-value store for hybrid NVM-SSD storage systems. According to the characteristics of hybrid storage mediums, HiLSM adopts hybrid data structures consisting of the log-structured memory and the LSM-tree. Aiming at the issue of write stalls in write intensive scenario, a fine-grained data migration strategy is proposed to make the data migration start as early as possible. Aiming at the performance gap between NVM and SSD, a multi-threaded data migration strategy is proposed to make the data migration complete as soon as possible. Aiming at the LSM-tree's inherent issue of write amplification, a data filtering strategy is proposed to make data updates be absorbed in NVM as much as possible. 
We compare HiLSM with the state-of-the-art key-value stores via extensive experiments and the results show that HiLSM achieves 1.3x higher throughput for write, 10x higher throughput for read and 79% less write traffic under the skewed workload.","PeriodicalId":155089,"journal":{"name":"Proceedings of the 17th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122299815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
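The early-start migration policy described above might be modeled, very roughly, as a watermark-triggered move from the fast tier to the capacity tier (a toy model with invented names and thresholds, not HiLSM's implementation):

```python
# Toy model of fine-grained, early-start migration: instead of waiting
# for the NVM tier to fill, migration of entries to the SSD tier begins
# once a low watermark is crossed, spreading the work over time.

class HybridStore:
    def __init__(self, nvm_capacity, low_watermark=0.5):
        self.nvm = {}           # fast tier (log-structured memory)
        self.ssd = {}           # capacity tier (LSM-tree stand-in)
        self.capacity = nvm_capacity
        self.low = low_watermark

    def put(self, key, value):
        self.nvm[key] = value   # updates are absorbed in NVM first
        if len(self.nvm) / self.capacity >= self.low:
            self.migrate_one()

    def migrate_one(self):
        """Move a single (oldest-inserted) entry, keeping pauses short."""
        key = next(iter(self.nvm))
        self.ssd[key] = self.nvm.pop(key)

    def get(self, key):
        return self.nvm.get(key, self.ssd.get(key))

store = HybridStore(nvm_capacity=4)
for i in range(6):
    store.put(f"k{i}", i)
print(store.get("k0"), store.get("k5"))
```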
Citations: 19
SoundFactory
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3394036
A. Scionti, S. Ciccia, O. Terzo
The proliferation of smart connected devices using digital assistants activated by voice commands (e.g., Apple Siri, Google Assistant, Amazon Alexa, etc.) is raising the interest in algorithms to localize and recognize audio sources. Among the others, deep neural networks (DNNs) are seen as a promising approach to accomplish such task. Unlike other approaches, DNNs can categorize received events, thus discriminating between events of interests and not even in presence of noise. Despite their advantages, DNNs require large datasets to be trained. Thus, tools for generating datasets are of great value, being able to accelerate the development of advanced learning models. This paper presents SoundFactory, a framework for simulating the propagation of sound waves (also considering noise, reverberation, reflection, attenuation, and other interfering waves) and the microphone array response to such sound waves. As such, SoundFactory allows to easily generate datasets to train deep neural networks which are at the basis of modern applications. SoundFactory is flexible enough to simulate many different microphone array configurations, thus covering a large set of use cases. To demonstrate the capabilities offered by SoundFactory, we generated a dataset and trained two different (rather simple) learning models against them, achieving up to 97% of accuracy. The quality of the generated dataset has been also assessed comparing the microphone array model responses with the real ones.
{"title":"SoundFactory","authors":"A. Scionti, S. Ciccia, O. Terzo","doi":"10.1145/3387902.3394036","DOIUrl":"https://doi.org/10.1145/3387902.3394036","url":null,"abstract":"The proliferation of smart connected devices using digital assistants activated by voice commands (e.g., Apple Siri, Google Assistant, Amazon Alexa, etc.) is raising the interest in algorithms to localize and recognize audio sources. Among the others, deep neural networks (DNNs) are seen as a promising approach to accomplish such task. Unlike other approaches, DNNs can categorize received events, thus discriminating between events of interests and not even in presence of noise. Despite their advantages, DNNs require large datasets to be trained. Thus, tools for generating datasets are of great value, being able to accelerate the development of advanced learning models. This paper presents SoundFactory, a framework for simulating the propagation of sound waves (also considering noise, reverberation, reflection, attenuation, and other interfering waves) and the microphone array response to such sound waves. As such, SoundFactory allows to easily generate datasets to train deep neural networks which are at the basis of modern applications. SoundFactory is flexible enough to simulate many different microphone array configurations, thus covering a large set of use cases. To demonstrate the capabilities offered by SoundFactory, we generated a dataset and trained two different (rather simple) learning models against them, achieving up to 97% of accuracy. 
The quality of the generated dataset has been also assessed comparing the microphone array model responses with the real ones.","PeriodicalId":155089,"journal":{"name":"Proceedings of the 17th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129707179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
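The core geometry such a simulator must compute, per-microphone delay and distance attenuation for a point source, can be sketched as follows (a toy free-field model; the names and constants are assumptions, not SoundFactory's API):

```python
# Toy free-field model of a microphone array's response to a point
# source: each microphone sees the wave after a distance-dependent
# delay, attenuated as 1/r. Reverberation, reflection, and noise, which
# a full simulator also models, are omitted here.
import math

SPEED_OF_SOUND = 343.0  # m/s in air (assumed)

def array_response(source, mics):
    """Return (delay_seconds, gain) per microphone for a point source."""
    out = []
    for mic in mics:
        r = math.dist(source, mic)
        out.append((r / SPEED_OF_SOUND, 1.0 / max(r, 1e-6)))
    return out

# Linear 4-microphone array, 5 cm spacing, source about 1.4 m from mic 0.
mics = [(i * 0.05, 0.0) for i in range(4)]
resp = array_response((1.0, 1.0), mics)
print(resp[0])
```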
Citations: 1
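The SoundFactory abstract above describes simulating how a sound wave, with geometry-dependent delay and attenuation, arrives at each microphone of an array. A minimal free-field sketch of that idea follows; the function name and all parameters are our own illustration, not SoundFactory's API, and reverberation, reflections, and noise (which the framework also models) are omitted:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 degrees C

def simulate_array_response(signal, fs, source_pos, mic_positions):
    """Free-field propagation of `signal` from `source_pos` to each microphone.

    Each channel is delayed by distance / c (rounded to whole samples) and
    attenuated by 1 / distance (spherical spreading).
    """
    source_pos = np.asarray(source_pos, dtype=float)
    channels = []
    for mic in np.asarray(mic_positions, dtype=float):
        dist = np.linalg.norm(source_pos - mic)
        delay_samples = int(round(dist / SPEED_OF_SOUND * fs))
        gain = 1.0 / max(dist, 1e-6)  # avoid division by zero at the source
        channel = np.zeros(len(signal) + delay_samples)
        channel[delay_samples:] = gain * signal
        channels.append(channel)
    # Pad every channel to the longest one so the result is a 2-D array
    n = max(len(c) for c in channels)
    return np.stack([np.pad(c, (0, n - len(c))) for c in channels])

# Example: a 1 kHz tone reaching a two-microphone array 10 cm apart,
# with the source 1 m away along the x axis
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
mics = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
resp = simulate_array_response(tone, fs, (1.0, 0.0, 0.0), mics)
```

The second microphone is 10 cm closer to the source, so its channel arrives a few samples earlier and slightly louder — exactly the kind of inter-channel difference a DNN trained on such a dataset would exploit for localization.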
Efficient architecture design for the AES-128 algorithm on embedded systems
Pub Date : 2020-05-11 DOI: 10.1145/3387902.3392624
Rupam Mondal, H. Ngo, James Shey, R. Rakvic, Owens Walker, Dane Brown
Many applications make use of the edge devices in wireless sensor networks (WSNs), including video surveillance, traffic monitoring and enforcement, personal and health care, gaming, habitat monitoring, and industrial process control. However, these edge devices are resource-limited embedded systems that require a low-cost, low-power, and high-performance encryption/decryption solution to prevent attacks such as eavesdropping, message modification, and impersonation. This paper proposes a field-programmable gate array (FPGA) based design and implementation of the Advanced Encryption Standard (AES) algorithm for encryption and decryption using a parallel-pipeline architecture with a data forwarding mechanism that efficiently utilizes on-chip memory modules and massive parallel processing units to support a high throughput rate. Hardware designs that optimize the implementation of the AES algorithm are proposed to minimize resource allocation and maximize throughput. These designs are shown to outperform existing solutions in the literature. Additionally, a rapid prototype of a complete system-on-chip (SoC) solution that employs the proposed design on a configurable platform has been developed and proven to be suitable for real-time applications.
{"title":"Efficient architecture design for the AES-128 algorithm on embedded systems","authors":"Rupam Mondal, H. Ngo, James Shey, R. Rakvic, Owens Walker, Dane Brown","doi":"10.1145/3387902.3392624","DOIUrl":"https://doi.org/10.1145/3387902.3392624","url":null,"abstract":"Many applications make use of the edge devices in wireless sensor networks (WSNs), including video surveillance, traffic monitoring and enforcement, personal and health care, gaming, habitat monitoring, and industrial process control. However, these edge devices are resource-limited embedded systems that require a low-cost, low-power, and high-performance encryption/decryption solution to prevent attacks such as eavesdropping, message modification, and impersonation. This paper proposes a field-programmable gate array (FPGA) based design and implementation of the Advanced Encryption Standard (AES) algorithm for encryption and decryption using a parallel-pipeline architecture with a data forwarding mechanism that efficiently utilizes on-chip memory modules and massive parallel processing units to support a high throughput rate. Hardware designs that optimize the implementation of the AES algorithm are proposed to minimize resource allocation and maximize throughput. These designs are shown to outperform existing solutions in the literature. 
Additionally, a rapid prototype of a complete system-on-chip (SoC) solution that employs the proposed design on a configurable platform has been developed and proven to be suitable for real-time applications.","PeriodicalId":155089,"journal":{"name":"Proceedings of the 17th ACM International Conference on Computing Frontiers","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132054607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
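The throughput advantage of the parallel-pipeline AES architecture described above comes from unrolling the ten AES-128 rounds into pipeline stages, so a new 128-bit block can enter every cycle instead of every ten cycles. A back-of-the-envelope model makes this concrete; the 100 MHz clock is an illustrative assumption, not a figure from the paper:

```python
def aes_core_throughput(clock_hz, rounds=10, pipelined=True, block_bits=128):
    """Steady-state throughput of an AES-128 core, in bits per second.

    An iterative core reuses a single round unit, finishing one block every
    `rounds` cycles; a fully unrolled pipeline with one stage per round
    accepts a new block each cycle once the pipeline is full.
    """
    blocks_per_cycle = 1.0 if pipelined else 1.0 / rounds
    return clock_hz * blocks_per_cycle * block_bits

# Illustrative 100 MHz FPGA clock (assumed):
pipelined_gbps = aes_core_throughput(100e6) / 1e9                    # 12.8 Gb/s
iterative_gbps = aes_core_throughput(100e6, pipelined=False) / 1e9   # 1.28 Gb/s
```

The tenfold gap is why pipelined designs dominate high-throughput FPGA AES work; the cost, which the paper's designs aim to minimize, is roughly ten copies of the round logic plus on-chip memory for the per-stage round keys.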
Journal
Proceedings of the 17th ACM International Conference on Computing Frontiers