
Latest publications from the 2021 IEEE 39th International Conference on Computer Design (ICCD)

Special Session: How much quality is enough quality? A case for acceptability in approximate designs
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00013
Isaías B. Felzmann, João Fabrício Filho, Juliane Regina de Oliveira, L. Wanner
Approximate systems are designed to offer improved efficiency with potentially reduced quality of results. Quality of output in these systems is typically quantified in comparison to a precise result using metrics such as RMSE, MAE, PSNR, or application-specific metrics such as structural similarity of images (SSIM). Furthermore, systems are typically designed to maximize efficiency for a given minimum quality requirement. It is often difficult to determine what this quality requirement should be for an application, let alone a system. Thus, a fixed quality requirement may be overly conservative, and leave optimization opportunities on the table. In this work, we present a different approach to evaluate approximate systems based on the usefulness of results instead of quality. Our method qualitatively determines the acceptability of approximate results within different processing pipelines. To demonstrate the method, we implement three image and signal processing applications featuring scenarios of image classification, image recognition, and frequency estimation. Our results show that designing approximate systems to guarantee acceptability can produce up to 20% more valid results than the conservative quality thresholds commonly adopted in the literature, allowing for higher error rates and, consequently, lower energy cost.
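The fixed quality thresholds the abstract argues against are typically checked with metrics such as RMSE and PSNR. As a point of reference, here is a minimal sketch of such a metric-based check; the function names and the peak value are illustrative, not from the paper:

```python
import math

def rmse(ref, approx):
    # Root-mean-square error between a precise and an approximate result.
    return math.sqrt(sum((r - a) ** 2 for r, a in zip(ref, approx)) / len(ref))

def psnr(ref, approx, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher means closer to the precise result.
    e = rmse(ref, approx)
    return float("inf") if e == 0 else 20 * math.log10(peak / e)
```

An acceptability-based evaluation would instead ask whether the downstream pipeline (e.g., an image classifier) still produces a valid result, rather than comparing such numbers against a fixed cutoff.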
Citations: 1
Efficient Methods for SoC Trust Validation Using Information Flow Verification
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00098
Khitam M. Alatoun, Shanmukha Murali Achyutha, R. Vemuri
Information flow properties are essential to identify security vulnerabilities in System-on-Chip (SoC) designs. Verifying information flow properties, such as integrity and confidentiality, is challenging as these properties cannot be handled using traditional assertion-based verification techniques. This paper proposes two novel approaches, a universal method and a property-driven method, to verify and monitor information flow properties. Both methods can be used for formal verification, dynamic verification during simulation, post-fabrication validation, and run-time monitoring. The universal method expedites implementing the information flow model and has less complexity than the most recently published technique. The property-driven method reduces the overhead of the security model, which helps speed up the verification process and create an efficient run-time hardware monitor. More than 20 information flow properties from 5 different designs were verified and several bugs were identified. We show that the method is scalable for large systems by applying it to an SoC design based on an OpenRISC-1200 processor.
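Integrity and confidentiality are both flavors of information flow: a labeled value must not reach a sink of the opposite trust level. A minimal software taint-propagation sketch of that idea follows; it is purely illustrative, since the paper verifies such properties on hardware designs, not via Python classes:

```python
class Tainted:
    """A value carrying a taint label; derived values inherit the label."""
    def __init__(self, value, tainted=False):
        self.value = value
        self.tainted = tainted

    def __add__(self, other):
        # Any result derived from a tainted operand is itself tainted.
        t = other.tainted if isinstance(other, Tainted) else False
        v = other.value if isinstance(other, Tainted) else other
        return Tainted(self.value + v, self.tainted or t)

def public_sink(x):
    # Confidentiality check: reject any tainted (secret) value reaching
    # a public output.
    if isinstance(x, Tainted) and x.tainted:
        raise ValueError("information-flow violation: secret reaches public sink")
    return x.value if isinstance(x, Tainted) else x
```

Hardware information-flow verification tracks the same kind of label, but through gates and registers of the design under test.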
Citations: 2
Differential Testing of x86 Instruction Decoders with Instruction Operand Inferring Algorithm
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00040
Guang Wang, Ziyuan Zhu, Shuan Li, Xu Cheng, Dan Meng
Instruction decoders are tools for software analysis, sandboxing, malware detection, and the detection of undocumented instructions. Decoders must be accurate and consistent with the instruction set architecture manuals. Existing testing methods for instruction decoders are based on random generation and instruction-structure mutation, and they mainly target the legal instruction space. However, there is little research on whether instructions in the reserved instruction space can be accurately identified as invalid. We propose an instruction operand inferring algorithm, based on depth-first search, to skip considerable redundant legal instruction space. The algorithm keeps the types of instructions in the legal instruction space unchanged and guarantees traversal of the reserved instruction space. In addition, we propose a differential testing method that discovers decoding discrepancies between instruction decoders. We applied the method to XED and Capstone and found four million inconsistent instructions between them. Compared with the existing instruction generation method based on depth-first search, our method improves efficiency by about four times.
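The differential-testing idea is independent of the specific decoders: run the same inputs through both and collect disagreements. A toy harness is sketched below, with two stand-in decoder functions in place of real ones such as XED and Capstone (the stand-ins and their deliberate bug are invented for illustration):

```python
def decoder_a(b):
    # Toy reference decoder: treats 0x90 as NOP, everything else as invalid.
    return "nop" if b == 0x90 else "invalid"

def decoder_b(b):
    # Toy decoder under test, with a seeded bug: also accepts 0x91 as NOP.
    return "nop" if b in (0x90, 0x91) else "invalid"

def differential_test(space):
    # Report every input on which the two decoders disagree.
    return [b for b in space if decoder_a(b) != decoder_b(b)]
```

Real x86 instructions are multi-byte, so the hard part the paper addresses is enumerating a useful slice of that enormous input space rather than all of it.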
Citations: 2
Chopin: Composing Cost-Effective Custom Chips with Algorithmic Chiplets
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00069
Pete Ehrett, Todd M. Austin, V. Bertacco
As computational demands rise, the need for specialized hardware has grown acute. However, the immense cost of fully-custom chips has forced many developers to rely on suboptimal solutions like FPGAs, especially for low- to mid-volume applications, in which multi-million-dollar non-recurring engineering (NRE) costs cannot be amortized effectively. We propose to address this problem by composing custom chips out of small, algorithmic chiplets, reusable across diverse designs, such that high NRE costs may be amortized across many different designs. This work models the economics of this paradigm and identifies a cost-optimal granularity for algorithmic chiplets, then demonstrates how those guidelines may be applied to design high-performance, algorithmically-composable hardware components – which may be reused, without modification, across many different processing pipelines. For an example phased-array radar accelerator, our chiplet-centric paradigm improves perf-per-$ by 9.3× over an FPGA, and ∼4× over a conventional ASIC.
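The NRE-amortization argument can be made concrete with a one-line cost model; the numbers in the usage note below are hypothetical, not from the paper:

```python
def cost_per_chip(nre, unit_cost, volume):
    # Total per-chip cost: one-time non-recurring engineering (NRE) cost
    # amortized over the production volume, plus the recurring per-unit
    # manufacturing cost.
    return nre / volume + unit_cost
```

For example, with a hypothetical $20M NRE and $50 unit cost, the per-chip cost is $2,050 at 10k units but only $70 at 1M units — which is why chiplets that spread NRE across many different designs pay off at low to mid volumes.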
Citations: 3
Write-Optimized and Consistent RDMA-based Non-Volatile Main Memory Systems
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00048
Xinxin Liu, Yu Hua, Xuan Li, Qifan Liu
To deliver high performance in cloud computing, many efforts leverage RDMA (Remote Direct Memory Access) in networking and NVMM (Non-Volatile Main Memory) in end systems. Because it involves no remote CPU, one-sided RDMA accesses remote memory efficiently, and NVMM technologies offer non-volatility, byte-addressability, and DRAM-like latency. However, to guarantee Remote Data Atomicity (RDA), a combined scheme has to consume extra network round-trips, remote CPU participation, and double NVMM writes. To address these problems, we propose a write-optimized log-structured NVMM design for Efficient Remote Data Atomicity, called Erda. In Erda, clients transfer data directly to the destination memory addresses in the logs on servers via one-sided RDMA writes, without redundant copies or remote CPU consumption. To detect the atomicity of fetched data, we verify a checksum without client-server coordination. We further ensure metadata consistency by leveraging an 8-byte atomic update in a hash table, which also contains the addresses of previous versions of the data in the log. When a failure occurs, the server efficiently restores itself to a consistent state. Experimental results show that, compared with state-of-the-art schemes, Erda reduces NVMM writes by approximately 50%, significantly improves throughput, and decreases latency.
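Detecting a torn write with a checksum, as Erda's atomicity check does at a high level, can be sketched as follows; the entry layout here is made up for illustration and is not Erda's actual log format:

```python
import struct
import zlib

def make_entry(payload: bytes) -> bytes:
    # Writer side: prepend a CRC32 of the payload to the log entry.
    return struct.pack("<I", zlib.crc32(payload)) + payload

def read_entry(entry: bytes):
    # Reader side: recompute the CRC; a mismatch means the entry was
    # only partially written (torn), with no server-side coordination needed.
    (crc,) = struct.unpack("<I", entry[:4])
    payload = entry[4:]
    if zlib.crc32(payload) != crc:
        return None  # torn/partial write detected
    return payload
```

Because the check is purely local to the reader, it fits the one-sided RDMA model where the remote CPU never participates in the transfer.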
Citations: 1
Conciliating Speed and Efficiency on Cache Compressors
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00075
Daniel Rodrigues Carvalho, André Seznec
Cache compression algorithms must abide by hardware constraints; thus, their efficiency ends up being low, and most cache lines end up barely compressed. Moreover, schemes that compress relatively well often decompress slowly, and vice versa. This paper proposes a compression scheme that achieves a high compaction ratio and fast decompression. The key observation is that by further subdividing the chunks of data being compressed, one can tailor the algorithms. This concept is orthogonal to most existing compressors and reduces their average compressed size. In particular, we leverage this concept to boost a single-cycle-decompression compressor to a compressibility level competitive with state-of-the-art proposals. When normalized against the best state-of-the-art compressors with long decompression latency, the proposed ideas further enhance average cache capacity by 2.7% (geometric mean), while featuring short decompression latency.
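The chunk-subdivision observation can be illustrated with a toy base-delta scheme: splitting a line into smaller chunks lets some chunks compress even when the whole line would not. This is a generic sketch of the idea, not the paper's compressor:

```python
def compress_chunk(words):
    # Store a chunk as a base word plus narrow deltas when every delta
    # fits in a signed byte; otherwise keep it raw.
    base = words[0]
    deltas = [w - base for w in words]
    if all(-128 <= d <= 127 for d in deltas):
        return ("base+delta", base, deltas)  # ~1 byte per word after the base
    return ("raw", words)

def compress_line(words, chunk_size=4):
    # Subdivide the cache line into chunks and compress each independently.
    return [compress_chunk(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]
```

Applied to a whole line at once, a single outlier word would force the raw format everywhere; after subdivision, only the chunk containing the outlier stays uncompressed.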
Citations: 2
Optimal Transistor Placement Combined with Global In-cell Routing in Standard Cell Layout Synthesis
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00085
Kyeongrok Jo, Taewhan Kim
The synthesis of standard cell layouts is largely divided into two tasks, namely transistor placement and in-cell routing. Since the result of transistor placement strongly affects the quality of in-cell routing, it is crucial to predict in-cell routability accurately and efficiently during transistor placement. In this work, we address the problem of optimal transistor placement combined with global in-cell routing, with the primary objective of minimizing cell size and the secondary objective of minimizing wirelength for global in-cell routing. To this end, unlike the conventional indirect and complex SMT (satisfiability modulo theories) formulation, we propose a direct and efficient SMT formulation of the original problem. Through experiments, we confirm that our proposed method produces minimal-area cell layouts with minimal wirelength for global in-cell routing while spending much less running time than a conventional optimal layout generator.
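The secondary objective — minimizing wirelength over placement orderings — can be illustrated on a toy instance. This brute-force sketch captures only that part of the objective (the paper encodes the full problem, including cell size, as SMT constraints); the transistor names and net list below are made up:

```python
from itertools import permutations

def wirelength(order, nets):
    # Half-perimeter-style proxy: sum of position distances of connected pairs.
    pos = {t: i for i, t in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)

def best_placement(transistors, nets):
    # Exhaustively search orderings for the minimal total wirelength.
    return min(permutations(transistors),
               key=lambda order: wirelength(order, nets))
```

Exhaustive search is only viable for a handful of transistors; an SMT formulation lets a solver prune the exponential space while still guaranteeing optimality.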
Citations: 1
CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00056
Wei Zhang, Kaihua Fu, Ningxin Zheng, Quan Chen, Chao Li, Wenli Zheng, M. Guo
Emerging latency-critical (LC) services often have both CPU and GPU stages (e.g., DNN-assisted services) and require short response latency. Co-locating best-effort (BE) applications with the LC service on both the CPU and GPU sides improves resource utilization. However, resource contention often results in QoS violations for LC services. We therefore present CHARM, a collaborative host-accelerator resource management system. CHARM ensures the required QoS target of DNN-assisted LC services while maximizing the resource utilization of both the host and the accelerator. CHARM comprises a BE-aware QoS target allocator, a unified heterogeneous resource manager, and a collaborative accelerator-side QoS compensator. The QoS target allocator determines the time limit of an LC service running on the host side and the accelerator side. The resource manager allocates the shared resources on both sides. The QoS compensator allocates more resources to the LC service to speed up its execution if it runs slower than expected.
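Splitting an end-to-end latency target between a CPU stage and a GPU stage is the kind of decision a QoS target allocator makes. A deliberately simplified proportional-split sketch — not CHARM's actual policy — looks like this:

```python
def split_target(total_target_ms, cpu_solo_ms, gpu_solo_ms):
    # Divide the end-to-end latency target between the two stages in
    # proportion to their measured solo (uncontended) latencies.
    total = cpu_solo_ms + gpu_solo_ms
    cpu_budget = total_target_ms * cpu_solo_ms / total
    return cpu_budget, total_target_ms - cpu_budget
```

Whatever slack remains between a stage's budget and its actual latency is what a compensator could hand to co-located best-effort work.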
Citations: 4
HASDH: A Hotspot-Aware and Scalable Dynamic Hashing for Hybrid DRAM-NVM Memory
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00034
Z. Li, Zhipeng Tan, Jianxi Chen
Intel Optane DC Persistent Memory Module (DCPMM) is the first commercially available non-volatile memory (NVM) product and can be placed directly on the processor’s memory bus along with DRAM to serve as a hybrid memory. Compared with DRAM, NVM has 3× the read latency and similar write latency, while its read and write bandwidths are only one-third and one-sixth of DRAM’s. However, existing hashing schemes fail to exploit these performance characteristics. We propose HASDH, a hotspot-aware and scalable dynamic hashing scheme built on hybrid DRAM-NVM memory. HASDH maintains structure metadata (i.e., the directory) in DRAM and persists key-value items in NVM. To reduce the access cost of hot key-value items, HASDH caches frequently accessed key-value items in DRAM with a dedicated caching strategy. To achieve scalable performance on multicore machines, HASDH maintains locks in DRAM, avoiding the extra NVM read-write bandwidth consumed by lock operations. Furthermore, HASDH chains all NVM segments using sibling pointers to their right neighbors to ensure crash consistency, and leverages log-free NVM segment splits to reduce logging overhead.
Citations: 3
Low-Cost Sequential Logic Circuit Design Considering Single Event Double-Node Upsets and Single Event Transients
Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00038
R. Rajaei, M. Niemier, X. Hu
As CMOS device sizes continue to scale down, radiation-related reliability issues are of ever-growing concern. Single event double-node upsets (SEDUs) in sequential logic and single event transients (SETs) in combinational logic are sources of high-rate radiation-induced soft errors that can affect the functionality of logic circuits. This paper presents effective circuit-level solutions for combating SEDUs/SETs in nanoscale sequential and combinational logic circuits. More specifically, we propose and evaluate low-power latch and flip-flop circuits to mitigate SEDUs and SETs. Simulations with a 22 nm PTM model reveal that the proposed circuits offer full immunity against SEDUs, filter SET pulses more effectively, and simultaneously reduce design overhead when compared to prior work. As a representative example, simulation-based studies show that our designs offer up to 77% improvement in delay-power-area product and can filter out up to 58% wider SET pulses than the state-of-the-art.
Citations: 0
Journal
2021 IEEE 39th International Conference on Computer Design (ICCD)