
2018 19th International Symposium on Quality Electronic Design (ISQED): Latest Papers

Body-biasing assisted vmin optimization for 5nm-node multi-Vt FD-SOI 6T-SRAM
Pub Date : 2018-05-09 DOI: 10.1109/ISQED.2018.8357280
Jheng-Yi Chen, Ming-Yu Chang, Shi-Hao Chen, Jia-Wei Lee, M. Chiang
This work proposes a body-biasing technique to optimize the Vmin of 6T-SRAM based on 5nm-node multi-Vt FD-SOI devices. Accounting for process variation, the minimum operating voltage, Vmin, is estimated at 6-sigma yield. By properly selecting the back bias, the lowest Vmin is achieved for each of the three operation modes: high-performance, standard, and low-voltage. In high-performance mode, the optimized Vmin is reduced to 0.491 V at a back bias of 0.2 V. The proposed technique offers design flexibility for optimizing SRAM performance and yield by adjusting the back bias, without requiring complicated process technology.
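For a sense of scale, a 6-sigma target corresponds to a very small per-cell failure probability. The sketch below assumes the relevant margin is normally distributed and that "6-sigma yield" refers to a one-sided Gaussian tail; the abstract does not state the exact yield metric the authors used.

```python
import math


def one_sided_tail(k_sigma: float) -> float:
    """Return P(X > k_sigma) for a standard normal X, i.e. 1 - Phi(k_sigma)."""
    # Phi(-k) = 0.5 * erfc(k / sqrt(2)), using the complementary error function
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))


for k in (3, 4, 5, 6):
    print(f"{k} sigma -> tail probability {one_sided_tail(k):.3e}")

# Expected failing cells in a hypothetical 1-Mbit (2**20-cell) array at 6 sigma
print(f"expected failures per Mbit: {2**20 * one_sided_tail(6):.2e}")
```

At 6σ the tail is roughly 1e-9, so even a 1-Mbit array expects only about 0.001 failing cells under this model, which is why 6σ is a commonly quoted target for large SRAM arrays.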
Citations: 0
Hybrid-comp: A criticality-aware compressed last-level cache
Pub Date : 2018-05-09 DOI: 10.1109/ISQED.2018.8357260
A. Jadidi, M. Arjomand, M. Kandemir, C. Das
Cache compression is a promising technique for increasing on-chip cache capacity and decreasing off-chip bandwidth usage. While prior compression techniques always consider the trade-off between compression ratio and decompression latency, they are oblivious to variation in the criticality of different cache blocks. In multi-core processors, the last-level cache (LLC) is logically shared but physically distributed among cores. In this work, we demonstrate that cache blocks within such a non-uniform architecture exhibit different sensitivities to access latency. Exploiting this behavior, we propose a criticality-aware compressed LLC that, based on the criticality of each data block, favors lower latency over higher capacity. In our studies of a 16-core processor with a 4MB LLC, the proposed criticality-aware mechanism improves system performance to a level comparable to that of an 8MB uncompressed LLC.
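The idea of trading capacity for latency on a per-block basis can be illustrated with a toy placement policy. This is not the paper's mechanism: the criticality score, the 64-byte line size, and the greedy budget rule are all illustrative assumptions. Every block starts compressed to maximize capacity, and the most latency-critical blocks are then stored uncompressed, avoiding decompression latency, while space allows.

```python
from dataclasses import dataclass

LINE_BYTES = 64  # assumed uncompressed cache-line size


@dataclass
class Block:
    name: str
    criticality: float     # hypothetical latency-sensitivity score
    compressed_bytes: int  # size after compression (<= LINE_BYTES)


def choose_layout(blocks, budget_bytes):
    """Return the names of blocks to store uncompressed.

    All blocks start compressed (maximum capacity); the most critical ones are
    then promoted to uncompressed form (no decompression latency) while the
    byte budget allows.
    """
    used = sum(b.compressed_bytes for b in blocks)
    if used > budget_bytes:
        raise ValueError("blocks do not fit even when all are compressed")
    uncompressed = set()
    for blk in sorted(blocks, key=lambda b: b.criticality, reverse=True):
        extra = LINE_BYTES - blk.compressed_bytes
        if used + extra <= budget_bytes:
            used += extra
            uncompressed.add(blk.name)
    return uncompressed


blocks = [
    Block("A", criticality=0.9, compressed_bytes=32),
    Block("B", criticality=0.5, compressed_bytes=16),
    Block("C", criticality=0.1, compressed_bytes=40),
]
print(sorted(choose_layout(blocks, budget_bytes=128)))  # ['A']
```

With a 128-byte budget only block A, the most critical, fits uncompressed; a larger budget lets more blocks skip decompression, which mirrors the capacity-versus-latency trade-off the abstract describes.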
Citations: 4
PDA-HyPAR: Path-diversity-aware hybrid planar adaptive routing algorithm for 3D NoCs
Pub Date : 2018-05-09 DOI: 10.1109/ISQED.2018.8357277
Jindun Dai, Renjie Li, Xin Jiang, Takahiro Watanabe
3D Networks-on-Chip (NoCs) are an efficient solution for multi-core communication. Routing algorithm design has become a critical challenge in achieving higher NoC performance. The performance of traditional methods based on turn models degrades when the network becomes saturated. To improve network stability after saturation, this paper proposes a novel deadlock-free Path-Diversity-Aware Hybrid Planar Adaptive Routing (PDA-HyPAR) algorithm that does not use virtual channels. In this method, different routing rules are applied in different XY-planes, and a planar adaptive routing strategy is proposed to balance the network load. We analyze path diversity theoretically and apply a path-diversity-aware selection strategy. Experimental results show that PDA-HyPAR remains effective even under heavy network load.
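Path diversity can be quantified by counting minimal routes. In a 3D mesh, a packet that must travel dx, dy and dz hops along the three axes has a multinomial number of shortest paths; deterministic XYZ routing uses exactly one of them, while adaptive algorithms permit some subset. The sketch below computes only this upper bound; which paths PDA-HyPAR's per-plane rules actually allow is not specified in the abstract.

```python
from math import factorial


def minimal_paths(dx: int, dy: int, dz: int) -> int:
    """Count shortest paths between two nodes of a 3D mesh that differ by
    dx, dy, dz hops along the X, Y and Z axes (a multinomial coefficient)."""
    dx, dy, dz = abs(dx), abs(dy), abs(dz)
    return factorial(dx + dy + dz) // (factorial(dx) * factorial(dy) * factorial(dz))


print(minimal_paths(2, 1, 0))  # 3 shortest paths within one XY-plane
print(minimal_paths(1, 1, 1))  # 6 shortest paths using all three dimensions
```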
Citations: 4
Low power latch based design with smart retiming
Pub Date : 2018-05-09 DOI: 10.1109/ISQED.2018.8357308
K. Singh, Hailong Jiao, J. Huisken, H. Fatemi, J. P. D. Gyvez
Flip-flops and latches are two options for constructing pipelines in digital integrated circuits (ICs). In this paper, the implications of converting a flip-flop-based design to a latch-based design are investigated through timing and power analysis. Design flows are also proposed for converting a flip-flop-based design into a latch-based design, as well as into a mixed latch/flip-flop design. With a new retiming strategy, the optimum operating condition, where the maximum time borrowing or performance enhancement can be obtained, is identified for both the latch-based design and the mixed design. Compared to the flip-flop-based design, frequency boosts of 48% and 45% are achieved by the latch-based design and the mixed design, respectively. When supply voltage scaling is used to maintain the same performance as the flip-flop-based design, the latch-based design and the mixed design reduce power consumption by 21% and 16%, respectively, in an industrial 28-nm FDSOI CMOS technology.
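Time borrowing is what lets a latch-based pipeline run faster than a flip-flop pipeline built from the same logic. The sketch below uses an idealized two-phase timing model (50% duty cycle, zero setup and latch propagation delays, output captured by an edge-triggered register); these simplifications and the example delays are assumptions for illustration, and the code does not reproduce the paper's retiming strategy.

```python
def latch_pipeline_ok(delays, tc):
    """Check whether a two-phase latch pipeline meets timing at clock period tc.

    delays[k] is the logic delay between latch k and latch k+1. Latch k is
    transparent during [k*tc/2, (k+1)*tc/2]; data leaves latch 0 at t = 0, and
    the final stage must finish by len(delays)*tc/2, where a register captures it.
    """
    half = tc / 2.0
    n = len(delays)
    depart = 0.0
    for k, d in enumerate(delays):
        arrive = depart + d
        # An intermediate latch closes at (k+2)*half; the output register samples at n*half.
        deadline = n * half if k == n - 1 else (k + 2) * half
        if arrive > deadline:
            return False
        # Data passes a transparent latch on arrival, or waits until the latch opens.
        depart = max(arrive, (k + 1) * half)
    return True


def ff_period(delays):
    """Flip-flop pipeline with a register after every two half-stages (even length
    assumed): the slowest full stage sets the period, and no borrowing is possible."""
    return max(delays[i] + delays[i + 1] for i in range(0, len(delays), 2))


def min_latch_period(delays, iters=60):
    """Bisect for the smallest feasible period (feasibility is monotone in tc)."""
    lo, hi = 0.0, 2.0 * sum(delays)  # hi is always feasible in this model
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if latch_pipeline_ok(delays, mid):
            hi = mid
        else:
            lo = mid
    return hi


stage_delays = [7.0, 5.0, 3.0, 5.0]  # hypothetical half-stage delays (ns)
print(ff_period(stage_delays))                   # 12.0
print(round(min_latch_period(stage_delays), 3))  # 10.0
```

With these made-up delays the flip-flop pipeline is limited by its 12 ns first stage, while the latch pipeline meets timing at 10 ns because the slow first half-stage borrows time that the fast third half-stage gives back, a 20% frequency gain in this toy case.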
Citations: 3
A loop structure optimization targeting high-level synthesis of fast number theoretic transform
Pub Date : 2018-05-09 DOI: 10.1109/ISQED.2018.8357273
Kazushi Kawamura, M. Yanagisawa, N. Togawa
Multiplication of numbers with a large number of digits is used heavily when processing data encrypted with fully homomorphic encryption, and it is a bottleneck in computation time. Multiplication based on the fast number theoretic transform (FNTT) is known to be a high-speed algorithm, and further speedup is expected from implementing the FNTT process on an FPGA. A high-level synthesis tool enables efficient hardware implementation even for an FNTT with a large number of points. In this paper, we propose a methodology for optimizing the loop structure in a software description of the FNTT so that the performance of the synthesized FNTT processor is maximized. The loop structure optimization is considered in terms of loop flattening and trip count reduction. We implement a 65,536-point FNTT processor with the optimized loop structure on an FPGA and demonstrate that it runs 6.9 times faster than an execution on a CPU.
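For readers unfamiliar with NTT-based multiplication, the sketch below multiplies two integers by convolving their decimal digits with a radix-2 number theoretic transform modulo the prime 998244353 (= 119·2^23 + 1, primitive root 3). The modulus, digit base, and transform size are illustrative choices, far smaller than the FHE-oriented setting in the paper; the nested stage/group/butterfly loops are the kind of loop structure that loop flattening and trip-count reduction would act on in an HLS description.

```python
MOD = 998244353  # NTT-friendly prime: 119 * 2**23 + 1
ROOT = 3         # primitive root modulo MOD


def ntt(a, invert=False):
    """In-place iterative radix-2 NTT over Z/MOD; len(a) must be a power of two (<= 2**23)."""
    n = len(a)
    # Bit-reversal permutation of the input
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages: stage -> group -> butterfly loop nest
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)  # primitive length-th root of unity
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)         # inverse root via Fermat's little theorem
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % MOD
                a[k], a[k + half] = (u + v) % MOD, (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD
    return a


def multiply(x: int, y: int) -> int:
    """Multiply non-negative integers via NTT convolution of their base-10 digits.

    Exact while every convolution coefficient (at most 81 times the shorter
    operand's digit count) stays below MOD.
    """
    da = [int(c) for c in reversed(str(x))]
    db = [int(c) for c in reversed(str(y))]
    n = 1
    while n < len(da) + len(db):
        n <<= 1
    fa = ntt(da + [0] * (n - len(da)))
    fb = ntt(db + [0] * (n - len(db)))
    coeffs = ntt([p * q % MOD for p, q in zip(fa, fb)], invert=True)
    # Evaluating the product polynomial at 10 performs the carry propagation
    return sum(c * 10**i for i, c in enumerate(coeffs))


print(multiply(12345678901234567890, 98765432109876543210) == 12345678901234567890 * 98765432109876543210)
```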
Citations: 12
A 4-PAM interconnect in network-on-chip for high-throughput and latency-sensitive applications
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357274
Ahmad Mansour, Ahmed El-Naggar, Bassma Al-Abassy, Mostafa Khamis, A. Shalaby
In this paper, a four-level pulse amplitude modulation (4-PAM) scheme is proposed for communication within the network-on-chip of MPSoCs. A current-mode 4-PAM transmitter encodes data transactions between neighboring routers, and data streams are decoded by a flash-ADC-based receiver using clocked latch-type comparators. Additionally, the scheme is applied to networks of high-radix routers with a local concentration factor of two IPs per node, encoding data streams at the network interface as they are injected into the network and decoding them at the router's input port. We also discuss the required modifications to the input-port buffers of the router architecture and introduce a two-stage allocation method that resolves conflicting output-port requests; together with a fair flow-control methodology, this is essential for maintaining system stability after saturation. The approach also reduces the wiring load of each router, an added benefit that facilitates the routing stage. The evaluation is extended to overall network performance, supporting the use of multi-valued logic, and estimates the implementation overhead in area and power.
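The symbol-level idea behind a 4-PAM link can be modeled in a few lines: two bits select one of four amplitude levels, and a flash-ADC-style receiver compares each received sample against three thresholds. The normalized levels, midpoint thresholds, and Gray-coded bit mapping below are assumptions (Gray coding is a common choice for PAM-4 because a one-level error flips only one bit); the paper's current-mode circuit levels and bit mapping are not given in the abstract.

```python
LEVELS = (0.0, 1.0, 2.0, 3.0)    # hypothetical normalized amplitudes
THRESHOLDS = (0.5, 1.5, 2.5)     # comparator references midway between levels
GRAY = {0b00: 0, 0b01: 1, 0b11: 2, 0b10: 3}  # 2-bit symbol -> level index
INV_GRAY = {level: sym for sym, level in GRAY.items()}


def encode(bits):
    """Map an even-length bit list to 4-PAM amplitudes, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [LEVELS[GRAY[(bits[i] << 1) | bits[i + 1]]] for i in range(0, len(bits), 2)]


def decode(samples):
    """Three comparators yield a thermometer code; its count of ones is the level index."""
    bits = []
    for s in samples:
        level = sum(s > t for t in THRESHOLDS)
        sym = INV_GRAY[level]
        bits += [sym >> 1, sym & 1]
    return bits


tx = encode([1, 0, 0, 1, 1, 1, 0, 0])
print(tx)  # [3.0, 1.0, 2.0, 0.0]
rx = [v + noise for v, noise in zip(tx, (-0.1, 0.2, -0.2, 0.3))]  # add small noise
print(decode(rx))  # [1, 0, 0, 1, 1, 1, 0, 0]
```

Each symbol carries two bits, so a 4-PAM link can move the same data over half as many wires (or in half as many symbol periods) as a binary link, which is presumably the source of the reduced router wiring load the abstract mentions.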
Citations: 1
Program acceleration using nearest distance associative search
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357263
M. Imani, Daniel Peroni, T. Simunic
Data generated by current computing systems is increasing rapidly as these systems become more interconnected as part of the Internet of Things (IoT). Processing this growing volume of data, such as multimedia, needs to be accelerated using efficient, massively parallel processors. Associative memories in the form of look-up tables, used in tandem with processing elements, can reduce energy consumption by eliminating redundant computations. In this paper, we propose a resistive associative unit, called RAU, which approximately performs basic computations with significantly higher efficiency than traditional processing units. RAU stores frequently occurring patterns for each operation and then retrieves the row nearest to the input data as an approximate output. To avoid a large and energy-intensive RAU, our design adaptively detects infrequent inputs and assigns them to precise cores for processing. For each application, our design can adjust the ratio of data processed by RAU versus the precise cores to ensure computational accuracy. We evaluate RAU on an AMD Southern Islands GPU, a recent GPGPU architecture. Our experimental evaluation shows that a GPGPU enhanced with RAU achieves 61% average energy savings and a 2.2× speedup across eight diverse OpenCL applications, while ensuring acceptable computation quality. The energy-delay product of the enhanced GPGPU improves by 5.7× and 2.8× compared to a conventional GPGPU and a state-of-the-art approximate GPGPU, respectively.
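The look-up-with-fallback behavior described above can be modeled in software. In this sketch, an illustration under stated assumptions rather than the RAU design, a table holds precomputed results for frequent input patterns, a query returns the result of the nearest stored pattern when it lies within a distance threshold, and any other input falls back to exact computation (standing in for a precise core). The L1 distance, the threshold, and the example patterns are hypothetical.

```python
class ApproxLookup:
    """Software model of an associative look-up unit with an exact fallback.

    Hypothetical sketch: stored patterns, the L1 distance and max_distance are
    illustrative; the paper's RAU performs its nearest-distance search in
    resistive hardware.
    """

    def __init__(self, exact_fn, patterns, max_distance):
        self.exact_fn = exact_fn
        # Precompute results for frequent input patterns (the "table rows")
        self.table = [(tuple(p), exact_fn(*p)) for p in patterns]
        self.max_distance = max_distance
        self.hits = 0
        self.misses = 0

    def __call__(self, *args):
        def dist(row):
            return sum(abs(a - b) for a, b in zip(row[0], args))

        nearest = min(self.table, key=dist)
        if dist(nearest) <= self.max_distance:
            self.hits += 1
            return nearest[1]        # approximate result from the nearest row
        self.misses += 1
        return self.exact_fn(*args)  # infrequent input: compute precisely


mul = ApproxLookup(lambda a, b: a * b, patterns=[(2, 3), (4, 5), (10, 10)], max_distance=1)
print(mul(2, 3), mul(4, 6), mul(7, 7))  # 6 20 49
print(mul.hits, mul.misses)             # 2 1
```

Here mul(4, 6) returns 20 instead of 24: raising max_distance sends more inputs to the cheaper but less accurate table, and lowering it sends more to the exact path, loosely mirroring the accuracy control the abstract describes.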
Citations: 4
Optimizing energy in a DRAM based hybrid cache
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357262
Jiacong He, Joseph Callenes-Sloan
A die-stacked DRAM cache can be used to increase bandwidth and reduce latency compared with conventional DRAM memory. However, energy becomes an unavoidable challenge as DRAM cache size increases. STT-RAM, with near-zero leakage, can be integrated with a DRAM cache to form a hybrid cache that reduces static energy, but the high write energy of STT-RAM brings another energy challenge. In this paper, we propose a tri-regional hybrid cache that exploits the advantages of both DRAM and STT-RAM technologies. An asymmetric data access policy is introduced based on the non-uniform read/write properties of the different hybrid cache regions. We also propose a prediction table that reduces the search energy of the hybrid cache. The results show that, compared with previous work, our hybrid cache reduces energy by 26% and improves performance by 11% on average.
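A prediction table can save search energy by probing the most likely region first instead of searching every region. The toy model below counts probes as a proxy for search energy; the region names, the modulo indexing, and the "last region seen" update rule are hypothetical and not taken from the paper.

```python
class RegionPredictor:
    """Toy region-prediction table for a multi-region hybrid cache (hypothetical)."""

    def __init__(self, regions, entries=1024):
        self.regions = list(regions)
        self.entries = entries
        self.table = {}  # table index -> region where a matching block was last found
        self.probes = 0  # each probe stands in for one region's tag-search energy

    def lookup(self, addr, contents):
        """Search regions, predicted one first; contents maps region -> set of addresses."""
        idx = addr % self.entries
        predicted = self.table.get(idx, self.regions[0])
        order = [predicted] + [r for r in self.regions if r != predicted]
        for region in order:
            self.probes += 1
            if addr in contents[region]:
                self.table[idx] = region  # train the predictor on a hit
                return region
        return None  # miss in every region


cache = {"region0": set(), "region1": set(), "region2": {42}}
pred = RegionPredictor(cache)
print(pred.lookup(42, cache), pred.probes)  # region2 3 (cold: all regions searched)
print(pred.lookup(42, cache), pred.probes)  # region2 4 (predicted: one extra probe)
```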
Citations: 1
Hierarchical dynamic goal management for IoT systems
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357315
A. Jantsch, A. Anzanpour, H. Kholerdi, I. Azimi, L. Siafara, A. Rahmani, N. Taherinejad, P. Liljeberg, N. Dutt
As the Internet of Things (IoT) penetrates ever more application domains, many IoT-based systems are becoming increasingly complex, versatile, and resource-rich, and need to serve one or more applications with diverse and changing goals. These systems face new challenges in dynamic goal management due to a combination of limited shared resources and multiple goals that may not only conflict with each other but also change dynamically. We motivate the need for hierarchical, dynamic goal management for this class of complex IoT systems and substantiate our arguments with case studies from two application domains: patient health monitoring and Cyber-Physical Production Systems (CPPSs).
Citations: 10
Concolic testing of SystemC designs
Pub Date : 2018-03-13 DOI: 10.1109/ISQED.2018.8357256
Bin Lin, Kai Cong, Zhenkun Yang, Z. Liao, T. Zhan, Christopher Havlicek, Fei Xie
SystemC is a system-level modelling language widely used in the semiconductor industry. SystemC validation is both necessary and important, since undetected bugs may propagate to final silicon products, which can be extremely expensive and dangerous. However, validating SystemC designs is challenging due to their heavy use of object-oriented features, event-driven simulation semantics, and inherent concurrency. In this paper, we present CTSC, an automated, easy-to-deploy, scalable, and effective binary-level concolic testing framework for SystemC designs. We have implemented CTSC and applied it to an open-source SystemC benchmark. In our extensive experiments, the CTSC-generated test cases achieved high code coverage, triggered 14 assertions, and found two severe bugs. In addition, experiments on two designs with more than 2K lines of SystemC code show that our approach scales to designs of practical size.
Citations: 9