Body-biasing assisted vmin optimization for 5nm-node multi-Vt FD-SOI 6T-SRAM
Pub Date: 2018-05-09 | DOI: 10.1109/ISQED.2018.8357280 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
Jheng-Yi Chen, Ming-Yu Chang, Shi-Hao Chen, Jia-Wei Lee, M. Chiang
This work proposes a body-biasing technique to optimize the Vmin of a 6T-SRAM based on 5nm-node multi-Vt FD-SOI devices. Accounting for process variation, the minimum operating voltage, Vmin, is estimated at 6-sigma yield. By properly selecting the back bias, the lowest Vmin is achieved for each of the three operation modes: high-performance, standard, and low-voltage. In high-performance mode, the optimized Vmin is reduced to 0.491 V at a back bias of 0.2 V. The proposed technique offers design flexibility for optimizing SRAM performance and yield by adjusting the back bias, without complicated process-technology requirements.
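As a rough illustration of the optimization loop, the sketch below sweeps the back bias and keeps the value whose 6-sigma Vmin estimate is lowest. The Monte Carlo model, its numbers, and the 6-sigma estimator are toy assumptions standing in for the paper's device-level variation analysis.

import numpy as np

rng = np.random.default_rng(0)

def vmin_samples(vbb, n=10000):
    # Toy stand-in for Monte Carlo runs of the 6T-SRAM cell: nominal Vmin
    # falls and then rises with back bias, and variability shrinks slightly
    # under forward body bias (all numbers hypothetical).
    nominal = 0.45 + 0.3 * (vbb - 0.2) ** 2
    sigma = 0.012 - 0.005 * vbb
    return rng.normal(nominal, sigma, n)

def six_sigma_vmin(samples):
    # Mean plus six standard deviations as a simple proxy for the supply
    # voltage needed to reach 6-sigma functional yield.
    return samples.mean() + 6.0 * samples.std()

best_vbb = min(np.arange(0.0, 0.45, 0.05),
               key=lambda vbb: six_sigma_vmin(vmin_samples(vbb)))
print(f"optimal back bias ~ {best_vbb:.2f} V")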
{"title":"Body-biasing assisted vmin optimization for 5nm-node multi-Vt FD-SOI 6T-SRAM","authors":"Jheng-Yi Chen, Ming-Yu Chang, Shi-Hao Chen, Jia-Wei Lee, M. Chiang","doi":"10.1109/ISQED.2018.8357280","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357280","url":null,"abstract":"This work proposes a body-biasing technique to optimize Vmin of the 6T-SRAM based on 5nm-node multi-Vt FD-SOI devices. Accounting for the process variation, the operating voltage, Vmin, is estimated at 6-sigma yield. By properly selecting the back bias, the lowest Vmin is achieved for each of the three operation modes: high-performance, standard and low-voltage modes. In high-performance mode, the optimized Vmin is reduced to 0.491 V at back bias of 0.2 V. The proposed technique offers a design flexibility for optimizing the SRAM performance and yield by adjusting the back bias without complicated process technology requirements.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"281 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115216267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid-comp: A criticality-aware compressed last-level cache
Pub Date: 2018-05-09 | DOI: 10.1109/ISQED.2018.8357260 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
A. Jadidi, M. Arjomand, M. Kandemir, C. Das
Cache compression is a promising technique to increase on-chip cache capacity and decrease off-chip bandwidth usage. While prior compression techniques always consider a trade-off between compression ratio and decompression latency, they are oblivious to the variation in criticality across cache blocks. In multi-core processors, the last-level cache (LLC) is logically shared but physically distributed among cores. In this work, we demonstrate that cache blocks within such a non-uniform architecture exhibit different sensitivities to access latency. Owing to this behavior, we propose a criticality-aware compressed LLC that favors lower latency over higher capacity based on the criticality of the data blocks. Based on our studies of a 16-core processor with a 4MB LLC, the proposed criticality-aware mechanism improves system performance to a level comparable to that of an 8MB uncompressed LLC.
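A minimal sketch of such a criticality-aware insertion decision, assuming hypothetical helpers (is_critical as the criticality predictor and an off-the-shelf compressor) rather than the paper's actual mechanism:

import zlib
from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int
    data: bytes
    compressed: bool

def insert_block(llc_set, tag, data, is_critical, compress=zlib.compress):
    # Latency-critical blocks stay uncompressed to avoid decompression delay;
    # everything else is compressed to gain effective capacity.
    if is_critical(tag):
        line = CacheLine(tag, data, compressed=False)
    else:
        line = CacheLine(tag, compress(data), compressed=True)
    llc_set.append(line)
    return line

# Example: a toy predictor that marks every fourth block as critical.
llc_set = []
insert_block(llc_set, 16, b"A" * 64, is_critical=lambda tag: tag % 4 == 0)
insert_block(llc_set, 17, b"B" * 64, is_critical=lambda tag: tag % 4 == 0)
print([(line.tag, line.compressed) for line in llc_set])   # [(16, False), (17, True)]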
{"title":"Hybrid-comp: A criticality-aware compressed last-level cache","authors":"A. Jadidi, M. Arjomand, M. Kandemir, C. Das","doi":"10.1109/ISQED.2018.8357260","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357260","url":null,"abstract":"Cache compression is a promising technique to increase on-chip cache capacity and to decrease off-chip bandwidth usage. While prior compression techniques always consider a trade-off between compression ratio and decompression latency, they are oblivious to the variation in criticality of different cache blocks. In multi-core processors, last-level cache (LLC) is logically shared but physically distributed among cores. In this work, we demonstrate that, cache blocks within such nonuniform architecture exhibit different sensitivity to the access latency. Owing to this behavior, we propose a criticality-aware compressed LLC that favors lower latency over higher capacity based on the criticality of the data blocks. Based on our studies on a 16-core processor with 4MB LLC, our proposed criticality-aware mechanism improves the system performance comparable to that of with an 8MB uncompressed LLC.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124906371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PDA-HyPAR: Path-diversity-aware hybrid planar adaptive routing algorithm for 3D NoCs
Pub Date: 2018-05-09 | DOI: 10.1109/ISQED.2018.8357277 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
Jindun Dai, Renjie Li, Xin Jiang, Takahiro Watanabe
3D Networks-on-Chip (NoCs) are an efficient solution for multi-core communication, and the routing algorithm has become a critical factor in achieving higher NoC performance. The performance of traditional turn-model-based methods degrades when the network becomes saturated. To improve network stability after saturation, this paper proposes a novel deadlock-free Path-Diversity-Aware Hybrid Planar Adaptive Routing (PDA-HyPAR) algorithm that requires no virtual channels. In this method, different routing rules are applied in different XY-planes, and a planar adaptive routing strategy is used to balance the network load. We analyze path diversity theoretically and apply a path-diversity-aware selection strategy accordingly. Experimental results show that PDA-HyPAR remains effective even under heavy network load.
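As an illustration of path-diversity-aware selection (a simplified stand-in for PDA-HyPAR's per-plane rules), the sketch below picks, among the admissible minimal-path directions in a 3D mesh, the one that preserves the most remaining minimal paths to the destination.

from math import comb

def minimal_paths(src, dst):
    # Number of shortest x/y/z lattice paths between two mesh nodes.
    dx, dy, dz = (abs(a - b) for a, b in zip(src, dst))
    return comb(dx + dy + dz, dx) * comb(dy + dz, dy)

def select_port(cur, dst):
    # Among the productive directions, keep the one that preserves the most
    # remaining minimal paths (a crude path-diversity metric).
    candidates = []
    for axis in range(3):
        step = (dst[axis] > cur[axis]) - (dst[axis] < cur[axis])
        if step:
            nxt = list(cur)
            nxt[axis] += step
            candidates.append((minimal_paths(tuple(nxt), dst), axis, step))
    if not candidates:
        return None                      # already at the destination
    _, axis, step = max(candidates)
    return axis, step

print(select_port((0, 0, 0), (2, 3, 1)))  # prefers the Y direction here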
{"title":"PDA-HyPAR: Path-diversity-aware hybrid planar adaptive routing algorithm for 3D NoCs","authors":"Jindun Dai, Renjie Li, Xin Jiang, Takahiro Watanabe","doi":"10.1109/ISQED.2018.8357277","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357277","url":null,"abstract":"3D Network-on-Chips (NoCs) is an efficient solution to multi-core communications. The routing algorithm has become a critical challenge for higher performance of NoCs. Performance of traditional methods based on the turn models degrades when the network gets saturated. To improve network stability after saturation, in this paper, a novel deadlock-free Path-Diversity-Aware Hybrid Planar Adaptive Routing (PDA-HyPAR) algorithm without using virtual channels is proposed. In this method, different routing rules are exploited in different XY-planes. And planar adaptive routing strategy is proposed to balance the network loads. We analyze path diversity theoretically and utilize path-diversity-aware selection strategy properly. Experimental results show that PDA-HyPAR is effective even if network load becomes heavy.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121344780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low power latch based design with smart retiming
Pub Date: 2018-05-09 | DOI: 10.1109/ISQED.2018.8357308 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
K. Singh, Hailong Jiao, J. Huisken, H. Fatemi, J. P. D. Gyvez
Flip-flops and latches are two options for constructing pipelines in digital integrated circuits (ICs). In this paper, the implications of converting a flip-flop-based design to a latch-based design are investigated through timing and power analysis. Design flows are also proposed to convert a flip-flop-based design into a latch-based design as well as a mixed latch/flip-flop design. With a new retiming strategy, the optimum operating condition is identified for both the latch-based design and the mixed design, where the maximum time borrowing or performance enhancement is obtained. Compared to the flip-flop-based design, frequency boosts of 48% and 45% are achieved by the latch-based design and the mixed design, respectively. While maintaining the same performance as the flip-flop-based design with the aid of supply-voltage scaling, the latch-based design and the mixed design reduce power consumption by 21% and 16%, respectively, in an industrial 28-nm FDSOI CMOS technology.
{"title":"Low power latch based design with smart retiming","authors":"K. Singh, Hailong Jiao, J. Huisken, H. Fatemi, J. P. D. Gyvez","doi":"10.1109/ISQED.2018.8357308","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357308","url":null,"abstract":"Flip-flops and latches are two options to construct pipelines in digital integrated circuits (ICs). In this paper, the implications for converting a flip-flop based design to a latch-based design are investigated by performing timing and power analysis. Design flows are also proposed to convert a flip-flop based design to a latch-based design as well as a latch/flip-flop-mixed design. With a new retiming strategy, the optimum operating condition is identified for both the latch based design and the mixed design, where the maximum time borrowing or performance enhancement can be obtained. Compared to the flip-flop based design, 48% and 45% frequency boosting are achieved by the latch based design and the mixed design, respectively. While maintaining the same performance as the flip-flop based design with the aid of supply voltage scaling, the latch based design and the mixed design reduce the power consumption by 21% and 16%, respectively, in an industrial 28-nm FDSOI CMOS technology.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"360 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134228431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A loop structure optimization targeting high-level synthesis of fast number theoretic transform
Pub Date: 2018-05-09 | DOI: 10.1109/ISQED.2018.8357273 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
Kazushi Kawamura, M. Yanagisawa, N. Togawa
Multiplication with a large number of digits is used heavily when processing data encrypted with fully homomorphic encryption and is a bottleneck in computation time. An algorithm based on the fast number theoretic transform (FNTT) is known as a high-speed multiplication algorithm, and further speedup is expected from implementing the FNTT process on an FPGA. A high-level synthesis tool enables efficient hardware implementation even for an FNTT with a large number of points. In this paper, we propose a methodology for optimizing the loop structure in a software description of the FNTT so that the performance of the synthesized FNTT processor is maximized. The loop structure optimization is considered in terms of loop flattening and trip-count reduction. We implement a 65,536-point FNTT processor with the loop structure optimization on an FPGA and demonstrate that it executes 6.9 times faster than the same computation on a CPU.
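A minimal software sketch of the loop-flattening idea on a generic iterative NTT (an assumed illustration, not the paper's HLS source): the two inner loops over butterfly groups and butterflies within a group are merged into a single loop with a constant trip count of n/2 per stage, a structure that HLS tools can pipeline more easily than nested loops with variable bounds.

def bit_reverse(a):
    # Standard in-place bit-reversal permutation for a power-of-two length.
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    return a

def ntt_flat(a, p, g):
    # In-place NTT of a (length n, a power of two) modulo prime p, where g is
    # a primitive root of p and n divides p - 1.
    n = len(a)
    a = bit_reverse(a)
    for s in range(1, n.bit_length()):
        length, half = 1 << s, 1 << (s - 1)
        w_len = pow(g, (p - 1) // length, p)
        for t in range(n // 2):            # flattened: groups x butterflies
            group, k = divmod(t, half)
            i = group * length + k
            u, v = a[i], a[i + half] * pow(w_len, k, p) % p
            a[i], a[i + half] = (u + v) % p, (u - v) % p
    return a

print(ntt_flat([1, 2, 3, 4], 17, 3))       # [10, 6, 15, 7]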
{"title":"A loop structure optimization targeting high-level synthesis of fast number theoretic transform","authors":"Kazushi Kawamura, M. Yanagisawa, N. Togawa","doi":"10.1109/ISQED.2018.8357273","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357273","url":null,"abstract":"Multiplication with a large number of digits is heavily used when processing data encrypted by a fully homomorphic encryption, which is a bottleneck in computation time. An algorithm utilizing fast number theoretic transform (FNTT) is known as a high-speed multiplication algorithm and the further speeding up is expected by implementing the FNTT process on an FPGA. A high-level synthesis tool enables efficient hardware implementation even for FNTT with a large number of points. In this paper, we propose a methodology for optimizing the loop structure included in a software description of FNTT so that the performance of the synthesized FNTT processor can be maximized. The loop structure optimization is considered in terms of loop flattening and trip count reduction. We implement a 65,536-point FNTT processor with the loop structure optimization on an FPGA, and demonstrate that it can be executed 6.9 times faster than the execution on a CPU.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121997492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A 4-PAM interconnect in network-on-chip for high-throughput and latency-sensitive applications
Pub Date: 2018-03-13 | DOI: 10.1109/ISQED.2018.8357274 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
Ahmad Mansour, Ahmed El-Naggar, Bassma Al-Abassy, Mostafa Khamis, A. Shalaby
In this paper, a network-on-chip four-level pulse amplitude modulation (4-PAM) scheme is proposed for communication within the network itself in MPSoCs. A current-mode 4-PAM transmitter is used to encode data transactions between neighboring routers, and the data streams are decoded by a flash-ADC-based receiver using clocked latch-type comparators. Additionally, the scheme is implemented on networks that utilize high-radix routers with a local concentration factor of 2 IPs per node, encoding data streams at the network interface as they are injected into the network and decoding them at the input port of the router. We also discuss the required modifications to the input-port buffers of the router architecture and introduce a two-stage allocation method that resolves conflicts among output-port requests, which is essential to maintain system stability after saturation, by utilizing a fair flow-control methodology. The scheme also reduces the wiring load of each router, an added benefit that facilitates the routing stage. The evaluation is extended to reflect the overall network performance with multi-valued logic and to estimate the area and power overheads of the implementation.
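A toy model of the 4-PAM link (the normalized levels and thresholds are assumptions, not the paper's current-mode circuit): every two bits map to one of four amplitudes, and three comparator thresholds recover them on the receive side, much like a 2-bit flash ADC.

# Normalized amplitudes for the symbols 00, 01, 10, 11 and the three
# comparator thresholds that separate them.
LEVELS = [0.0, 1.0, 2.0, 3.0]
THRESHOLDS = [0.5, 1.5, 2.5]

def pam4_encode(bits):
    assert len(bits) % 2 == 0
    return [LEVELS[2 * b1 + b0] for b1, b0 in zip(bits[0::2], bits[1::2])]

def pam4_decode(symbols):
    bits = []
    for s in symbols:
        code = sum(s > t for t in THRESHOLDS)   # 3 comparators -> 2 bits
        bits += [code >> 1, code & 1]
    return bits

flit = [1, 0, 1, 1, 0, 0, 0, 1]
assert pam4_decode(pam4_encode(flit)) == flit   # two bits per transmitted symbol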
{"title":"A 4-PAM interconnect in network-on-chip for high-throughput and latency-sensitive applications","authors":"Ahmad Mansour, Ahmed El-Naggar, Bassma Al-Abassy, Mostafa Khamis, A. Shalaby","doi":"10.1109/ISQED.2018.8357274","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357274","url":null,"abstract":"In this paper, a network-on-chip four-level pulse amplitude modulation (4-PAM) scheme is proposed to be used for communication within the network itself in MPSoCs. A current-mode based 4-PAM transmitter is used to encode data transactions between neighboring routers. Decoding data streams is done by a flash-ADC based receiver using clocked latched type comparators. Additionally, this scheme is implemented on networks utilizing high-radix routers with a local concentration factor of 2 IPs per node to encode data streams injected into the network at the network interface and decode them at the input port of the router. We also discuss the required modifications to the router architecture in the input port buffers and introduce a two-stage allocation method to resolve conflicts of output port requests which is essential to maintain system stability after saturation by utilizing a fair flow control methodology. This results in a reduction in wiring load for each router which is an added value that facilitates the routing stage. The evaluation is extended to reflect the overall network performance supporting the use of multi-valued logic and estimate the overhead of implementation on area and power budgets.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"15 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120852562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Program acceleration using nearest distance associative search
Pub Date: 2018-03-13 | DOI: 10.1109/ISQED.2018.8357263 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
M. Imani, Daniel Peroni, T. Simunic
Data generated by current computing systems is rapidly increasing as they become more interconnected as part of the Internet of Things (IoT). The growing amount of generated data, such as multimedia, needs to be accelerated using efficient, massively parallel processors. Associative memories, in tandem with processing elements, in the form of look-up tables, can reduce energy consumption by eliminating redundant computations. In this paper, we propose a resistive associative unit, called RAU, which approximately performs basic computations with significantly higher efficiency than traditional processing units. RAU stores high-frequency patterns corresponding to each operation and then retrieves the nearest-distance row to the input data as an approximate output. In order to avoid using a large and energy-intensive RAU, our design adaptively detects inputs with lower frequency and assigns them to precise cores for processing. For each application, our design is able to adjust the ratio of data processed between RAU and the precise cores to ensure computational accuracy. We consider the application of RAU on an AMD Southern Islands GPU, a recent GPGPU architecture. Our experimental evaluation shows that a GPGPU enhanced with RAU can achieve 61% average energy savings and a 2.2× speedup across eight diverse OpenCL applications, while ensuring acceptable quality of computation. The energy-delay product improvement of the enhanced GPGPU is 5.7× and 2.8× higher than conventional and state-of-the-art approximate GPGPUs, respectively.
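A minimal sketch of nearest-distance associative lookup under assumed interfaces (the table contents, distance metric, and threshold below are illustrative, not the RAU hardware): the closest stored operand pair supplies an approximate result, and inputs far from every entry fall back to exact computation on a precise core.

def hamming(a, b, width=16):
    # Bit-level distance between two operands of the given width.
    return bin((a ^ b) & ((1 << width) - 1)).count("1")

class AssociativeUnit:
    def __init__(self, table, max_distance=2):
        self.table = table                  # [((x, y), result), ...]
        self.max_distance = max_distance

    def lookup(self, x, y):
        # Nearest-distance search over the stored operand pairs.
        (kx, ky), result = min(
            self.table,
            key=lambda e: hamming(x, e[0][0]) + hamming(y, e[0][1]))
        if hamming(x, kx) + hamming(y, ky) <= self.max_distance:
            return result                   # approximate hit
        return None                         # route to a precise core instead

mul_table = [((8, 8), 64), ((16, 4), 64), ((10, 10), 100)]
rau = AssociativeUnit(mul_table)
print(rau.lookup(9, 8))    # close to (8, 8): returns 64 as an approximation
print(rau.lookup(37, 51))  # far from every entry: None, compute exactly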
{"title":"Program acceleration using nearest distance associative search","authors":"M. Imani, Daniel Peroni, T. Simunic","doi":"10.1109/ISQED.2018.8357263","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357263","url":null,"abstract":"Data generated by current computing systems is rapidly increasing as they become more interconnected as part of the Internet of Things (IoT). The growing amount of generated data, such as multimedia, needs to be accelerated using efficient massive parallel processors. Associative memories, in tandem with processing elements, in the form of look-up tables, can reduce energy consumption by eliminating redundant computations. In this paper, we propose a resistive associative unit, called RAU, which approximately performs basic computations with significantly higher efficiency compared to traditional processing units. RAU stores high frequency patterns corresponding to each operation and then retrieves the nearest distance row to the input data as an approximate output. In order to avoid using a large and energy intensive RAU, our design adaptively detects inputs with lower frequency and assigns them to precise cores to process. For each application, our design is able to adjust the ratio of data processed between RAU and precise cores to ensure computational accuracy. We consider the application of RAU on an AMD Southern Island GPU, a recent GPGPU architecture. Our experimental evaluation shows that GPGPU enhanced with RAU can achieve 61% average energy savings, and 2.2× speedup over eight diverse OpenCL applications, while ensuring acceptable quality of computation. The energy-delay product improvement of enhanced GPGPU is 5.7× and 2.8× higher compared to conventional and state-of-the-art approximate GPGPU, respectively.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123545840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing energy in a DRAM based hybrid cache
Pub Date: 2018-03-13 | DOI: 10.1109/ISQED.2018.8357262 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
Jiacong He, Joseph Callenes-Sloan
The die-stacked DRAM cache can be used to increase bandwidth and reduce latency compared with conventional DRAM memory. However, energy becomes an inevitable challenge as the DRAM cache grows in size. STT-RAM, with its near-zero leakage, can be integrated with DRAM as a hybrid cache to reduce static energy, but the high write energy of STT-RAM brings another energy challenge. In this paper, we propose a tri-regional hybrid cache that exploits the advantages of both DRAM and STT-RAM technologies. An asymmetric data access policy is introduced based on the non-uniform read/write properties of the different hybrid cache regions. We also propose a prediction table that reduces the search energy of the hybrid cache. The results show that our hybrid cache reduces energy by 26% and improves performance by 11% on average compared with previous work.
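A toy sketch of an asymmetric placement policy with a region-prediction table (the threshold and structures are assumptions, not the paper's tri-regional design): write-heavy blocks land in the DRAM region, whose writes are cheap, read-mostly blocks land in the STT-RAM region, whose leakage is near zero, and the table remembers each block's region so a lookup probes only one side.

class HybridCache:
    def __init__(self, write_heavy_threshold=4):
        self.dram, self.sttram = {}, {}
        self.region_of = {}                 # prediction table: tag -> region name
        self.threshold = write_heavy_threshold

    def insert(self, tag, data, recent_writes):
        # Asymmetric placement: frequent writers avoid STT-RAM's costly writes.
        if recent_writes >= self.threshold:
            self.dram[tag] = data
            self.region_of[tag] = "dram"
        else:
            self.sttram[tag] = data
            self.region_of[tag] = "sttram"

    def read(self, tag):
        # The prediction table avoids searching both regions on every access.
        hint = self.region_of.get(tag)
        if hint == "dram":
            return self.dram.get(tag)
        if hint == "sttram":
            return self.sttram.get(tag)
        return None                         # predicted miss

cache = HybridCache()
cache.insert(0x1a, b"write-heavy block", recent_writes=7)
cache.insert(0x2b, b"read-mostly block", recent_writes=1)
print(cache.read(0x1a), cache.read(0x2b))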
{"title":"Optimizing energy in a DRAM based hybrid cache","authors":"Jiacong He, Joseph Callenes-Sloan","doi":"10.1109/ISQED.2018.8357262","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357262","url":null,"abstract":"The die-stacking DRAM cache can be used to increase bandwidth and reduce latency compared with conventional DRAM memory. However, energy becomes an inevitable challenge with the increasing size of DRAM cache. STT-RAM with near-zero leakage can be integrated with DRAM cache as a hybrid cache to reduce static energy, but the high write energy of STT-RAM brings another energy challenge. In this paper, we propose a tri-regional hybrid cache that can exploit the advantage of both DRAM and STT-RAM technologies. The asymmetric data access policy is introduced based on the non-uniform read/write property of the different hybrid cache regions. We also propose a prediction table that can reduce the searching energy of the hybrid cache. The results show that our hybrid cache reduces energy by 26% and improves performance by 11% on average compared with previous work.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116814219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical dynamic goal management for IoT systems
Pub Date: 2018-03-13 | DOI: 10.1109/ISQED.2018.8357315 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
A. Jantsch, A. Anzanpour, H. Kholerdi, I. Azimi, L. Siafara, A. Rahmani, N. Taherinejad, P. Liljeberg, N. Dutt
As the Internet of Things (IoT) penetrates ever more application domains, many IoT-based systems are increasingly becoming more complex, versatile and resource-rich, and need to serve one or more applications with diverse and changing goals. These systems face new challenges in dynamic goal management due to a combination of limited shared resources, and multiple goals that may not only conflict with each other, but which may also change dynamically. We motivate the need for hierarchical, dynamic goal management for this class of complex IoT systems and substantiate our arguments with case studies from two application domains: patient health monitoring and Cyber-Physical Production Systems (CPPSs).
{"title":"Hierarchical dynamic goal management for IoT systems","authors":"A. Jantsch, A. Anzanpour, H. Kholerdi, I. Azimi, L. Siafara, A. Rahmani, N. Taherinejad, P. Liljeberg, N. Dutt","doi":"10.1109/ISQED.2018.8357315","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357315","url":null,"abstract":"As the Internet of Things (IoT) penetrates ever more application domains, many IoT-based systems are increasingly becoming more complex, versatile and resource-rich, and need to serve one or more applications with diverse and changing goals. These systems face new challenges in dynamic goal management due to a combination of limited shared resources, and multiple goals that may not only conflict with each other, but which may also change dynamically. We motivate the need for hierarchical, dynamic goal management for this class of complex IoT systems and substantiate our arguments with case studies from two application domains: patient health monitoring and Cyber-Physical Production Systems (CPPSs).","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"243 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126817218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Concolic testing of SystemC designs
Pub Date: 2018-03-13 | DOI: 10.1109/ISQED.2018.8357256 | 2018 19th International Symposium on Quality Electronic Design (ISQED)
Bin Lin, Kai Cong, Zhenkun Yang, Z. Liao, T. Zhan, Christopher Havlicek, Fei Xie
SystemC is a system-level modelling language widely used in the semiconductor industry. SystemC validation is both necessary and important, since undetected bugs may propagate to final silicon products, which can be extremely expensive and dangerous. However, it is challenging to validate SystemC designs due to their heavy usage of object-oriented features, event-driven simulation semantics, and inherent concurrency. In this paper, we present CTSC, an automated, easy-to-deploy, scalable, and effective binary-level concolic testing framework for SystemC designs. We have implemented CTSC and applied it to an open source SystemC benchmark. In our extensive experiments, the CTSC-generated test cases achieved high code coverage, triggered 14 assertions, and found two severe bugs. In addition, the experiments on two designs with more than 2K lines of SystemC code show that our approach scales to designs of practical sizes.
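For readers unfamiliar with concolic testing, here is a heavily simplified sketch of the generic execute-negate-solve loop on a toy two-input function, using z3 as a stand-in constraint solver; it illustrates the general approach only and is not CTSC's binary-level implementation.

from z3 import Int, Solver, Not, sat

x, y = Int("x"), Int("y")

def run_path(cx, cy):
    # Toy design under test: record the symbolic branch conditions taken by
    # the concrete inputs (cx, cy) and report whether the buggy path fires.
    conds = [x > 10 if cx > 10 else Not(x > 10)]
    if cx > 10:
        conds.append(y == x - 10 if cy == cx - 10 else Not(y == x - 10))
        if cy == cx - 10:
            return conds, True
    return conds, False

def concolic(seed=(0, 0), max_runs=10):
    queue, seen = [seed], {seed}
    for _ in range(max_runs):
        if not queue:
            break
        cx, cy = queue.pop()
        conds, bug = run_path(cx, cy)
        if bug:
            return (cx, cy)
        for i in range(len(conds)):
            s = Solver()
            for c in conds[:i]:           # keep the path prefix ...
                s.add(c)
            s.add(Not(conds[i]))          # ... and negate the i-th branch
            if s.check() == sat:
                m = s.model()
                cand = (m[x].as_long() if m[x] is not None else 0,
                        m[y].as_long() if m[y] is not None else 0)
                if cand not in seen:
                    seen.add(cand)
                    queue.append(cand)
    return None

print(concolic())   # reaches the buggy path, e.g. with inputs like (11, 1)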
{"title":"Concolic testing of SystemC designs","authors":"Bin Lin, Kai Cong, Zhenkun Yang, Z. Liao, T. Zhan, Christopher Havlicek, Fei Xie","doi":"10.1109/ISQED.2018.8357256","DOIUrl":"https://doi.org/10.1109/ISQED.2018.8357256","url":null,"abstract":"SystemC is a system-level modelling language widely used in the semiconductor industry. SystemC validation is both necessary and important, since undetected bugs may propagate to final silicon products, which can be extremely expensive and dangerous. However, it is challenging to validate SystemC designs due to their heavy usage of object-oriented features, event-driven simulation semantics, and inherent concurrency. In this paper, we present CTSC, an automated, easy-to-deploy, scalable, and effective binary-level concolic testing framework for SystemC designs. We have implemented CTSC and applied it to an open source SystemC benchmark. In our extensive experiments, the CTSC-generated test cases achieved high code coverage, triggered 14 assertions, and found two severe bugs. In addition, the experiments on two designs with more than 2K lines of SystemC code show that our approach scales to designs of practical sizes.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132169260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}