Kevin Kim;Katherine Parry;David Harris;Cedar Turek;Alessandro Maiuolo;Rose Thompson;James Stine
Division, square root, and remainder are fundamental operations required by most computer systems. Floating-point and integer operations are commonly performed on separate datapaths. This paper presents the first detailed implementation of a shared recurrence unit that supports floating-point division/square root and integer division/remainder. It supports early termination and shares the normalization shifter needed for integer and subnormal inputs. Synthesis results show that shared double-precision dividers producing at least 4 bits per cycle are 9 - 18% smaller and 3 - 16% faster than separate integer and floating-point units.
{"title":"Shared Recurrence Floating-Point Divide/Sqrt and Integer Divide/Remainder With Early Termination","authors":"Kevin Kim;Katherine Parry;David Harris;Cedar Turek;Alessandro Maiuolo;Rose Thompson;James Stine","doi":"10.1109/TC.2024.3500380","DOIUrl":"https://doi.org/10.1109/TC.2024.3500380","url":null,"abstract":"Division, square root, and remainder are fundamental operations required by most computer systems. Floating-point and integer operations are commonly performed on separate datapaths. This paper presents the first detailed implementation of a shared recurrence unit that supports floating-point division/square root and integer division/remainder. It supports early termination and shares the normalization shifter needed for integer and subnormal inputs. Synthesis results show that shared double-precision dividers producing at least 4 bits per cycle are 9 - 18% smaller and 3 - 16% faster than separate integer and floating-point units.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"740-748"},"PeriodicalIF":3.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francesco Angione;Paolo Bernardi;Nicola di Gruttola Giardino;Gabriele Filipponi;Claudia Bertani;Vincenzo Tancorre
This paper deals with functional System-Level Test (SLT) for System-on-Chips (SoCs) communication peripherals. The proposed methodology is based on analyzing the potential weaknesses of applied structural tests such as Scan-based. Then, the paper illustrates how to develop a functional SLT programs software suite to address such issues. In case the communication peripheral provides detection/correction features, the methodology proposes the design of a hardware companion module to be added to the Automatic Test Equipment (ATE) to interact with the SoC communication module by purposely corrupting data frames. Experimental results are obtained on an industrial, automotive SoC produced by STMicroelectronics focusing on the Controller Area Network (CAN) communication peripheral and showing the effectiveness of the SLT suite to complement structural tests.
{"title":"A System-Level Test Methodology for Communication Peripherals in System-on-Chips","authors":"Francesco Angione;Paolo Bernardi;Nicola di Gruttola Giardino;Gabriele Filipponi;Claudia Bertani;Vincenzo Tancorre","doi":"10.1109/TC.2024.3500375","DOIUrl":"https://doi.org/10.1109/TC.2024.3500375","url":null,"abstract":"This paper deals with functional System-Level Test (SLT) for System-on-Chips (SoCs) communication peripherals. The proposed methodology is based on analyzing the potential weaknesses of applied structural tests such as Scan-based. Then, the paper illustrates how to develop a functional SLT programs software suite to address such issues. In case the communication peripheral provides detection/correction features, the methodology proposes the design of a hardware companion module to be added to the Automatic Test Equipment (ATE) to interact with the SoC communication module by purposely corrupting data frames. Experimental results are obtained on an industrial, automotive SoC produced by STMicroelectronics focusing on the Controller Area Network (CAN) communication peripheral and showing the effectiveness of the SLT suite to complement structural tests.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"731-739"},"PeriodicalIF":3.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10755212","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arne Symons;Linyan Mei;Steven Colleman;Pouya Houshmand;Sebastian Karl;Marian Verhelst
As the landscape of deep neural networks evolves, heterogeneous dataflow accelerators, in the form of multi-core architectures or chiplet-based designs, promise more flexibility and higher inference performance through scalability. So far, these systems exploit the increased parallelism by coarsely mapping a single layer at a time across cores, which incurs frequent costly off-chip memory accesses, or by pipelining batches of inputs, which falls short in meeting the demands of latency-critical applications. To alleviate these bottlenecks, this work explores a new fine-grain mapping paradigm, referred to as layer fusion, on heterogeneous dataflow accelerators through a novel design space exploration framework called Stream