GOURD: Tensorizing Streaming Applications to Generate Multi-Instance Compute Platforms
Pub Date: 2024-11-06, DOI: 10.1109/TCAD.2024.3445810
Patrick Schmid;Paul Palomero Bernardo;Christoph Gerum;Oliver Bringmann
In this article, we lift the dataflow processing paradigm to a higher level of abstraction to automate the generation of multi-instance compute and memory platforms with interfaces to I/O devices (sensors and actuators). Since the different compute instances (NPUs, CPUs, DSPs, etc.) and I/O devices do not necessarily have compatible interfaces at the dataflow level, an automated translation is required. However, in multidimensional dataflow scenarios, it becomes inherently difficult to reason about buffer sizes and iteration order without knowing the shape of the data access pattern (DAP) that the dataflow follows. To capture this shape and the platform composition, we define a domain-specific representation (DSR) and devise a toolchain to generate a synthesizable platform, including appropriate streaming buffers for platform-specific tensorization of the data between incompatible interfaces. This allows platforms such as sensor edge-AI devices to be specified simply by focusing on the shape of the data provided by the sensors and exchanged among compute units, making it possible to evaluate and generate different dataflow design alternatives with significantly reduced design time.
{"title":"GOURD: Tensorizing Streaming Applications to Generate Multi-Instance Compute Platforms","authors":"Patrick Schmid;Paul Palomero Bernardo;Christoph Gerum;Oliver Bringmann","doi":"10.1109/TCAD.2024.3445810","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3445810","url":null,"abstract":"In this article, we rethink the dataflow processing paradigm to a higher level of abstraction to automate the generation of multi-instance compute and memory platforms with interfaces to I/O devices (sensors and actuators). Since the different compute instances (NPUs, CPUs, DSPs, etc.) and I/O devices do not necessarily have compatible interfaces on a dataflow level, an automated translation is required. However, in multidimensional dataflow scenarios, it becomes inherently difficult to reason about buffer sizes and iteration order without knowing the shape of the data access pattern (DAP) that the dataflow follows. To capture this shape and the platform composition, we define a domain-specific representation (DSR) and devise a toolchain to generate a synthesizable platform, including appropriate streaming buffers for platform-specific tensorization of the data between incompatible interfaces. This allows platforms, such as sensor edge AI devices, to be easily specified by simply focusing on the shape of the data provided by the sensors and transmitted among compute units, giving the ability to evaluate and generate different dataflow design alternatives with significantly reduced design time.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4166-4177"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10745814","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DREAMx: A Data-Driven Error Estimation Methodology for Adders Composed of Cascaded Approximate Units
Pub Date: 2024-11-06, DOI: 10.1109/TCAD.2024.3447209
Muhammad Abdullah Hanif;Ayoub Arous;Muhammad Shafique
Due to the significance and broad utilization of adders in computing systems, the design of low-power approximate adders (LPAAs) has received considerable attention from the system design community. However, the selection and deployment of appropriate approximate modules require a thorough design space exploration, which is in general an extremely time-consuming process. To reduce the exploration time, different error estimation techniques have been proposed in the literature for evaluating the quality metrics of approximate adders. However, most of them rest on assumptions that limit their usability in real-world settings. In this work, we highlight the impact of those assumptions on the quality of the error estimates produced by state-of-the-art techniques and show how they limit the use of such techniques in real-world settings. Moreover, we highlight the significance of considering input data characteristics for improving the quality of error estimation. Based on our analysis, we propose a systematic data-driven error estimation methodology, DREAMx, for adders composed of cascaded approximate units, which covers a predominant set of LPAAs. DREAMx factors in the dependence between input bits, based on the given input distribution, to compute the probability mass function (PMF) of the error value at the output of an approximate adder. It achieves improved results compared to state-of-the-art techniques while offering a substantial decrease in the overall execution (exploration) time compared to exhaustive simulations. Our results further show that there exists a delicate tradeoff between the achievable quality of error estimates and the overall execution time.
{"title":"DREAMx: A Data-Driven Error Estimation Methodology for Adders Composed of Cascaded Approximate Units","authors":"Muhammad Abdullah Hanif;Ayoub Arous;Muhammad Shafique","doi":"10.1109/TCAD.2024.3447209","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447209","url":null,"abstract":"Due to the significance and broad utilization of adders in computing systems, the design of low-power approximate adders (LPAAs) has received a significant amount of attention from the system design community. However, the selection and deployment of appropriate approximate modules require a thorough design space exploration, which is (in general) an extremely time-consuming process. Toward reducing the exploration time, different error estimation techniques have been proposed in the literature for evaluating the quality metrics of approximate adders. However, most of them are based on certain assumptions that limit the usability of such techniques for real-world settings. In this work, we highlight the impact of those assumptions on the quality of error estimates provided by the state-of-the-art techniques and how they limit the use of such techniques for real-world settings. Moreover, we highlight the significance of considering input data characteristics to improve the quality of error estimation. Based on our analysis, we propose a systematic data-driven error estimation methodology, DREAMx, for adders composed of cascaded approximate units, which covers a predominant set of LPAAs. DREAMx in principle factors in the dependence between input bits based on the given input distribution to compute the probability mass function (PMF) of error value at the output of an approximate adder. It achieves improved results compared to the state-of-the-art techniques while offering a substantial decrease in the overall execution(/exploration) time compared to exhaustive simulations. Our results further show that there exists a delicate tradeoff between the achievable quality of error estimates and the overall execution time.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3348-3357"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large Data Transfer Optimization for Improved Robustness in Real-Time V2X-Communication
Pub Date: 2024-11-06, DOI: 10.1109/TCAD.2024.3436548
Alex Bendrick;Nora Sperling;Rolf Ernst
Vehicle-to-everything (V2X) roadmaps envision future applications that require the reliable exchange of large sensor data over a wireless network in real time. Applications include sensor fusion for cooperative perception and remote vehicle control, which are subject to stringent real-time and safety constraints. The real-time requirements result from end-to-end latency constraints, while reliability refers to loss-free sensor data transfer, which is needed to reach maximum application quality. In wireless networks, these two requirements conflict because of the need for error correction. Notably, the established video coding standards are not suitable for this task, as demonstrated in our experiments. This article shows that middleware-based backward error correction (BEC) combined with application-controlled selective data transmission is far more effective for this purpose. The proposed mechanisms use application and context knowledge to dynamically adapt the data object volume at high error rates while sustaining application resilience. We evaluate popular camera datasets and perception pipelines from the automotive domain and apply two complementary strategies. The results and comparisons show that this approach offers great benefits, far beyond the state of the art. They also show that no single strategy outperforms the other in all use cases.
{"title":"Large Data Transfer Optimization for Improved Robustness in Real-Time V2X-Communication","authors":"Alex Bendrick;Nora Sperling;Rolf Ernst","doi":"10.1109/TCAD.2024.3436548","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3436548","url":null,"abstract":"Vehicle-to-everything (V2X) roadmaps envision future applications that require the reliable exchange of large sensor data over a wireless network in real time. Applications include sensor fusion for cooperative perception or remote vehicle control that are subject to stringent real-time and safety constraints. Real-time requirements result from end-to-end latency constraints, while reliability refers to the quest for loss-free sensor data transfer to reach maximum application quality. In wireless networks, both requirements are in conflict, because of the need for error correction. Notably, the established video coding standards are not suitable for this task, as demonstrated in experiments. This article shows that middleware-based backward error correction (BEC) in combination with application controlled selective data transmission is far more effective for this purpose. The mechanisms proposed in this article use application and context knowledge to dynamically adapt the data object volume at high error rates at sustained application resilience. We evaluate popular camera datasets and perception pipelines from the automotive domain and apply two complementary strategies. The results and comparisons show that this approach has great benefits, far beyond the state of the art. It also shows that there is no single strategy that outperforms the other in all use cases.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3515-3526"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
iFKVS: Lightweight Key-Value Store for Flash-Based Intermittently Computing Devices
Pub Date: 2024-11-06, DOI: 10.1109/TCAD.2024.3443698
Yen-Hsun Chen;Ting-En Liao;Li-Pin Chang
Energy harvesting enables long-running sensing applications on tiny Internet of Things (IoT) devices without an installed battery. To overcome the intermittency of ambient energy sources, system software creates intermittent computation using checkpoints. As the scope of intermittent computation quickly expands, there is a strong demand for data storage and local data processing in such IoT devices. Among the data storage options, flash memory is more compelling than other types of nonvolatile memory due to its affordability and availability. We introduce iFKVS, a flash-based key-value store for multisensor IoT devices. In this study, we aim to support efficient key-value operations while guaranteeing the correctness of program execution across power interruptions. For indexing multidimensional sensor data, we propose a quadtree-based structure that minimizes the extra writes caused by splitting and rebalancing; for checkpointing in flash storage, we propose a rollback-based algorithm that exploits flash memory's capabilities of byte-level writing and one-way bit flipping. Experimental results based on a real energy-driven testbed demonstrate that, with the same index structure design, our rollback-based approach reduces the total execution time by 45% and 84% compared with checkpointing using write-ahead logging (WAL) and copy-on-write (COW), respectively.
{"title":"iFKVS: Lightweight Key-Value Store for Flash-Based Intermittently Computing Devices","authors":"Yen-Hsun Chen;Ting-En Liao;Li-Pin Chang","doi":"10.1109/TCAD.2024.3443698","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443698","url":null,"abstract":"Energy harvesting enables long-running sensing applications on tiny Internet of Things (IoT) devices without a battery installed. To overcome the intermittency of ambient energy sources, system software creates intermittent computation using checkpoints. While the scope of intermittent computation is quickly expanding, there is a strong demand for data storage and local data processing in such IoT devices. When considering data storage options, flash memory is more compelling than other types of nonvolatile memory due to its affordability and availability. We introduce iFKVS, a flash-based key-value store for multisensor IoT devices. In this study, we aim at supporting efficient key-value operations while guaranteeing the correctness of program execution across power interruptions. For indexing of multidimensional sensor data, we propose a quadtree-based structure for the minimization of extra writes from splitting and rebalancing; for checkpointing in flash storage, we propose a rollback-based algorithm that exploits the capabilities of byte-level writing and one-way bit flipping of flash memory. Experimental results based on a real energy-driven testbed demonstrate that with the same index structure design, our rollback-based approach obtains a significant reduction of 45% and 84% in the total execution time compared with checkpointing using write-ahead logging (WAL) and copying on write (COW), respectively.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3564-3575"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BERN-NN-IBF: Enhancing Neural Network Bound Propagation Through Implicit Bernstein Form and Optimized Tensor Operations
Wael Fatnassi;Arthur Feeney;Valen Yamamoto;Aparna Chandramowlishwaran;Yasser Shoukry
Pub Date: 2024-11-06, DOI: 10.1109/TCAD.2024.3447577
Neural networks have emerged as powerful tools across various domains, and their remarkable empirical performance has motivated widespread adoption in safety-critical applications. This, in turn, necessitates rigorous formal verification techniques to ensure their reliability and robustness. Tight bound propagation plays a crucial role in the formal verification process by providing precise bounds that can be used to formulate and verify properties such as safety, robustness, and fairness. While state-of-the-art tools use linear and convex approximations to compute upper/lower bounds for each neuron's outputs, recent advances have shown that nonlinear approximations based on Bernstein polynomials yield tighter bounds but suffer from scalability issues. To that end, this article introduces BERN-NN-IBF, a significant enhancement of Bernstein-polynomial-based bound propagation algorithms. BERN-NN-IBF offers three main contributions: 1) a memory-efficient encoding of Bernstein polynomials that scales the bound propagation algorithms; 2) optimized tensor operations for the new polynomial encoding that maintain the integrity of the bounds while enhancing computational efficiency; and 3) tighter under-approximations of the ReLU activation function using quadratic polynomials tailored to minimize approximation errors. Through comprehensive testing, we demonstrate that BERN-NN-IBF achieves tighter bounds and higher computational efficiency than the original BERN-NN and state-of-the-art methods, including the linear and convex programming used within the winner of the VNN-COMP competition.
{"title":"BERN-NN-IBF: Enhancing Neural Network Bound Propagation Through Implicit Bernstein Form and Optimized Tensor Operations","authors":"Wael Fatnassi;Arthur Feeney;Valen Yamamoto;Aparna Chandramowlishwaran;Yasser Shoukry","doi":"10.1109/TCAD.2024.3447577","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447577","url":null,"abstract":"Neural networks have emerged as powerful tools across various domains, exhibiting remarkable empirical performance that motivated their widespread adoption in safety-critical applications, which, in turn, necessitates rigorous formal verification techniques to ensure their reliability and robustness. Tight bound propagation plays a crucial role in the formal verification process by providing precise bounds that can be used to formulate and verify properties, such as safety, robustness, and fairness. While state-of-the-art tools use linear and convex approximations to compute upper/lower bounds for each neuron’s outputs, recent advances have shown that nonlinear approximations based on Bernstein polynomials lead to tighter bounds but suffer from scalability issues. To that end, this article introduces BERN-NN-IBF, a significant enhancement of the Bernstein-polynomial-based bound propagation algorithms. BERN-NN-IBF offers three main contributions: 1) a memory-efficient encoding of Bernstein polynomials to scale the bound propagation algorithms; 2) optimized tensor operations for the new polynomial encoding to maintain the integrity of the bounds while enhancing computational efficiency; and 3) tighter under-approximations of the ReLU activation function using quadratic polynomials tailored to minimize approximation errors. Through comprehensive testing, we demonstrate that BERN-NN-IBF achieves tighter bounds and higher computational efficiency compared to the original BERN-NN and state-of-the-art methods, including linear and convex programming used within the winner of the VNN-COMPETITION.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4334-4345"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}