Application Specific Approximate Behavioral Processor
Qilin Si, Prattay Chowdhury, Rohit Sreekumar, Benjamin Carrion Schafer
IEEE Transactions on Sustainable Computing, vol. 8, no. 2, pp. 165-179
Publication date: 2022-11-14
DOI: 10.1109/TSUSC.2022.3222117
https://ieeexplore.ieee.org/document/9950345/
Citation count: 0
Abstract
Many applications require simple controllers that continuously run the same application. These applications are often found in battery-operated embedded systems that must be ultra-low power (ULP) and are very price sensitive. Examples include various kinds of IoT devices and medical devices. Currently, these systems rely on off-the-shelf general-purpose microprocessors. One problem with using these processors is that a specific application does not need all of their resources. Furthermore, because the workloads running on these systems are highly regular, there is a large opportunity to optimize the processor by pruning the unused resources to achieve lower area (cost) and power. Moreover, these processors can be specified at the behavioral level and synthesized with High-Level Synthesis (HLS) into an efficient Register Transfer Level (RTL) description. This enables additional optimizations, because the processor implementation is fully re-optimized during the HLS process. In addition, many applications running on these embedded systems tolerate imprecise outputs. These include image processing and digital signal processing (DSP) applications. This opens the door to further optimizations in the context of approximate computing. To address these issues, this work presents a methodology that automatically customizes a behavioral RISC processor for a given workload, significantly reducing its area and power compared to the original general-purpose processor. The methodology first generates a bespoke processor that produces exactly the same output as the original general-purpose one, and then approximates it, allowing a certain level of error at the output. Compared to previous work that customizes a given processor only at the gate-netlist level, our proposed method shows significant benefits. In particular, this work shows that raising the level of abstraction reduces area and power by 78.3% and 70.1%, respectively, on average for the exact solution, and that the approximate versions reduce area by a further 10.0% and 16.5% when tolerating maximum output errors of 10% and 20%, respectively.
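The two-step idea summarized above can be illustrated with a minimal Python sketch. The steps are (1) prune resources the workload never exercises and (2) trade a bounded output error for further savings. This is not the authors' HLS-based methodology, which operates on a behavioral processor description. Everything in the sketch is a hypothetical stand-in chosen only to make the idea concrete: the opcode-to-unit table, the unit names, the 3-tap filter kernel, the per-output relative-error metric, and LSB truncation as the approximation knob.

```python
# Minimal sketch of workload-driven processor customization (hypothetical).
# Step 1 (exact): find datapath units the workload never uses.
# Step 2 (approximate): find the largest error-bounded LSB truncation.

# Hypothetical opcode -> datapath-unit table; names are illustrative only.
OPCODE_UNITS = {
    "add": {"alu_add"},
    "sub": {"alu_add"},
    "and": {"alu_logic"},
    "or": {"alu_logic"},
    "sll": {"shifter"},
    "srl": {"shifter"},
    "mul": {"multiplier"},
    "div": {"divider"},
    "lw": {"load_store", "alu_add"},  # address = base + offset
    "sw": {"load_store", "alu_add"},
    "beq": {"branch_cmp"},
}
ALL_UNITS = set().union(*OPCODE_UNITS.values())


def prunable_units(trace):
    """Units never exercised by any opcode in the executed trace."""
    used = set()
    for opcode in trace:
        used |= OPCODE_UNITS[opcode]  # unknown opcodes raise KeyError
    return ALL_UNITS - used


def truncate_lsbs(x, bits):
    """Stand-in approximation: clear the `bits` least significant bits."""
    return (x >> bits) << bits


def fir_kernel(window, bits):
    """Hypothetical 3-tap integer filter (coefficients 1, 2, 1)."""
    coeffs = (1, 2, 1)
    return sum(c * truncate_lsbs(x, bits) for c, x in zip(coeffs, window))


def max_relative_error(exact, approx):
    """Worst-case |exact - approx| / |exact| (denominator floored at 1)."""
    return max(abs(e - a) / max(abs(e), 1) for e, a in zip(exact, approx))


def widest_truncation(samples, kernel, error_bound, max_bits=16):
    """Largest truncation width whose outputs stay within `error_bound`.

    Stops at the first violation, i.e. it assumes the error does not
    shrink again as more bits are dropped.
    """
    exact = [kernel(s, 0) for s in samples]
    best = 0
    for bits in range(1, max_bits + 1):
        approx = [kernel(s, bits) for s in samples]
        if max_relative_error(exact, approx) > error_bound:
            break
        best = bits
    return best


if __name__ == "__main__":
    trace = ["lw", "add", "sll", "sw", "beq", "add"]
    print(sorted(prunable_units(trace)))  # ['alu_logic', 'divider', 'multiplier']

    windows = [(1000, 1200, 900), (500, 800, 650), (2000, 1800, 2100)]
    print(widest_truncation(windows, fir_kernel, 0.10))  # 7
    print(widest_truncation(windows, fir_kernel, 0.20))  # 8
```

With the sample trace, the logic, multiplier, and divider units are never used and become candidates for removal in the exact version. The two truncation searches then show the pattern reported in the abstract: a looser error bound (20% instead of 10%) admits a more aggressive approximation. Worst-case per-output relative error is only one possible metric, and this abstract does not specify the error definition the paper uses.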