Hardware data prefetching is a well-studied technique for bridging the processor-memory performance gap. Bit-pattern-based prefetchers are among the most promising spatial data prefetchers and achieve substantial performance gains. In these prefetchers, the region size is a crucial parameter: it denotes the amount of memory that a pattern can record or a prediction can prefetch. However, existing bit-pattern-based prefetchers support only one fixed region size. Our experiment shows that a fixed region size cannot meet the requirements of numerous applications and leads to suboptimal performance and high hardware overhead. In this article, we propose PARS, a pattern-aware spatial data prefetcher supporting multiple region sizes. The key idea of PARS is to support multiple region sizes, which lets it improve application performance while reducing hardware overhead. Moreover, PARS dynamically switches to the appropriate region size for each pattern through an adaptive RS-switching mechanism. We evaluated PARS on numerous workloads, and the results show that PARS provides an average performance improvement of 40.6% over a baseline with no data prefetchers and outperforms two state-of-the-art prefetchers, Bingo by 2.1% (up to 24.4%) and Pythia by 3.9% (up to 111.2%), in the single-core system. In the four-core system, PARS outperforms Bingo by 5.0% (up to 66.0%) and Pythia by 5.4% (up to 177.9%).
{"title":"PARS: A Pattern-Aware Spatial Data Prefetcher Supporting Multiple Region Sizes","authors":"Yiquan Lin;Wenhai Lin;Jiexiong Xu;Yiquan Chen;Zhen Jin;Jingchang Qin;Jiahao He;Shishun Cai;Yuzhong Zhang;Zonghui Wang;Wenzhi Chen","doi":"10.1109/TCAD.2024.3442981","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3442981","url":null,"abstract":"Hardware data prefetching is a well-studied technique to bridge the processor-memory performance gap. Bit-pattern-based prefetchers are one of the most promising spatial data prefetchers that achieve substantial performance gains. In bit-pattern-based prefetchers, the region size is a crucial parameter, which denotes the memory size that can be recorded by a pattern or prefetched by a prediction. However, existing bit-pattern-based prefetchers only support one fixed region size. Our experiment shows that the fixed region size cannot meet the requirements for numerous applications and leads to suboptimal performance and high hardware overhead. In this article, we propose PARS, a pattern-aware spatial data prefetcher supporting multiple region sizes. The key idea of PARS is that it supports multiple region sizes, enabling it to simultaneously enhance application performance while reducing the hardware overhead. Moreover, PARS supports dynamically switching appropriate region sizes for different patterns through an adaptive RS-switching mechanism. We evaluated PARS on numerous workloads and results show that PARS provides an average performance improvement of 40.6% over a baseline with no data prefetchers and outperforms the two state-of-the-art prefetchers Bingo by 2.1% (up to 24.4%) and Pythia by 3.9% (up to 111.2%) in the single-core system. 
In the four-core system, PARS outperforms Bingo by 5.0% (up to 66.0%) and Pythia by 5.4% (up to 177.9%).","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3638-3649"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
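To make the role of the region size in the PARS abstract concrete, the following is a minimal sketch of how a bit pattern over a region might be recorded. It is not the PARS design: the 64-byte block size, the function name `region_patterns`, and the string encoding of the pattern are all assumptions chosen for illustration.

```python
BLOCK_SIZE = 64  # bytes per cache block (assumed, not taken from the paper)


def region_patterns(addresses, region_size):
    """Record, per region, which cache blocks were touched as a bit string.

    Character i of the returned string is '1' if block i of that region
    was accessed. `region_size` is in bytes and must be a multiple of
    BLOCK_SIZE.
    """
    blocks_per_region = region_size // BLOCK_SIZE
    patterns = {}
    for addr in addresses:
        region = addr // region_size                  # which region the access falls in
        offset = (addr % region_size) // BLOCK_SIZE   # block index inside that region
        patterns[region] = patterns.get(region, 0) | (1 << offset)
    # Reverse the binary string so that block 0 is the leftmost character.
    return {
        region: format(bits, f"0{blocks_per_region}b")[::-1]
        for region, bits in patterns.items()
    }


if __name__ == "__main__":
    trace = [0, 64, 192, 2176]  # hypothetical byte addresses
    print(region_patterns(trace, 256))
    print(region_patterns(trace, 512))
```

The same trace yields shorter or longer patterns depending on `region_size`, which is the parameter the abstract describes as fixed in existing bit-pattern-based prefetchers.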
Modern cyber-physical systems (CPSs) employ an increasingly large number of software control loops to enhance their autonomous capabilities. Such large task sets and their dependencies may lead to deadline misses caused by platform-level timing uncertainties, resource contention, etc. To ensure the schedulability of the task set on the embedded platform in the presence of these uncertainties, existing co-design techniques assign task periodicities such that control costs are minimized. Another line of work addresses the same platform schedulability issue by skipping a bounded number of control executions within a fixed number of control instances. Because control tasks are designed to perform robustly against delayed actuation (due to deadline misses, network packet drops, etc.), a bounded number of control skips can be applied while ensuring a certain performance margin. Our work combines these two control scheduling co-design disciplines and develops a strategy that adaptively employs control skips or updates the periodicities of the control tasks depending on their current performance requirements. For this, we leverage a novel theory of automata-based control skip sequence generation while ensuring periodicity, safety, and stability constraints. We demonstrate the effectiveness of this dynamic resource sharing approach in an automotive hardware-in-the-loop setup with realistic control task set implementations.
{"title":"Revisiting Dynamic Scheduling of Control Tasks: A Performance-Aware Fine-Grained Approach","authors":"Sunandan Adhikary;Ipsita Koley;Saurav Kumar Ghosh;Sumana Ghosh;Soumyajit Dey","doi":"10.1109/TCAD.2024.3443007","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443007","url":null,"abstract":"Modern cyber-physical systems (CPSs) employ an increasingly large number of software control loops to enhance their autonomous capabilities. Such large task sets and their dependencies may lead to deadline misses caused by platform-level timing uncertainties, resource contention, etc. To ensure the schedulability of the task set in the embedded platform in the presence of these uncertainties, there exist co-design techniques that assign task periodicities such that control costs are minimized. Another line of work exists that addresses the same platform schedulability issue by skipping a bounded number of control executions within a fixed number of control instances. Considering that control tasks are designed to perform robustly against delayed actuation (due to deadline misses, network packet drops etc.) a bounded number of control skips can be applied while ensuring certain performance margin. Our work combines these two control scheduling co-design disciplines and develops a strategy to adaptively employ control skips or update periodicities of the control tasks depending on their current performance requirements. For this we leverage a novel theory of automata-based control skip sequence generation while ensuring periodicity, safety and stability constraints. 
We demonstrate the effectiveness of this dynamic resource sharing approach in an automotive Hardware-in-loop setup with realistic control task set implementations.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3662-3673"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
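The phrase "skipping a bounded number of control executions within a fixed number of control instances" in the abstract above can be read as a sliding-window constraint on an execute/skip sequence. The sketch below checks such a constraint. It is a generic window check, not the automata-based sequence generation the authors describe, and the function name and the 1/0 encoding (1 = execute, 0 = skip) are assumptions.

```python
def satisfies_skip_bound(sequence, max_skips, window):
    """Return True if every `window` consecutive instances contain
    at most `max_skips` skipped executions (0 = skip, 1 = execute)."""
    if len(sequence) < window:
        # Shorter than one window: only the total number of skips matters.
        return sequence.count(0) <= max_skips

    # Count skips in the first window, then slide one instance at a time.
    skips = sequence[:window].count(0)
    if skips > max_skips:
        return False
    for i in range(window, len(sequence)):
        skips += (sequence[i] == 0) - (sequence[i - window] == 0)
        if skips > max_skips:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical sequences: at most 1 skip in any 3 consecutive instances.
    print(satisfies_skip_bound([1, 0, 1, 1, 0, 1], max_skips=1, window=3))
    print(satisfies_skip_bound([1, 0, 0, 1], max_skips=1, window=3))
```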
Pub Date: 2024-11-06 DOI: 10.1109/TCAD.2024.3438113
Fabian Seiler, Nima TaheriNejad
Image processing algorithms continue to demand higher performance from computers. However, computer performance is not improving at the same rate as before. In response to the current challenges in enhancing computing performance, a wave of new technologies and computing paradigms is surfacing. Among these, memristors stand out as one of the most promising components due to their technological prospects and low power consumption. With efficient data storage capabilities and the ability to perform logical operations directly within the memory, they are well suited for in-memory computation (IMC). Approximate computing emerges as another promising paradigm, offering improved performance metrics, notably speed. The tradeoff for this gain is reduced accuracy. In this article, we use the stateful logic material implication (IMPLY) in the semi-serial topology and combine both paradigms to further enhance computational performance. We present three novel approximated adders that drastically improve speed and energy consumption with a normalized mean error distance (NMED) lower than 0.02 for most scenarios. We evaluated partially approximated ripple carry adders (RCAs) at the circuit level and compared them to the state of the art (SoA). The proposed adders are applied in different image processing applications, and the quality metrics are calculated. While maintaining acceptable quality, our approach achieves significant energy savings of 6%–38% and reduces the delay (number of computation cycles) by 5%–35%, demonstrating notable efficiency compared to exact calculations.
{"title":"Efficient Image Processing via Memristive-Based Approximate In-Memory Computing","authors":"Fabian Seiler;Nima TaheriNejad","doi":"10.1109/TCAD.2024.3438113","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3438113","url":null,"abstract":"Image processing algorithms continue to demand higher performance from computers. However, computer performance is not improving at the same rate as before. In response to the current challenges in enhancing computing performance, a wave of new technologies and computing paradigms is surfacing. Among these, memristors stand out as one of the most promising components due to their technological prospects and low power consumption. With efficient data storage capabilities and their ability to directly perform logical operations within the memory, they are well-suited for in-memory computation (IMC). Approximate computing emerges as another promising paradigm, offering improved performance metrics, notably speed. The tradeoff for this gain is the reduction of accuracy. In this article, we are using the stateful logic material implication (IMPLY) in the semi-serial topology and combine both the paradigms to further enhance the computational performance. We present three novel approximated adders that drastically improve speed and energy consumption with an normalized mean error distance (NMED) lower than 0.02 for most scenarios. We evaluated partially approximated Ripple carry adder (RCA) at the circuit-level and compared them to the State-of-the-Art (SoA). The proposed adders are applied in different image processing applications and the quality metrics are calculated. 
While maintaining acceptable quality, our approach achieves significant energy savings of 6%–38% and reduces the delay (number of computation cycles) by 5%–35%, demonstrating notable efficiency compared to exact calculations.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3312-3323"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10745792","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
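The NMED figure quoted in the abstract above can be illustrated with a small exhaustive computation. The sketch below is not one of the paper's three adders: `approx_add` is a hypothetical stand-in that ORs the lowest `k` bits and adds the upper bits exactly with no carry between the two parts. NMED is taken here as the mean absolute error over all input pairs divided by the maximum exact output, which is the usual definition and is assumed to match the authors' usage.

```python
from itertools import product


def approx_add(a, b, k):
    """Hypothetical approximate adder: exact addition on the upper bits,
    bitwise OR on the lowest k bits, and no carry from low to high part."""
    mask = (1 << k) - 1
    high = ((a >> k) + (b >> k)) << k
    low = (a & mask) | (b & mask)
    return high | low


def nmed(n_bits, k):
    """Normalized mean error distance of approx_add over all n_bits-wide
    unsigned operand pairs."""
    values = range(1 << n_bits)
    max_out = 2 * ((1 << n_bits) - 1)  # largest exact sum of two operands
    total_error = sum(
        abs((a + b) - approx_add(a, b, k))
        for a, b in product(values, repeat=2)
    )
    mean_error = total_error / (len(values) ** 2)
    return mean_error / max_out


if __name__ == "__main__":
    for k in range(0, 5):
        print(f"k={k}: NMED={nmed(8, k):.5f}")
```

With `k = 0` the adder is exact and the NMED is zero; increasing `k` approximates more of the low-order bits.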
Pub Date: 2024-09-19 DOI: 10.1109/TCAD.2024.3454934
{"title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems society information","authors":"","doi":"10.1109/TCAD.2024.3454934","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3454934","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 10","pages":"C2-C2"},"PeriodicalIF":2.7,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10684353","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142246472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-19 DOI: 10.1109/TCAD.2024.3449609
{"title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems publication information","authors":"","doi":"10.1109/TCAD.2024.3449609","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3449609","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 10","pages":"C3-C3"},"PeriodicalIF":2.7,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10684352","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142276469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-18 DOI: 10.1109/tcad.2024.3463544
Sercan Aygun, M. Hassan Najafi
{"title":"Sobol Sequence Optimization for Hardware-Efficient Vector Symbolic Architectures","authors":"Sercan Aygun, M. Hassan Najafi","doi":"10.1109/tcad.2024.3463544","DOIUrl":"https://doi.org/10.1109/tcad.2024.3463544","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"10 1","pages":""},"PeriodicalIF":2.9,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-18 DOI: 10.1109/tcad.2024.3463534
Jinyi Shen, Fan Yang, Li Shang, Changhao Yan, Zhaori Bi, Dian Zhou, Xuan Zeng
{"title":"ATOM: An Automatic Topology Synthesis Framework for Operational Amplifiers","authors":"Jinyi Shen, Fan Yang, Li Shang, Changhao Yan, Zhaori Bi, Dian Zhou, Xuan Zeng","doi":"10.1109/tcad.2024.3463534","DOIUrl":"https://doi.org/10.1109/tcad.2024.3463534","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"1 1","pages":""},"PeriodicalIF":2.9,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-18 DOI: 10.1109/tcad.2024.3462904
Zhenxin Zhao, Jun Liu, Wensheng Zhao, Lihong Zhang
{"title":"Automated Topology Synthesis of Analog Integrated Circuits With Frequency Compensation","authors":"Zhenxin Zhao, Jun Liu, Wensheng Zhao, Lihong Zhang","doi":"10.1109/tcad.2024.3462904","DOIUrl":"https://doi.org/10.1109/tcad.2024.3462904","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"117 1","pages":""},"PeriodicalIF":2.9,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}