Pub Date: 2023-09-20 | DOI: 10.1109/TETC.2023.3315298
Abbas Dehghani; Sadegh Fadaei; Bahman Ravaei; Keyvan RahimiZadeh
The Hybrid Wireless Network-on-Chip (HWNoC) architecture has been introduced as a promising communication infrastructure for multicore systems. HWNoC-based multicore systems face highly dynamic application workloads that are submitted at run-time. Mapping and scheduling these applications are critical for system performance, especially for real-time applications. Existing resource allocation approaches either ignore the use of wireless links when allocating tasks to cores or ignore the timing characteristics of tasks. In this paper, we propose a new deadline-aware and energy-efficient dynamic task mapping and scheduling approach for HWNoC-based multicore systems. Using a core utilization threshold and task laxity times, the proposed approach aims to minimize communication energy consumption while satisfying the deadlines of real-time application tasks. Through cycle-accurate simulation, the performance of the proposed approach has been compared with state-of-the-art approaches in terms of communication energy consumption, deadline violation rate, communication latency, and runtime overhead. The experimental results confirm that the proposed approach is highly competitive with the alternative approaches.
"Deadline-Aware and Energy-Efficient Dynamic Task Mapping and Scheduling for Multicore Systems Based on Wireless Network-on-Chip," IEEE Transactions on Emerging Topics in Computing, vol. 11, no. 4, pp. 1031-1044.
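
The abstract above attributes its deadline awareness to task laxity times and a core utilization threshold. The Python sketch below illustrates only those two notions; the `Task`, `laxity`, `eligible_cores`, and `map_tasks` names, the 0.8 threshold, and the least-laxity-first ordering are illustrative assumptions, not the paper's actual mapping and scheduling algorithm.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    exec_time: float   # estimated remaining execution time on a core
    deadline: float    # absolute deadline

def laxity(task: Task, now: float) -> float:
    # Laxity = time remaining until the deadline minus remaining execution time;
    # smaller (or negative) laxity means the task is more urgent.
    return task.deadline - now - task.exec_time

def eligible_cores(utilization: dict, threshold: float = 0.8) -> list:
    # Hypothetical policy: only cores below the utilization threshold may
    # receive new tasks, to avoid overloading already busy cores.
    return [core for core, u in utilization.items() if u < threshold]

def map_tasks(tasks: list, utilization: dict, now: float) -> list:
    # Order tasks least-laxity-first and assign them round-robin to the
    # least-utilized eligible cores (illustrative only).
    cores = sorted(eligible_cores(utilization), key=lambda c: utilization[c])
    if not cores:
        return []
    ordered = sorted(tasks, key=lambda t: laxity(t, now))
    return [(t.name, cores[i % len(cores)]) for i, t in enumerate(ordered)]

# Two cores, one of them nearly saturated: only core 1 is eligible.
print(map_tasks([Task("fft", 3.0, 10.0), Task("decode", 1.0, 4.0)],
                {0: 0.95, 1: 0.40}, now=0.0))
# -> [('decode', 1), ('fft', 1)]
```
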
Pub Date: 2023-09-19 | DOI: 10.1109/TETC.2023.3315189
Ben Perach; Ronny Ronen; Benny Kimelfeld; Shahar Kvatinsky
Bulk-bitwise processing-in-memory (PIM), where large bitwise operations are performed in parallel by the memory array itself, is an emerging form of computation with the potential to mitigate the memory wall problem. This article examines the capabilities of bulk-bitwise PIM by constructing PIMDB, a fully digital system based on memristive stateful logic that focuses on in-memory bulk-bitwise operations and is designed to accelerate a real-life workload: analytical processing of relational databases. We introduce a host-processor programming model that supports bulk-bitwise PIM in virtual memory, develop techniques to efficiently perform in-memory filtering and aggregation operations, and adapt the application data set to the memory. To understand bulk-bitwise PIM, we compare it to an equivalent in-memory database on the same host system. We show that bulk-bitwise PIM substantially lowers the number of required memory read operations, accelerating TPC-H filter operations by 1.6×–18× and full queries by 56×–608×, while reducing energy consumption by 1.7×–18.6× and 0.81×–12×, respectively, on these benchmarks. Our extensive evaluation uses the gem5 full-system simulation environment. The simulations also evaluate cell endurance, showing that the required endurance is within the range of existing RRAM devices.
"Understanding Bulk-Bitwise Processing In-Memory Through Database Analytics," IEEE Transactions on Emerging Topics in Computing, vol. 12, no. 1, pp. 7-22.
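
To make the idea of in-memory filtering concrete, here is a small Python sketch of bit-sliced, bulk-bitwise predicate evaluation: each bit position of a column is stored as one bitvector, and a predicate such as "value == k" reduces to whole-vector AND/NOT operations, the kind of operation a bulk-bitwise PIM array performs inside the memory itself. This is an illustration of the general technique, not PIMDB's programming model; the function names and the 3-bit example column are invented for the example.

```python
def bit_slices(column: list, width: int) -> list:
    # slices[b] has bit i set iff bit b of column[i] is 1.
    slices = [0] * width
    for i, value in enumerate(column):
        for b in range(width):
            if (value >> b) & 1:
                slices[b] |= 1 << i
    return slices

def filter_equal(slices: list, k: int, n_rows: int) -> int:
    # Return a bitmask of rows whose value equals k, using only bitwise ops
    # over whole bitvectors (one AND per bit position of the predicate).
    all_rows = (1 << n_rows) - 1
    mask = all_rows                    # start with every row selected
    for b, s in enumerate(slices):
        if (k >> b) & 1:
            mask &= s                  # bit b of the value must be 1
        else:
            mask &= ~s & all_rows      # bit b of the value must be 0
    return mask

column = [5, 3, 5, 7]                  # e.g., a 3-bit "quantity" column
rows = filter_equal(bit_slices(column, 3), 5, len(column))
print([i for i in range(len(column)) if (rows >> i) & 1])  # -> [0, 2]
```
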
Pub Date: 2023-09-05 | DOI: 10.1109/TETC.2023.3299150
Michael Grottke; Alberto Avritzer; Hironori Washizaki; Kishor Trivedi
Since the publication of the first paper on software aging and rejuvenation by Huang et al. in 1995 [1], considerable research has been devoted to this topic. Software aging is the phenomenon whereby continuously running software systems may show an increasing failure rate and/or degrading performance, either because error conditions accumulate inside the running system or because the rate at which faults are activated and errors are propagated is positively correlated with system uptime. Software rejuvenation refers to techniques that counteract aging (for example, by regularly stopping and restarting the software) in order to remove aging effects and to proactively prevent failures from occurring.
"Guest Editorial Special Section on Applied Software Aging and Rejuvenation," IEEE Transactions on Emerging Topics in Computing, vol. 11, no. 3, pp. 550-552.
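
As a deliberately simplified illustration of the aging-and-rejuvenation cycle described in the editorial above, the Python sketch below models a worker that accumulates error conditions and a trigger that restarts it once an uptime limit or a leak budget is exceeded; the `Worker` class, the thresholds, and the leak model are hypothetical and are not drawn from any paper in the special section.

```python
import time

class Worker:
    # Toy long-running worker that "ages": it leaks one buffer per request.
    def __init__(self):
        self.start = time.monotonic()
        self.leaked = []                      # stands in for accumulated error conditions

    def handle_request(self):
        self.leaked.append(bytearray(1024))   # aging effect: resources never released

def needs_rejuvenation(worker: Worker, max_uptime_s: float, max_leaked: int) -> bool:
    # Time-based or state-based trigger for a proactive restart.
    aged_by_time = time.monotonic() - worker.start > max_uptime_s
    aged_by_state = len(worker.leaked) > max_leaked
    return aged_by_time or aged_by_state

worker = Worker()
for _ in range(10_001):
    worker.handle_request()
    if needs_rejuvenation(worker, max_uptime_s=3600.0, max_leaked=10_000):
        worker = Worker()                     # rejuvenation: restart in a clean state
        break
```
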
Pub Date: 2023-09-05 | DOI: 10.1109/TETC.2023.3300132
"IEEE Transactions on Emerging Topics in Computing Information for Authors," IEEE Transactions on Emerging Topics in Computing, vol. 11, no. 3, pp. C2-C2.
In this work, a Residue Number System (RNS)-based Convolutional Neural Network (CNN) accelerator utilizing a multiplier-free, distributed-arithmetic Processing Element (PE) is proposed. A method for maximizing the utilization of the arithmetic hardware resources is presented; it increases the system's throughput by exploiting bit-level sparsity within the weight vectors. The proposed PE design takes advantage of the properties of RNS and Canonical Signed Digit (CSD) encoding to achieve higher energy efficiency and effective processing rate, without requiring any compression mechanism or introducing any approximation. An extensive design space exploration over various parameters (RNS base, PE micro-architecture, encoding), using analytical models as well as experimental results from CNN benchmarks, is conducted and the resulting trade-offs are analyzed. A complete end-to-end RNS accelerator is developed based on the proposed PE. The introduced accelerator is compared to traditional binary and RNS counterparts as well as to other state-of-the-art systems. Implementation results in a 22-nm process show that the proposed PE can lead to 1.85×
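
The abstract rests on two representations that are easy to demonstrate: the Residue Number System, in which a value is carried as residues modulo a set of pairwise-coprime moduli, and Canonical Signed Digit encoding, in which a weight is expressed with digits in {-1, 0, +1} so that a product needs only one shift plus one add or subtract per nonzero digit. The Python sketch below shows both in isolation; the (15, 16, 17) moduli set and the function names are illustrative assumptions, not the paper's chosen RNS base or PE design.

```python
def to_rns(x: int, moduli: tuple = (15, 16, 17)) -> tuple:
    # RNS representation: x is held as its residues modulo pairwise-coprime
    # moduli, so each residue channel is narrow and processed independently.
    return tuple(x % m for m in moduli)

def to_csd(n: int) -> list:
    # Canonical signed-digit encoding (LSB first): digits in {-1, 0, +1} with
    # no two adjacent nonzero digits, which maximizes bit-level sparsity.
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)            # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def csd_channel_product(activation: int, weight: int, modulus: int) -> int:
    # Multiplier-free product on one residue channel: one shift plus one
    # add/subtract per nonzero CSD digit of the weight.
    acc = 0
    for shift, d in enumerate(to_csd(weight)):
        if d:
            acc += d * (activation << shift)
    return acc % modulus

print(to_rns(42))              # (12, 10, 8)
print(to_csd(59))              # [-1, 0, -1, 0, 0, 0, 1]: 64 - 4 - 1, three
                               # nonzero digits vs. five ones in binary 0b111011
print(csd_channel_product(6, 7, 15), (6 * 7) % 15)   # 12 12
```
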