Pub Date : 2024-03-18 | DOI: 10.1109/TETC.2024.3374568
Guest Editorial IEEE Transactions on Emerging Topics in Computing Special Section on Advances in Emerging Privacy-Preserving Computing
Jinguang Han;Patrick Schaumont;Willy Susilo
Machine learning and cloud computing have dramatically increased the utility of data. These technologies ease everyday life and enable smart, intelligent services. Notably, machine learning algorithms must learn from massive training data to improve accuracy, so data is the core component of machine learning. Cloud computing is a computing model that provides on-demand services such as data storage, computing power, and infrastructure. Data owners can outsource their data to cloud servers, but in doing so they lose direct control over it. The rising trend in data breaches shows that privacy and security have become major issues in machine learning and cloud computing.
{"title":"Guest Editorial IEEE Transactions on Emerging Topics in Computing Special Section on Advances in Emerging Privacy-Preserving Computing","authors":"Jinguang Han;Patrick Schaumont;Willy Susilo","doi":"10.1109/TETC.2024.3374568","DOIUrl":"https://doi.org/10.1109/TETC.2024.3374568","url":null,"abstract":"Machine learning and cloud computing have dramatically increased the utility of data. These technologies facilitate our life and provide smart and intelligent services. Notably, machine learning algorithms need to learn from massive training data to improve accuracy. Hence, data is the core component of machine learning and plays an important role. Cloud computing is a new computing model that provides on-demand services, such as data storage, computing power, and infrastructure. Data owners are allowed to outsource their data to cloud servers, but will lose direct control of their data. The rising trend in data breach shows that privacy and security have been major issues in machine learning and cloud computing.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 1","pages":"266-268"},"PeriodicalIF":5.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474207","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140164101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-18 | DOI: 10.1109/TETC.2024.3377773
IEEE Transactions on Emerging Topics in Computing Information for Authors (vol. 12, no. 1, pp. C2-C2)
Pub Date : 2024-02-19 | DOI: 10.1109/TETC.2024.3365650
A Design Framework for Hardware-Efficient Logarithmic Floating-Point Multipliers
Tingting Zhang;Zijing Niu;Jie Han
The use of logarithmic approximation in floating-point (FP) multiplication can significantly reduce the hardware complexity of a multiplier. However, because error-tolerant applications such as neural networks (NNs) and digital signal processing each have their own error characteristics, a small, fixed set of logarithmic FP multipliers (LFPMs) is unlikely to suit every application. This article proposes a design framework for generating LFPMs. We consider two FP representation formats with different mantissa ranges: the IEEE 754 standard FP format and the nearest-power-of-two FP format. For both the logarithm and anti-logarithm computations, the applicable input region is first divided evenly into several intervals, and approximation methods with negative or positive errors are then developed for each sub-region. Using piecewise functions, different configurations of approximation methods across the applicable regions are created, leading to LFPMs with various trade-offs between accuracy and hardware cost. The variety of error characteristics of LFPMs is discussed and a generic hardware implementation is illustrated. As case studies, two LFPM designs are presented and evaluated in JPEG compression and NN applications. They not only increase classification accuracy but also achieve smaller power-delay products (PDPs) than the exact FP multiplier, while being more accurate than a recent logarithmic FP design.
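To illustrate the basic principle behind such logarithmic multipliers, the Python sketch below (for illustration only, not taken from the article) multiplies two positive floats by adding their exponents and piecewise-linear approximations of the mantissa logarithms, then converts back through a piecewise-linear anti-logarithm. The two-segment breakpoints and slope coefficients are arbitrary assumptions of this example; the article's interval divisions and error-shaping methods are not reproduced here.

```python
# Illustrative sketch of logarithmic floating-point multiplication with a
# simple two-segment piecewise-linear log/anti-log approximation.
# The segment boundaries and slopes are illustrative assumptions, not the
# intervals or error-shaping methods proposed in the article.
import math

def approx_log2_mantissa(m):
    """Piecewise-linear approximation of log2(m) for m in [1, 2)."""
    x = m - 1.0                      # fractional part of the mantissa, in [0, 1)
    if x < 0.5:
        return 1.05 * x              # first linear segment (illustrative coefficients)
    return 0.95 * x + 0.05           # second linear segment (illustrative coefficients)

def approx_pow2_fraction(f):
    """Piecewise-linear approximation of 2**f for f in [0, 1)."""
    if f < 0.5:
        return 1.0 + 0.95 * f
    return 1.0 + 1.05 * f - 0.05

def logarithmic_multiply(a, b):
    """Approximate a * b (a, b > 0) by adding exponents and approximate mantissa logs."""
    ma, ea = math.frexp(a)           # a = ma * 2**ea with ma in [0.5, 1)
    mb, eb = math.frexp(b)
    ma, ea = ma * 2.0, ea - 1        # renormalize mantissas into [1, 2)
    mb, eb = mb * 2.0, eb - 1
    log_prod = ea + eb + approx_log2_mantissa(ma) + approx_log2_mantissa(mb)
    e_int = math.floor(log_prod)     # integer part becomes the result exponent
    frac = log_prod - e_int          # fractional part drives the anti-logarithm
    return math.ldexp(approx_pow2_fraction(frac), e_int)

print(logarithmic_multiply(1.5, 2.75))   # about 3.83 here; the exact product is 4.125
```

With these illustrative coefficients, logarithmic_multiply(1.5, 2.75) returns about 3.83 against the exact 4.125; a hardware design would tune the segment boundaries and slopes to trade such error against gate count.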
{"title":"A Design Framework for Hardware-Efficient Logarithmic Floating-Point Multipliers","authors":"Tingting Zhang;Zijing Niu;Jie Han","doi":"10.1109/TETC.2024.3365650","DOIUrl":"10.1109/TETC.2024.3365650","url":null,"abstract":"The symbiotic use of logarithmic approximation in floating-point (FP) multiplication can significantly reduce the hardware complexity of a multiplier. However, it is difficult for a limited number of logarithmic FP multipliers (LFPMs) to fit in a specific error-tolerant application, such as neural networks (NNs) and digital signal processing, due to their unique error characteristics. This article proposes a design framework for generating LFPMs. We consider two FP representation formats with different ranges of mantissas, the IEEE 754 Standard FP Format and the Nearest Power of Two FP Format. For both logarithm and anti-logarithm computation, the applicable regions of inputs are first evenly divided into several intervals, and then approximation methods with negative or positive errors are developed for each sub-region. By using piece-wise functions, different configurations of approximation methods throughout applicable regions are created, leading to LFPMs with various trade-offs between accuracy and hardware cost. The variety of error characteristics of LFPMs is discussed and the generic hardware implementation is illustrated. As case studies, two LFPM designs are presented and evaluated in applications of JPEG compression and NNs. They do not only increase the classification accuracy, but also achieve smaller PDPs compared to the exact FP multiplier, while being more accurate than a recent logarithmic FP design.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 4","pages":"991-1001"},"PeriodicalIF":5.1,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139947917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-19 | DOI: 10.1109/TETC.2024.3365354
MiniFloats on RISC-V Cores: ISA Extensions With Mixed-Precision Short Dot Products
Luca Bertaccini;Gianna Paulin;Matheus Cavalcante;Tim Fischer;Stefan Mach;Luca Benini
Low-precision floating-point (FP) formats have recently been intensely investigated in the context of machine learning inference and training. While 16-bit formats are already widely used, 8-bit FP data types have lately emerged as a viable option for neural network training when employed in a mixed-precision scenario and combined with rounding methods, such as stochastic rounding, that increase the precision of compound additions. So far, hardware support for FP8 has appeared mostly in domain-specific accelerators. We propose two RISC-V instruction set architecture (ISA) extensions that enhance scalar and vector general-purpose cores, respectively, with low- and mixed-precision capabilities. The extensions support two 8-bit and two 16-bit FP formats and are based on dot-product instructions that accumulate at higher precision. We develop a hardware unit supporting mixed-precision dot products and stochastic rounding and integrate it into an open-source floating-point unit (FPU). Finally, we integrate the enhanced FPU into a cluster of scalar cores as well as a cluster of vector cores and implement both in a 12 nm FinFET technology. The former achieves 575 GFLOPS/W on FP8-to-FP16 matrix multiplications at 0.8 V and 1.26 GHz; the latter reaches 860 GFLOPS/W at 0.8 V and 1.08 GHz, a 1.93× higher efficiency than computing on FP16-to-FP32.
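As a behavioral illustration of this style of mixed-precision dot product, the Python sketch below quantizes inputs to an FP8-like (E4M3-style) mantissa grid, accumulates the products on a wider FP16-like grid, and applies stochastic rounding to the running sum. The helper names (quantize, fp8_dot_fp16_acc), the chosen bit widths, and the omission of exponent-range and special-value handling are all assumptions of this example; it models numerical behavior only, not the proposed ISA extensions or FPU hardware.

```python
# Behavioral model of a mixed-precision dot product: FP8-like inputs, wider
# accumulation, stochastic rounding of the running sum. Exponent range,
# subnormals, and inf/NaN handling are intentionally ignored in this sketch.
import math
import random

def quantize(x, mant_bits, stochastic=False):
    """Round x onto a binary grid with `mant_bits` mantissa bits (exponent range ignored)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                  # x = m * 2**e with |m| in [0.5, 1)
    scale = 2.0 ** (mant_bits + 1)        # grid step of 2**-mant_bits relative to [1, 2)
    scaled = m * scale
    if stochastic:
        floor_val = math.floor(scaled)
        prob_up = scaled - floor_val      # round up with probability equal to the remainder
        scaled = floor_val + (1.0 if random.random() < prob_up else 0.0)
    else:
        scaled = float(round(scaled))     # round to nearest for the low-precision inputs
    return math.ldexp(scaled / scale, e)

def fp8_dot_fp16_acc(a, b):
    """Dot product with FP8-like (3 mantissa bits) inputs and an FP16-like accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        x8 = quantize(x, mant_bits=3)                  # E4M3-style input quantization
        y8 = quantize(y, mant_bits=3)
        acc = quantize(acc + x8 * y8, mant_bits=10,    # accumulate on an FP16-like grid
                       stochastic=True)                # with stochastic rounding
    return acc

print(fp8_dot_fp16_acc([0.1] * 64, [0.1] * 64))        # about 0.66: FP8 quantizes 0.1 to ~0.1016
```

Over long accumulations, stochastic rounding keeps the expected sum unbiased, whereas round-to-nearest at such low precision can systematically drop small addends once the accumulator grows large.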
{"title":"MiniFloats on RISC-V Cores: ISA Extensions With Mixed-Precision Short Dot Products","authors":"Luca Bertaccini;Gianna Paulin;Matheus Cavalcante;Tim Fischer;Stefan Mach;Luca Benini","doi":"10.1109/TETC.2024.3365354","DOIUrl":"10.1109/TETC.2024.3365354","url":null,"abstract":"Low-precision floating-point (FP) formats have recently been intensely investigated in the context of machine learning inference and training applications. While 16-bit formats are already widely used, 8-bit FP data types have lately emerged as a viable option for neural network training when employed in a mixed-precision scenario and combined with rounding methods increasing the precision in compound additions, such as stochastic rounding. So far, hardware implementations supporting FP8 are mostly implemented within domain-specific accelerators. We propose two RISC-V instruction set architecture (ISA) extensions, enhancing respectively scalar and vector general-purpose cores with low and mixed-precision capabilities. The extensions support two 8-bit and two 16-bit FP formats and are based on dot-product instructions accumulating at higher precision. We develop a hardware unit supporting mixed-precision dot products and stochastic rounding and integrate it into an open-source floating-point unit (FPU). Finally, we integrate the enhanced FPU into a cluster of scalar cores, as well as a cluster of vector cores, and implement them in a 12 nm FinFET technology. The former achieves 575 GFLOPS/W on FP8-to-FP16 matrix multiplications at 0.8 V, 1.26 GHz; the latter reaches 860 GFLOPS/W at 0.8 V, 1.08 GHz, 1.93x higher efficiency than computing on FP16-to-FP32.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 4","pages":"1040-1055"},"PeriodicalIF":5.1,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139948120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-16 | DOI: 10.1109/TETC.2024.3364703
Kai Di;Fulin Chen;Yuanshuang Jiang;Pan Li;Tianyi Liu;Yichuan Jiang
In recent years, the cooperation structures of industrial chains have evolved into multiplex networks, in which product agents are connected through various types of links. Because of the constraints imposed by the multi-coupled interaction structure of multiplex networked industrial chains, load imbalances generated by industrial production processes cascade within and between network layers, affecting the load balance of the whole system. The challenges in such load balancing among multiplex networked industrial chains are twofold: 1) the multiplex networked interaction structure adds new constraints to traditional multiagent task migration problems, which increases the dimension of the solution space, and 2) the cascaded load imbalances require tasks to be migrated adaptively, which complicates the structure of the solution space, and it is proven $\mathcal{NP}$