Bobin Deng, Bhargava Nadendla, Kun Suo, Yixin Xie, Dan Chia-Tien Lo
{"title":"余数系统的定点编码和架构探索","authors":"Bobin Deng, Bhargava Nadendla, Kun Suo, Yixin Xie, Dan Chia-Tien Lo","doi":"10.1145/3664923","DOIUrl":null,"url":null,"abstract":"<p>Residue Number Systems (RNS) demonstrate the fascinating potential to serve integer addition/multiplication-intensive applications. The complexity of Artificial Intelligence (AI) models has grown enormously in recent years. From a computer system’s perspective, ensuring the training of these large-scale AI models within an adequate time and energy consumption has become a big concern. Matrix multiplication is a dominant subroutine in many prevailing AI models, with an addition/multiplication-intensive attribute. However, the data type of matrix multiplication within machine learning training typically requires real numbers, which indicates that RNS benefits for integer applications cannot be directly gained by AI training. The state-of-the-art RNS real number encodings, including floating-point and fixed-point, have defects and can be further enhanced. To transform default RNS benefits to the efficiency of large-scale AI training, we propose a low-cost and high-accuracy RNS fixed-point representation: <i>Single RNS Logical Partition (S-RNS-Logic-P) representation with Scaling Down Postprocessing Multiplication (SD-Post-Mul)</i>. Moreover, we extend the implementation details of the other two RNS fixed-point methods: <i>Double RNS Concatenation (D-RNS-Concat)</i> and <i>Single RNS Logical Partition (S-RNS-Logic-P) representation with Scaling Down Preprocessing Multiplication (SD-Pre-Mul)</i>. We also design the architectures of these three fixed-point multipliers. In empirical experiments, our <i>S-RNS-Logic-P representation with SD-Post-Mul</i> method achieves less latency and energy overhead while maintaining good accuracy. Furthermore, this method can easily extend to the Redundant Residue Number System (RRNS) to raise the efficiency of error-tolerant domains, such as improving the error correction efficiency of quantum computing.</p>","PeriodicalId":50920,"journal":{"name":"ACM Transactions on Architecture and Code Optimization","volume":"147 1","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fixed-point Encoding and Architecture Exploration for Residue Number Systems\",\"authors\":\"Bobin Deng, Bhargava Nadendla, Kun Suo, Yixin Xie, Dan Chia-Tien Lo\",\"doi\":\"10.1145/3664923\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Residue Number Systems (RNS) demonstrate the fascinating potential to serve integer addition/multiplication-intensive applications. The complexity of Artificial Intelligence (AI) models has grown enormously in recent years. From a computer system’s perspective, ensuring the training of these large-scale AI models within an adequate time and energy consumption has become a big concern. Matrix multiplication is a dominant subroutine in many prevailing AI models, with an addition/multiplication-intensive attribute. However, the data type of matrix multiplication within machine learning training typically requires real numbers, which indicates that RNS benefits for integer applications cannot be directly gained by AI training. The state-of-the-art RNS real number encodings, including floating-point and fixed-point, have defects and can be further enhanced. 
To transform default RNS benefits to the efficiency of large-scale AI training, we propose a low-cost and high-accuracy RNS fixed-point representation: <i>Single RNS Logical Partition (S-RNS-Logic-P) representation with Scaling Down Postprocessing Multiplication (SD-Post-Mul)</i>. Moreover, we extend the implementation details of the other two RNS fixed-point methods: <i>Double RNS Concatenation (D-RNS-Concat)</i> and <i>Single RNS Logical Partition (S-RNS-Logic-P) representation with Scaling Down Preprocessing Multiplication (SD-Pre-Mul)</i>. We also design the architectures of these three fixed-point multipliers. In empirical experiments, our <i>S-RNS-Logic-P representation with SD-Post-Mul</i> method achieves less latency and energy overhead while maintaining good accuracy. Furthermore, this method can easily extend to the Redundant Residue Number System (RRNS) to raise the efficiency of error-tolerant domains, such as improving the error correction efficiency of quantum computing.</p>\",\"PeriodicalId\":50920,\"journal\":{\"name\":\"ACM Transactions on Architecture and Code Optimization\",\"volume\":\"147 1\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2024-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Architecture and Code Optimization\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3664923\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Architecture and Code Optimization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3664923","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Fixed-point Encoding and Architecture Exploration for Residue Number Systems
Residue Number Systems (RNS) show great potential for integer addition/multiplication-intensive applications. The complexity of Artificial Intelligence (AI) models has grown enormously in recent years, and from a computer-systems perspective, training these large-scale models within acceptable time and energy budgets has become a major concern. Matrix multiplication, an addition/multiplication-intensive kernel, dominates many prevailing AI models. However, matrix multiplication in machine learning training typically operates on real numbers, so the RNS benefits available to integer applications cannot be transferred directly to AI training. The state-of-the-art RNS real-number encodings, both floating-point and fixed-point, have shortcomings and can be further improved. To carry the inherent RNS benefits over to the efficiency of large-scale AI training, we propose a low-cost, high-accuracy RNS fixed-point representation: Single RNS Logical Partition (S-RNS-Logic-P) representation with Scaling Down Postprocessing Multiplication (SD-Post-Mul). Moreover, we extend the implementation details of two other RNS fixed-point methods: Double RNS Concatenation (D-RNS-Concat) and S-RNS-Logic-P representation with Scaling Down Preprocessing Multiplication (SD-Pre-Mul). We also design the architectures of these three fixed-point multipliers. In empirical experiments, our S-RNS-Logic-P representation with SD-Post-Mul achieves lower latency and energy overhead while maintaining good accuracy. Furthermore, the method extends naturally to the Redundant Residue Number System (RRNS), raising efficiency in error-tolerant domains such as error correction for quantum computing.
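To make the idea behind the abstract concrete, the sketch below shows the basic mechanics the paper builds on: encoding integers as residues modulo pairwise-coprime moduli, multiplying each residue channel independently, and reconstructing the product via the Chinese Remainder Theorem, with a simple fixed-point scale-down applied after the multiplication so that real numbers can ride on integer RNS arithmetic. This is an illustrative example only, not the paper's S-RNS-Logic-P/SD-Post-Mul design; the moduli set, the fraction width FRAC_BITS, and the helper names (to_rns, rns_mul, from_rns) are all assumptions chosen for clarity.

```python
# Illustrative RNS + fixed-point sketch (assumed example, not the paper's method).
from math import prod

MODULI = (251, 253, 255, 256)          # pairwise-coprime moduli (assumed example set)
M = prod(MODULI)                       # dynamic range of this RNS
FRAC_BITS = 8                          # assumed fixed-point fraction width

def to_rns(x: int) -> tuple:
    """Encode a non-negative integer as its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)

def rns_mul(a, b) -> tuple:
    """Channel-wise modular multiplication: each residue is handled independently."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(residues) -> int:
    """Reconstruct the integer from its residues via the Chinese Remainder Theorem."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(Mi, -1, m): modular inverse (Python 3.8+)
    return x % M

# Fixed-point real multiplication on top of integer-only RNS arithmetic:
a_real, b_real = 3.25, 1.5
a_fx = round(a_real * 2**FRAC_BITS)    # quantize each real operand to fixed point
b_fx = round(b_real * 2**FRAC_BITS)

prod_fx = from_rns(rns_mul(to_rns(a_fx), to_rns(b_fx)))
result = prod_fx / 2**(2 * FRAC_BITS)  # scale down after the multiplication
print(result)                          # ~4.875
```

The scale-down after the product is the step that a real RNS design must perform cheaply in hardware; the paper's SD-Pre-Mul and SD-Post-Mul variants differ in whether that scaling happens before or after the multiplication.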
Journal description:
ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.