T. Anderson, Duc Bui, S. Moharil, Soujanya Narnur, Mujibur Rahman, A. Lell, Eric Biscondi, A. Shrivastava, P. Dent, Mingjian Yan, Hasan Mahmood
{"title":"A 1.5 Ghz VLIW DSP CPU with Integrated Floating Point and Fixed Point Instructions in 40 nm CMOS","authors":"T. Anderson, Duc Bui, S. Moharil, Soujanya Narnur, Mujibur Rahman, A. Lell, Eric Biscondi, A. Shrivastava, P. Dent, Mingjian Yan, Hasan Mahmood","doi":"10.1109/ARITH.2011.20","DOIUrl":null,"url":null,"abstract":"A next generation VLIW DSP Central Processing Unit (CPU) which has an integrated fixed point and floating point Instruction Set Architecture (ISA) is presented. It is designed to meet a 1.5 GHz core clock frequency in a 40nm process with aggressive area and power goals. In this paper, the benchmarking process and benefits of newly defined instructions such as complex matrix multiply is explained. Also, the CPU data path is described in detail, highlighting several novel micro-architecture features. Finally, our design methodology as well as verification methodology to ensure functional correctness utilizing formal equivalent verification is described.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 20th Symposium on Computer Arithmetic","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARITH.2011.20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
A next generation VLIW DSP Central Processing Unit (CPU) which has an integrated fixed point and floating point Instruction Set Architecture (ISA) is presented. It is designed to meet a 1.5 GHz core clock frequency in a 40nm process with aggressive area and power goals. In this paper, the benchmarking process and benefits of newly defined instructions such as complex matrix multiply is explained. Also, the CPU data path is described in detail, highlighting several novel micro-architecture features. Finally, our design methodology as well as verification methodology to ensure functional correctness utilizing formal equivalent verification is described.