K. Shimamura, Shigeya Tanaka, Tetsuya Shimomura, T. Hotta, E. Kamada, H. Sawamoto, Teruhisa Shimizu, K. Nakazawa
Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors, 1995-10-02. DOI: 10.1109/ICCD.1995.528797 (https://doi.org/10.1109/ICCD.1995.528797)
A superscalar RISC processor with pseudo vector processing feature
A novel architectural extension, in which floating-point data are transferred directly from main memory to floating-point registers, has been implemented in a superscalar RISC processor. The extension allows a main memory access throughput of 1.2 Gbyte/s, and effective performance reaches 267 MFLOPS (89% of peak performance) for typical floating-point applications. The processor is fabricated in 0.3-micron, 4-level-metal CMOS technology with a 2.5 V power supply and contains 3.9 million transistors on a 15.7 mm × 15.7 mm die. Only 4.5% of the die area is used for the extension. Pipeline stage optimization and a scoreboard-based dependency check method allow the extension to be realized without affecting the operating frequency.
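The figures quoted in the abstract imply two quantities that it does not state directly: the peak floating-point rate and the silicon area taken by the extension. The short Python sketch below derives them by plain arithmetic from the quoted numbers. The function and constant names are illustrative only, and because the abstract's percentages are rounded, the results should be read as approximate.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# All names here are illustrative; only the numeric inputs come from the text.

EFFECTIVE_MFLOPS = 267.0         # effective performance quoted in the abstract
FRACTION_OF_PEAK = 0.89          # "89% of peak performance" (rounded in the source)
DIE_SIDE_MM = 15.7               # die is 15.7 mm x 15.7 mm
EXTENSION_AREA_FRACTION = 0.045  # "4.5% of the die area" (rounded in the source)


def implied_peak_mflops(effective_mflops: float, fraction_of_peak: float) -> float:
    """Peak rate implied by an effective rate and its share of peak."""
    return effective_mflops / fraction_of_peak


def die_area_mm2(side_mm: float) -> float:
    """Area of a square die with the given side length."""
    return side_mm * side_mm


def extension_area_mm2(side_mm: float, area_fraction: float) -> float:
    """Approximate die area attributed to the extension."""
    return die_area_mm2(side_mm) * area_fraction


if __name__ == "__main__":
    peak = implied_peak_mflops(EFFECTIVE_MFLOPS, FRACTION_OF_PEAK)
    total = die_area_mm2(DIE_SIDE_MM)
    ext = extension_area_mm2(DIE_SIDE_MM, EXTENSION_AREA_FRACTION)
    print(f"implied peak: {peak:.0f} MFLOPS")
    print(f"die area: {total:.2f} mm^2")
    print(f"extension area: {ext:.2f} mm^2")
```

Under these inputs the implied peak is about 300 MFLOPS, the die area is about 246.5 mm², and the extension occupies roughly 11 mm².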