Pub Date: 2015-07-27 | DOI: 10.1109/ASAP.2015.7245714
Ran Zheng, Wei Wang, Hai Jin, Song Wu, Yong Chen, Han Jiang
In many scientific computing applications, sparse Cholesky factorization is used to solve large sparse linear systems in distributed environments. GPU computing offers a new way to accelerate this work; however, sparse Cholesky factorization on a GPU struggles to achieve high performance because of the irregular structure of the matrix and low GPU resource utilization. A hybrid CPU-GPU implementation of sparse Cholesky factorization is proposed based on the multifrontal method: a large sparse coefficient matrix is decomposed into a series of small dense matrices (frontal matrices), and multiple GEMM (General Matrix-Matrix Multiplication) operations are then computed. GEMMs are the dominant operations in sparse Cholesky factorization, but individually they are difficult to parallelize efficiently on a GPU. To improve performance, a scheme of multiple task queues is adopted when performing the many GEMMs exposed by the multifrontal method; all GEMM tasks are scheduled dynamically on the GPU and CPU according to their computation scale, balancing load and reducing computing time. Experimental results show that the approach outperforms BLAS and cuBLAS implementations, achieving up to 3.15× and 1.98× speedup, respectively.
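The scale-based dispatch described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the FLOP threshold, function names, and task representation are all hypothetical, and a real scheduler would also rebalance queues dynamically at runtime.

```python
# Illustrative sketch of size-based GEMM dispatch across CPU and GPU queues.
# The threshold value below is invented for the example, not from the paper.
GPU_THRESHOLD_FLOPS = 2 * 256 * 256 * 256  # roughly a 256^3 GEMM

def gemm_flops(m, k, n):
    """Approximate FLOP count of an (m x k) by (k x n) GEMM."""
    return 2 * m * k * n

def dispatch(tasks):
    """Split GEMM tasks into CPU and GPU queues by computation scale:
    large frontal-matrix products go to the GPU, small ones stay on the CPU."""
    cpu_queue, gpu_queue = [], []
    for (m, k, n) in tasks:
        target = gpu_queue if gemm_flops(m, k, n) >= GPU_THRESHOLD_FLOPS else cpu_queue
        target.append((m, k, n))
    return cpu_queue, gpu_queue

cpu_q, gpu_q = dispatch([(64, 64, 64), (512, 512, 512)])
```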
"GPU-based multifrontal optimizing method in sparse Cholesky factorization," 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 90-97.
Pub Date: 2015-03-20 | DOI: 10.1109/ASAP.2015.7245729
P. Klages, K. Bandura, N. Denman, A. Recnik, J. Sievers, K. Vanderlinde
Interferometric radio telescopes often rely on computationally expensive O(N²) correlation calculations; fortunately, these computations map well to massively parallel accelerators such as low-cost GPUs. This paper describes the OpenCL kernels developed for the GPU-based X-engine of a new hybrid FX correlator. Channelized data from the F-engine is supplied to the GPUs as 4-bit, offset-encoded real and imaginary integers. Because of the low bit depth of the data, two values may be packed into a 32-bit register, allowing multiplication and addition of more than one value with a single fused multiply-add instruction. With these kernels, as many as 5.6 effective tera-operations per second (TOPS) can be executed on a 4.3 TOPS GPU. By design, these kernels allow correlations to scale to large numbers of input elements, and are limited only by maximum buffer sizes on the GPU. This code is currently working on-sky with the CHIME Pathfinder correlator in BC, Canada.
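The register-packing trick can be demonstrated with plain integer arithmetic. This is only the underlying identity, not the actual OpenCL kernels: two small non-negative values are stored in one word with enough spacing that a single multiply-accumulate updates both partial products at once (the 16-bit spacing below is chosen for the example; it leaves headroom for 4-bit × 4-bit products plus accumulation).

```python
# Two values packed into one word; one multiply-add advances both products.
SHIFT = 16  # spacing chosen so accumulated products cannot overflow into each other

def pack(a, b):
    """Pack two small non-negative integers into one word."""
    return (a << SHIFT) | b

def unpack(w):
    """Recover the two accumulated partial products."""
    return (w >> SHIFT) & 0xFFFF, w & 0xFFFF

acc = 0
for (a, b), c in [((3, 5), 7), ((2, 4), 6)]:
    acc += pack(a, b) * c   # single multiply-add updates both lanes

hi, lo = unpack(acc)        # hi = 3*7 + 2*6, lo = 5*7 + 4*6
```

The real kernels additionally handle the offset encoding and complex arithmetic, but the doubling of effective throughput comes from exactly this kind of lane packing.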
"GPU kernels for high-speed 4-bit astrophysical data processing," 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 164-165.
Pub Date: 2015-03-20 | DOI: 10.1109/ASAP.2015.7245705
A. Recnik, K. Bandura, N. Denman, A. Hincks, G. Hinshaw, P. Klages, U. Pen, K. Vanderlinde
The CHIME Pathfinder is a new interferometric radio telescope that uses a hybrid FPGA/GPU FX correlator. The GPU-based X-engine of this correlator processes over 819 Gb/s of 4+4-bit complex astronomical data from N=256 inputs across a 400 MHz radio band. A software framework is presented to manage this real-time data flow, allowing each of 16 processing servers to handle 51.2 Gb/s of astronomical data plus 8 Gb/s of ancillary data. Each server receives data as UDP packets from an FPGA F-engine over eight 10 GbE links, combines data from these packets into large (32 MB-256 MB) buffered frames, and transfers them to multiple GPU co-processors for correlation. The results from the GPUs are combined and normalized, then transmitted to a collection server, where they are merged into a single file. Aggressive optimizations enable each server to handle this high data rate, allowing the efficient correlation of 25 MHz of radio bandwidth per server. The solution scales well to larger values of N by adding additional servers.
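The packet-to-frame buffering stage might look like the following sketch. The class name and the tiny frame size in the usage example are invented for illustration; the real pipeline assembles 32 MB-256 MB frames from UDP payloads and must also cope with packet loss and reordering, which this sketch ignores.

```python
# Hypothetical sketch of accumulating small network packets into large
# fixed-size frames before hand-off to a GPU co-processor.
class FrameAssembler:
    def __init__(self, frame_bytes):
        self.frame_bytes = frame_bytes     # target frame size (32 MB+ in the real system)
        self.buf = bytearray()             # partially filled frame
        self.full_frames = []              # completed frames ready for GPU transfer

    def add_packet(self, payload: bytes):
        """Append one packet payload; emit a frame whenever the buffer fills."""
        self.buf.extend(payload)
        while len(self.buf) >= self.frame_bytes:
            self.full_frames.append(bytes(self.buf[:self.frame_bytes]))
            del self.buf[:self.frame_bytes]

# Toy usage with an 8-byte "frame" so the boundary crossing is visible.
asm = FrameAssembler(frame_bytes=8)
asm.add_packet(b"abcde")
asm.add_packet(b"fghij")   # second packet completes one frame, leaves a remainder
```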
"An efficient real-time data pipeline for the CHIME Pathfinder radio telescope X-engine," 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 57-61.
Pub Date: 2015-03-20 | DOI: 10.1109/ASAP.2015.7245702
N. Denman, M. Amiri, K. Bandura, L. Connor, M. Dobbs, M. Fandino, M. Halpern, A. Hincks, G. Hinshaw, C. Höfer, P. Klages, K. Masui, J. Parra, L. Newburgh, A. Recnik, J. Shaw, K. Sigurdson, Kendrick M. Smith, K. Vanderlinde
We present the design and implementation of a custom GPU-based compute cluster that provides the correlation X-engine of the CHIME Pathfinder radio telescope. It is among the largest such systems in operation, correlating 32,896 baselines (256 inputs) over 400 MHz of radio bandwidth. Making heavy use of consumer-grade parts and a custom software stack, the system was developed at a small fraction of the cost of comparable installations. Unlike existing GPU backends, this system is built around OpenCL kernels running on consumer-level AMD GPUs, taking advantage of low-cost hardware and leveraging packed integer operations to double algorithmic efficiency. The system achieves the required 105 TOPS in a 10 kW power envelope, making it one of the most power-efficient X-engines in use today.
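The quoted baseline count follows directly from correlating every pair of inputs, autocorrelations included: N(N+1)/2 pairs for N inputs. A one-line check (the function name is ours):

```python
# Number of correlation products for n inputs, counting each input with itself.
def baseline_count(n_inputs: int) -> int:
    return n_inputs * (n_inputs + 1) // 2

baseline_count(256)  # 32896, matching the figure in the abstract
```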
"A GPU-based correlator X-engine implemented on the CHIME Pathfinder," 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 35-40.
Pub Date: 2013-06-05 | DOI: 10.1109/ASAP.2013.6567537
Hong Jiang
Summary form only given. Every day, 2.5 quintillion (2.5×10¹⁸, or 2.5 million trillion) bytes of data are created. This data comes from everywhere: from traditional scientific computing and on-line transactions to popular social network and mobile applications. The data produced in the last two years alone amounts to 90% of the data in the world today! This phenomenal growth and ubiquity of data has ushered in an era of “Big Data”, which brings with it new challenges as well as opportunities. In this talk, I will first discuss the big data challenges facing computer and storage systems research, brought on by the huge volume, high velocity, great variety, and veracity with which digital data are being produced in the world. I will then introduce some new and ongoing programs at NSF that are relevant to Big Data and to ASAP, and finally present research being conducted in my group that takes a scalable, application-aware systems approach to these challenges, spanning many-core and storage architectures, systems software, and applications.
"An application-aware approach to systems support for big data," IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2013, p. 1.
Pub Date: 2013-06-05 | DOI: 10.1109/ASAP.2013.6567541
R. Hartenstein
Summary form only given. The multicore dilemma not only massively reduces programmer productivity but also slows progress in energy-efficient performance, a critical issue for the long-term affordability of computing. Because of the tunnel vision syndrome, the solutions coming from a few isolated areas are far too slow and massively imperfect. Systolic arrays (SAs) were introduced by a mathematician; his synthesis method was “of course” algebraic, supporting only a few applications, and sequencing concepts were “not his job”. A decade later we transformed this SA draft into a general-purpose machine paradigm, which was presented at the 3rd and the 8th through 11th ASAP. The acceptance of our other fundamental idea, the top-down use of Term Rewriting Systems (TRS) in microchip design EDA, was delayed by the TRS expert scene by 30 years! The R&D landscape requires radically new solutions. We must avoid the reductionist philosophies of most specialized research areas and introduce connected thinking to bridge the gaps between different paradigms and between several abstraction levels. We must urgently rethink all basic assumptions and far-reaching cooperation patterns.
"The tunnel vision syndrome: Massively delaying progress," IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2013, p. 1.
Pub Date: 2011-09-11 | DOI: 10.1109/ASAP.2011.6043229
M. Flynn
The following topics are dealt with: reconfigurable systems; computer arithmetic; computer algorithms; system profiling; multicore processors; communication systems; GPUs; accelerators; image processing; and FPGA applications.
"More than 50 years of parallel processing and still no easy path to speedup," IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2011, p. 4.
Pub Date: 2011-09-11 | DOI: 10.1109/ASAP.2011.6043230
V. Prasanna
As the Information and Communication Technology (ICT) infrastructure continues to evolve, significant energy dissipation is incurred in core routers, whose performance will soon be limited by power density. About two-thirds of the power dissipation in a router occurs in layer 3, with packet forwarding, classification, and related functions contributing significantly. This talk explores architectures and algorithms for network functions in core routers, including deep packet inspection and packet classification. We propose energy-efficient designs to realize the “Green Internet” vision, illustrate the performance improvements of such systems, and demonstrate the suitability of FPGAs for these computations. We show that SRAM-based solutions combined with FPGA-based architectures lead to high throughput as well as reduced power dissipation compared with state-of-the-art solutions based on TCAMs.
"Architectures for Green routers," IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2011, p. 5.
Pub Date: 2011-09-11 | DOI: 10.1109/ASAP.2011.6043228
J. Cong
To drastically improve energy efficiency, we believe that future computer processors need to go beyond parallelization and provide architectural support for customization and specialization, so that the processor architecture can be adapted and optimized for different application domains. Customization can be applied to computing cores, the memory hierarchy, and networks-on-chip for efficient adaptation to different workloads. We also believe that future processor architectures will make extensive use of accelerators to further increase energy efficiency. Such architectures present many new challenges and opportunities, such as accelerator scheduling, sharing, memory hierarchy optimization, and efficient compilation and runtime support. In this talk, I shall present our ongoing research in these areas in the Center for Domain-Specific Computing.
"Era of customization and specialization," IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2011, p. 3.
Pub Date: 2010-07-07 | DOI: 10.1109/ASAP.2010.5540756
S. Nassif
In spite of numerous predictions to the contrary, silicon technology is marching along past the 22 nm node and on to ever finer dimensions. Innovations at the device, circuit, and system levels continue to enable scaling in spite of what sometimes appear to be insurmountable problems in power, performance, manufacturability, and so on. To a large degree, these innovations are necessary because no substitute technology has been found as yet, and it does not appear likely that any such technology will become practical this decade. This leaves us with the need to anticipate and predict the near- and medium-term futures of CMOS for the next handful of technology nodes. This talk will focus on doing just that, and will show that an important new constraint on future system scaling is circuit resilience: the ability of circuits to operate in spite of challenges like noise, difficult environmental conditions, ageing, and manufacturing imperfections. These factors conspire to cause transient or permanent errors that are indistinguishable from traditional "hard" faults typically caused by defects during fabrication. Without significant innovation at the circuit and system levels, the probability of such events can rise quite dramatically. In SRAM, these phenomena have existed for the last three or four technology nodes, but significant investments in this area have indeed allowed continued system-level scaling with ever larger on-chip memories. As the same phenomena start attacking integrated circuits more pervasively, there is an urgent need for research and development to avert the problems certain to arise with increased defect rates. This keynote paper explores the link between the old subject of manufacturing variability and its well-known impact on circuit performance, and the new subject of how that same variability, in the extreme, can cause complete circuit failure.
With care, we will find that the light at the end of the CMOS tunnel is the opening of new opportunities to enrich CMOS with new technologies like MEMS, optics, sensors, and even biological devices. Otherwise, that light is likely to be another train…
"The light at the end of the CMOS tunnel," IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2010, pp. 4-9.