
2007 25th International Conference on Computer Design: Latest Papers

Twiddle factor transformation for pipelined FFT processing
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601872
I. Park, WonHee Son, Ji-Hoon Kim
This paper presents a novel transformation technique that can derive various fast Fourier transform (FFT) algorithms in a unified paradigm. The proposed technique finds a common twiddle factor at the input side of a butterfly and migrates it to the output side. Starting from the radix-2 FFT algorithm, the proposed common factor migration technique can generate most previous FFT algorithms without using mathematical manipulation. In addition, we propose new FFT algorithms derived by applying the proposed twiddle factor moving technique, which significantly reduce the number of twiddle factors compared with the algorithms widely used for pipelined FFT processing.
Pages: 1-6
Citations: 2
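The migration step described in the twiddle-factor abstract above rests on the linearity of the butterfly: a factor common to both inputs can be applied to both outputs instead. The following is a minimal sketch of that identity only, not the authors' derivation; the function names `butterfly`, `twiddle`, `butterfly_twiddle_at_input`, and `butterfly_twiddle_at_output` are hypothetical.

```python
import cmath


def butterfly(a, b):
    """Radix-2 butterfly: returns (a + b, a - b)."""
    return a + b, a - b


def twiddle(n, k):
    """Twiddle factor W_n^k = exp(-2*pi*i*k/n)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def butterfly_twiddle_at_input(a, b, w):
    # The common factor w multiplies both inputs before the butterfly.
    return butterfly(w * a, w * b)


def butterfly_twiddle_at_output(a, b, w):
    # The same factor migrated to the output side of the butterfly.
    top, bottom = butterfly(a, b)
    return w * top, w * bottom
```

Both forms give the same outputs up to rounding. How migrated factors are then combined with the twiddle factors of later stages is not stated in the abstract and is not modeled here.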
A novel profile-driven technique for simultaneous power and code-size optimization of microcoded IPs
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601960
B. Gorjiara, D. Gajski
Microcoded customized IPs have significantly better performance, yet larger code size, than similarly sized instruction-based processors. Storing wide microcodes on-chip requires wide memory blocks that occupy a large area and consume high leakage power. Therefore, addressing the code size of microcoded IPs is very important. In this paper, we introduce compression techniques that, along with careful resolution of "don't care" values (denoted by 'X') in microcode, can address the code size issue. We observed that 'X' values can be used to improve either the dynamic power of IPs or their compression. However, achieving efficiency in both is challenging. In this paper, we propose a profile-guided 'X'-resolution technique that can achieve both power and compression efficiency. Using our technique, the code size of microcoded IPs is reduced by 2.7 times, while saving 20% dynamic power, on average.
Pages: 609-614
Citations: 3
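To illustrate the power side of the don't-care resolution mentioned in the abstract above, here is a minimal sketch of one common heuristic: fill each 'X' with the bit stored in the same position of the previous word, so that the bit does not toggle. This is an assumption for illustration, not the paper's profile-guided technique; it also assumes the words are listed in execution order. The names `resolve_dont_cares` and `count_toggles` are hypothetical.

```python
def resolve_dont_cares(words):
    """Replace each 'X' with the resolved bit above it in the same column.

    `words` is a list of equal-length strings over '0', '1' and 'X'.
    An 'X' in the first word is resolved to '0'.
    """
    resolved = []
    previous = None
    for word in words:
        bits = []
        for i, bit in enumerate(word):
            if bit == "X":
                bit = previous[i] if previous is not None else "0"
            bits.append(bit)
        previous = "".join(bits)
        resolved.append(previous)
    return resolved


def count_toggles(words):
    """Number of bit positions that change between consecutive words."""
    return sum(
        a != b
        for prev, cur in zip(words, words[1:])
        for a, b in zip(prev, cur)
    )
```

For the words `["10X1", "X0X0", "11XX"]` this fill yields 2 bit toggles, whereas resolving every 'X' to '0' yields 4. A fill chosen for compression would generally pick different values, which is the tension the abstract describes.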
Improving the reliability of on-chip L2 cache using redundancy
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601906
K. Bhattacharya, Soontae Kim, N. Ranganathan
The reliability of large on-chip L2 caches poses a significant challenge due to technology scaling trends. As the minimum feature size continues to decrease, L2 caches become more vulnerable to multi-bit soft errors. Traditionally, L2 caches have been protected from multi-bit soft errors using techniques such as error detection/correction codes or physical interleaving of cache bit lines to convert multi-bit errors into single-bit errors. These methods, however, incur large overheads in area and power. In this work, we investigate several new techniques for reducing multi-bit errors in large L2 caches, in which the multi-bit errors are detected using simple error detection codes and corrected using the data redundancy in the memory hierarchy. Further, we develop a reliability-aware replacement policy that dynamically trades performance for reliability whenever the soft-error budget is exceeded. In order to further improve reliability, we propose the duplication of the data values in cache lines by exploiting their small data widths. The proposed techniques were implemented in the SimpleScalar framework and validated using the SPEC 2000 integer and floating-point benchmarks. The proposed techniques improve the reliability of L2 caches by 40% and 32% on average for integer and floating-point applications, respectively, with little impact on performance and area.
Pages: 224-229
Citations: 9
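The detect-then-refetch idea in the L2 cache abstract above can be sketched as follows. The abstract only says "simple error detection codes" and "data redundancy in the memory hierarchy"; the choice of interleaved parity, the refetch of clean lines from a `memory` dictionary, and the names `interleaved_parity`, `CacheLine`, and `read_line` are all assumptions made for illustration.

```python
def interleaved_parity(data, ways=8):
    """Return `ways` parity bits; bit i covers data bits i, i + ways, ..."""
    parity = 0
    position = 0
    while data:
        if data & 1:
            parity ^= 1 << (position % ways)
        data >>= 1
        position += 1
    return parity


class CacheLine:
    def __init__(self, data, dirty=False):
        self.data = data
        self.dirty = dirty
        self.parity = interleaved_parity(data)


def read_line(line, address, memory):
    """Return the line data, refetching a clean line on a parity mismatch."""
    if interleaved_parity(line.data) == line.parity:
        return line.data
    if line.dirty:
        # A dirty line has no redundant copy further down the hierarchy.
        raise RuntimeError("uncorrectable error in dirty line")
    line.data = memory[address]
    line.parity = interleaved_parity(line.data)
    return line.data
```

Because adjacent bits fall into different parity groups, a burst of up to `ways` adjacent flipped bits changes the stored parity and is detected. Only clean lines can be repaired this way, which is presumably why the abstract also mentions a replacement policy and in-line duplication.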
Contention-free switch-based implementation of 1024-point Radix-2 Fourier Transform Engine
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601873
H. Saleh, B. Mohd, A. Aziz, E. Swartzlander
This paper examines the use of a switch-based architecture to implement a radix-2 decimation-in-frequency fast Fourier transform engine. The architecture interconnects M processing elements with 2*M memories. An algorithm to detect and resolve memory access contention is presented. The implementation of 1024-point FFTs with 2 processing elements is discussed in detail, including timing and place-and-route results. The switch-based architecture provides a factor of M speedup over a single processing element realization.
Pages: 7-12
Citations: 6
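For readers unfamiliar with the algorithm named in the abstract above, the following is a minimal software sketch of a radix-2 decimation-in-frequency FFT. It shows only the arithmetic; it does not model the switch fabric, the 2*M memories, or the contention-resolution algorithm of the paper. The function names `fft_dif` and `dft` are hypothetical.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly stage: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    sums = [x[k] + x[k + half] for k in range(half)]
    diffs = [
        (x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
        for k in range(half)
    ]
    out = [0j] * n
    out[0::2] = fft_dif(sums)
    out[1::2] = fft_dif(diffs)
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

In each stage the `half` butterflies are independent of one another, which is the parallelism a multi-processing-element design can exploit.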
Improving the reliability of on-chip data caches under process variations
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601920
Wei Wu, S. Tan, Jun Yang, Shih-Lien Lu
On-chip caches take a large portion of the chip area. They are much more vulnerable to parameter variation than smaller units. As leakage current becomes a significant component of total power consumption, the thermal and reliability problems that leakage current variations induce in on-chip caches become an important design concern. This paper studies the impact of process variations, particularly leakage variations, on the temperature and reliability of on-chip caches. Our statistical simulation shows that, under process variation, 85% of the caches see a shortened lifetime, with the average lifetime being 81.6% of that of the ideal cache. At runtime, unevenly distributed dynamic power and the corresponding thermal variation would further deteriorate the situation. To mitigate this problem, we propose a dynamic cache subarray permutation scheme that can alleviate the thermal stress on a high-leakage area to improve the reliability of the caches. Experiments on 17 Spec2k benchmarks show that our scheme can extend the cache lifetime by up to 20.3% and reduce the peak temperature by 7 degrees on average, and by more on data-intensive applications.
Pages: 325-332
Citations: 7
Limits on voltage scaling for caches utilizing fault tolerant techniques
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601943
Avesta Sasan, A. Djahromi, A. Eltawil, F. Kurdahi
This paper proposes a new low power cache architecture that utilizes fault tolerance to allow aggressively reduced voltage levels. The fault tolerant overhead circuits consume little energy, but enable the system to operate correctly and boost the system performance to close to defect free operation. Overall, power savings of over 40% are reported on standard benchmarks.
Pages: 488-495
Citations: 25
Exploring DRAM cache architectures for CMP server platforms
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601880
Li Zhao, R. Iyer, R. Illikkal, D. Newell
As dual-core and quad-core processors arrive in the marketplace, the momentum behind CMP architectures continues to grow strong. As more and more cores/threads are placed on-die, the pressure on the memory subsystem is rapidly increasing. To address this issue, we explore DRAM cache architectures for CMP platforms. In this paper, we investigate the impact of introducing a low latency, large capacity and high bandwidth DRAM-based cache between the last level SRAM cache and memory subsystem. We first show the potential benefits of large DRAM caches for key commercial server workloads. As the primary hurdle to achieving these benefits with DRAM caches is the tag space overheads associated with them, we identify the most efficient DRAM cache organization and investigate various options. Our results show that the combination of 8-bit partial tags and 2-way sectoring achieves the highest performance (20% to 70%) with the lowest tag space (<25%) overhead.
Pages: 55-62
Citations: 85
Fault-based alternate test of RF components
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601947
S. S. Akbay, A. Chatterjee
Defect-based RF testing is a strong candidate for providing the best solution in terms of ATE complexity and cost. However, specification-based testing is still the norm for analog/RF because of the limitations of analog fault models. Unfortunately, as the amount of functionality packed into individual devices is increased with each generation, the cost of testing larger numbers of specifications also increases. To address this, the alternate test methodology proposed in the past, which significantly cuts costs associated with specification tests by crafting a single test stimulus and mapping the response signatures into all specifications at once, can be modified for defect-based testing as well. In this work, we explore a new type of alternate test that is more fundamental than defect-based or specification-based approaches. Rather than focusing on physical defect mechanisms or the way individual specifications are measured, fault-based alternate test studies the abstractions of physical phenomena that cause specification violations; it unifies the benefits of reduced ATE complexity of defect-based approaches and the compact stimulus-signature pairs of specification-based alternate tests.
Pages: 518-525
Citations: 33
Application of symbolic computer algebra to arithmetic circuit verification
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601876
Yuki Watanabe, N. Homma, T. Aoki, T. Higuchi
This paper presents a formal approach to verifying arithmetic circuits using symbolic computer algebra. Our method describes arithmetic circuits directly with high-level mathematical objects based on weighted number systems and arithmetic formulae. Such circuit descriptions can be effectively verified by polynomial reduction techniques using Gröbner bases. In this paper, we describe how symbolic computer algebra can be used to describe and verify arithmetic circuits. The advantages of the proposed approach are demonstrated through experimental verification of arithmetic circuits such as a multiply-accumulator and an FIR filter. The results show that the proposed approach has clear potential to verify practical arithmetic circuits on which conventional techniques failed.
Pages: 25-32
Citations: 22
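As a toy illustration of the polynomial-reduction idea in the abstract above (not the paper's formulation), consider a half adder with inputs a and b, sum bit s, and carry bit c; these signal names are assumptions. For Boolean-valued variables over the integers, XOR can be written as a + b - 2ab and AND as ab. The weighted specification is 2c + s = a + b, and substituting the gate polynomials reduces the specification polynomial to zero:

```latex
\begin{aligned}
s &= a + b - 2ab, \qquad c = ab,\\
2c + s - (a + b) &= 2ab + (a + b - 2ab) - (a + b) = 0.
\end{aligned}
```

A remainder of zero means the gate-level description satisfies the arithmetic specification. Larger circuits need a systematic reduction order, which is where Gröbner bases come in.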
Statistical simulation of chip multiprocessors running multi-program workloads
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601940
Davy Genbrugge, L. Eeckhout
This paper explores statistical simulation as a fast simulation technique for driving chip multiprocessor (CMP) design space exploration. The idea of statistical simulation is to measure a number of important program execution characteristics, generate a synthetic trace, and simulate that synthetic trace. The important benefit is that a synthetic trace is very small compared to real program traces. This paper advances statistical simulation by modeling shared resources, such as shared caches and off-chip bandwidth. This is done (i) by collecting cache set access probabilities and per-set LRU stack depth profiles, and (ii) by modeling a program's time-varying execution behavior in the synthetic trace. The key benefit is that the statistical profile is independent of a given cache configuration and the amount of multiprocessing, which enables statistical simulation to model conflict behavior in shared caches when multiple programs are co-executing on a CMP. We demonstrate that statistical simulation is both accurate and fast with average IPC prediction errors of less than 5.5% and simulation speedups of 40X to 70X compared to the detailed simulation of 100M-instruction traces. This makes statistical simulation a viable tool for CMP design space exploration.
Pages: 464-471
Citations: 18
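The per-set profiling step (i) in the abstract above can be sketched as follows: for each access, record which set it maps to and how deep in that set's LRU stack the line was found. This is a minimal sketch under assumptions; unlike the profile described in the abstract, it fixes the number of sets and the line size, and the names `lru_stack_profile` and `hits_for_associativity` are hypothetical.

```python
from collections import defaultdict


def lru_stack_profile(addresses, num_sets, line_bytes=64):
    """Return (set_access_counts, depth_histograms) for an address trace.

    depth_histograms[s][d] counts accesses to set s found at LRU stack
    depth d (0 = most recently used); depth None counts first-time accesses.
    """
    stacks = defaultdict(list)  # per-set LRU stack, most recent first
    set_counts = defaultdict(int)
    depth_hist = defaultdict(lambda: defaultdict(int))
    for address in addresses:
        line = address // line_bytes
        s = line % num_sets
        stack = stacks[s]
        set_counts[s] += 1
        if line in stack:
            depth = stack.index(line)
            stack.remove(line)
        else:
            depth = None
        depth_hist[s][depth] += 1
        stack.insert(0, line)
    return set_counts, depth_hist


def hits_for_associativity(depth_hist, ways):
    """Hits an LRU cache with `ways` ways per set would see for this profile."""
    return sum(
        count
        for hist in depth_hist.values()
        for depth, count in hist.items()
        if depth is not None and depth < ways
    )
```

Dividing each entry of `set_counts` by the trace length gives the set access probabilities. One profile answers the hit-count question for any associativity, since an access at depth d hits exactly when the cache has more than d ways.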