
2007 25th International Conference on Computer Design: Latest Articles

A novel O(1) parallel deadlock detection algorithm and architecture for multi-unit resource systems
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601942
Xiang Xiao, J. Lee
This paper introduces a novel O(1) parallel deadlock detection approach for multi-unit resource systems-on-chip (SoCs), inspired by Kim's method for O(1) detection and Shiu's method for parallel processing. Our contributions are (i) the first O(1) hardware deadlock detection and (ii) O(min(m, n)) preparation, both for multi-unit resource systems, where m and n are the numbers of processes and resources, respectively. The O(min(m, n)) preparation time, previously O(m × n), is achieved by performing the searches for sink nodes for every resource in parallel in hardware, over a matrix representing resource allocations as well as other auxiliary matrices. Our experiments demonstrate that deadlock detection always takes two clock cycles.
Pages: 480-487
Citations: 2
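For readers new to the problem, the sketch below shows the conventional sequential deadlock detection for multi-unit resources (the reduction algorithm taught in operating-systems textbooks). It is a software check whose cost grows with the number of processes and resources (O(m^2 · n) in the worst case here), in contrast to the constant-time hardware detection proposed in the paper. It is not the paper's algorithm; the function name and the matrix layout (`available`, `allocation`, `request`) are illustrative assumptions.

```python
from typing import List


def detect_deadlock(available: List[int],
                    allocation: List[List[int]],
                    request: List[List[int]]) -> List[int]:
    """Return the indices of deadlocked processes (an empty list means no deadlock).

    available[j]:     free units of resource type j
    allocation[i][j]: units of resource j currently held by process i
    request[i][j]:    units of resource j that process i is still waiting for
    """
    m, n = len(allocation), len(available)   # m processes, n resource types
    work = list(available)
    # A process that holds nothing cannot be part of a deadlock.
    finished = [not any(allocation[i]) for i in range(m)]
    progress = True
    while progress:                          # repeat until no process can be reduced
        progress = False
        for i in range(m):
            if not finished[i] and all(request[i][j] <= work[j] for j in range(n)):
                # Process i can run to completion and release what it holds.
                for j in range(n):
                    work[j] += allocation[i][j]
                finished[i] = True
                progress = True
    return [i for i in range(m) if not finished[i]]
```

For example, `detect_deadlock([0], [[1], [1]], [[1], [1]])` returns `[0, 1]`: each process holds one unit of the only resource and waits for another unit that never becomes free.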
Compiler-assisted architectural support for program code integrity monitoring in application-specific instruction set processors
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601899
Hai Lin, Xuan Guan, Yunsi Fei, Z. Shi
Application-specific instruction set processors (ASIPs) are increasingly used in mobile embedded systems, and ubiquitous network connectivity has exposed these systems to various malicious security attacks that may alter the program code running on them. In addition, soft errors in microprocessors can also change program code and cause system malfunction. At the instruction level, all code modifications manifest as bit flips. In this work, we present a generalized methodology for monitoring code integrity at run time in ASIPs, where both the instruction set architecture (ISA) and the underlying microarchitecture can be customized for a particular application domain. Building on the microoperation-based monitoring architecture presented in our previous work, we propose a compiler-assisted, application-controlled management approach for the monitoring architecture. Experimental results show that, compared with the OS-managed scheme and other compiler-assisted schemes, our approach detects program code integrity compromises with much less performance degradation.
Pages: 187-193
Citations: 5
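The abstract's observation that every code modification shows up as bit flips is what makes signature-based checking possible. The following software analogy assumes hypothetical per-basic-block CRC-32 signatures recorded at "compile time" and re-checked at "run time"; it does not model the paper's microoperation-based hardware monitor or its compiler-assisted management, and the names `Block`, `build_reference_table`, `check_block` and `flip_bit` are illustrative. CRC-32 (here the standard-library `zlib.crc32`) is guaranteed to detect any single-bit flip.

```python
import zlib
from typing import Dict, List, Tuple

# (start address, raw instruction bytes) of one basic block; layout is hypothetical.
Block = Tuple[int, bytes]


def build_reference_table(blocks: List[Block]) -> Dict[int, int]:
    """'Compile time': record a CRC-32 signature for every basic block."""
    return {addr: zlib.crc32(code) for addr, code in blocks}


def check_block(addr: int, code: bytes, table: Dict[int, int]) -> bool:
    """'Run time': True if the fetched block still matches its recorded signature."""
    return table.get(addr) == zlib.crc32(code)


def flip_bit(code: bytes, bit: int) -> bytes:
    """Model a soft error or a malicious patch as a single bit flip."""
    buf = bytearray(code)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)
```

A hardware monitor would compare signatures as instructions are fetched rather than re-hashing whole blocks in software; the sketch only shows why a mismatch exposes the modification.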
Tutorial: Software-defined radio technology
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601887
M. Cummings, T. Cooklev
Software-defined radio (SDR) is one of the most important emerging disruptive technologies shaping the wireless communication and mobile computing industries. The "ideal" software radio consists of a wideband antenna, a wideband ADC and DAC, and a programmable processor. This paper discusses the development of software radios and their applications in different fields of telecommunication.
Pages: 103-104
Citations: 7
Post-layout comparison of high performance 64b static adders in energy-delay space
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601931
Sheng Sun, C. Sechen
Our objective was to determine the most energy-efficient 64-bit static CMOS adder architecture for a range of high-performance delay targets. We extensively examine carry-lookahead (CLA) and carry-select adders spanning a wide range of tradeoffs in logic levels, fanouts, and wiring complexity. We propose sparse CLA adder architectures based on buffering techniques that reduce logic redundancy and improve energy efficiency. All designs were implemented using an energy-delay layout optimization flow with full RC extraction. Our new 64-bit adder designs achieve delays as low as 9.9 FO4 (fanout-of-four inverter delays) and promise better scaling to smaller technology nodes. They yield the best energy efficiency over a wide range of delay targets and, at the fastest points, are 30%, 15%, and 7% more energy efficient than full Kogge-Stone, sparse-2 Kogge-Stone, and Han-Carlson adders, respectively. They consume only about one third of the energy of dynamic adders.
Pages: 401-408
Citations: 5
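Full Kogge-Stone, one of the baselines in the comparison above, computes every carry with a parallel-prefix network of log2(64) = 6 levels. The following Python sketch models only the logic of that baseline (not its energy, delay, or layout) using bit-vector operations on integers; it is a reference model of the Kogge-Stone structure, not the authors' sparse CLA design, and `kogge_stone_add` is an illustrative name.

```python
MASK64 = (1 << 64) - 1


def kogge_stone_add(a: int, b: int, width: int = 64) -> int:
    """Add two unsigned `width`-bit integers with a Kogge-Stone prefix network.

    Bit i of g / p is the generate / propagate signal of bit position i.
    Each loop iteration is one prefix level, so 64 bits need 6 levels.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b            # generate:  g_i = a_i AND b_i
    p = a ^ b            # propagate: p_i = a_i XOR b_i
    half_sum = p         # keep the bitwise propagate for the final sum
    dist = 1
    while dist < width:
        # (G, P)_i <- (G_i OR (P_i AND G_{i-dist}), P_i AND P_{i-dist})
        g = g | (p & (g << dist))   # uses P from the previous level
        p = p & (p << dist)
        dist <<= 1
    # After the last level, g_i is the carry out of bit i (carry-in 0),
    # so the carry into bit i is g_{i-1}.
    carries = (g << 1) & mask
    return (half_sum ^ carries) & mask
```

A model like this can serve as a golden reference when checking a sparse or buffered variant, since any correct 64-bit adder must agree with `(a + b) mod 2**64`.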
A parallel IEEE P754 decimal floating-point multiplier
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601916
Brian J. Hickmann, A. Krioukov, M. Schulte, M. A. Erle
Decimal floating-point multiplication is important in many commercial applications, including banking, tax calculation, currency conversion, and other financial areas. This paper presents a fully parallel decimal floating-point multiplier compliant with the recent draft of the IEEE P754 Standard for Floating-point Arithmetic (IEEE P754). The novelty of the design is that it is the first parallel decimal floating-point multiplier offering low latency and high throughput. The design is based on a previously published parallel fixed-point decimal multiplier that uses alternate decimal digit encodings to reduce area and delay. The fixed-point design is extended to support floating-point multiplication by adding several components, including exponent generation, rounding, shifting, and exception handling. Area and delay estimates show significant latency and throughput improvements, accompanied by a substantial increase in area, compared with the only published IEEE P754-compliant sequential floating-point multiplier. To the best of our knowledge, this is the first publication to present a fully parallel decimal floating-point multiplier that complies with IEEE P754.
Pages: 296-303
Citations: 66
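At the arithmetic level, the floating-point extension described above amounts to multiplying the coefficients, adding the exponents (exponent generation), and shifting and rounding the product back to the format's precision. The sketch below illustrates those steps for non-negative operands, assuming 16-digit precision (the decimal64 format of IEEE 754-2008) and round-half-even. Signs, overflow, underflow, special values, exception handling, and the paper's alternate digit encodings are deliberately left out, and `dfp_mul` is a hypothetical name.

```python
from typing import Tuple

P = 16  # assumed precision in decimal digits (decimal64)


def dfp_mul(x: Tuple[int, int], y: Tuple[int, int], p: int = P) -> Tuple[int, int]:
    """Multiply two (coefficient, exponent) pairs, value = c * 10**e.

    Coefficients must be non-negative. The product coefficient is rounded
    to p digits with round-half-even.
    """
    c = x[0] * y[0]              # fixed-point decimal multiply of the coefficients
    e = x[1] + y[1]              # exponent generation
    digits = len(str(c))
    if digits > p:
        shift = digits - p                    # number of digits to discard
        q, r = divmod(c, 10 ** shift)         # right shift by `shift` digits
        half = 5 * 10 ** (shift - 1)
        if r > half or (r == half and q % 2 == 1):
            q += 1                            # round half to even
        if q == 10 ** p:                      # rounding carried into a new digit
            q //= 10
            shift += 1
        c, e = q, e + shift
    return c, e
```

Python's standard `decimal` module with `Context(prec=16, rounding=ROUND_HALF_EVEN)` follows the same rounding rule and is a convenient cross-check for a model like this.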
Fast power network analysis with multiple clock domains
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601939
Wanping Zhang, Ling Zhang, Rui Shi, He Peng, Zhi Zhu, L. Chua-Eoan, R. Murgai, Toshiyuki Shibuya, N. Ito, Chung-Kuan Cheng
This paper proposes an efficient analysis flow and an algorithm to identify the worst-case noise for power networks with multiple clock domains. First, we apply the Laplace transform to the input current sources to derive an analytical formula. Then, we calculate the circuit frequency response at logarithmically spaced frequency components. The frequency-domain response is approximated by a rational function using vector fitting. The rational function is used to derive the natural frequencies of the power/ground networks and can easily be converted back into the time domain. Based on these analysis results, we present a worst-case clock gating pattern algorithm to analyze power networks with multiple clock domains. The most expensive part of the proposed algorithm is the matrix solving, with complexity O(F(N) · log f · D), where F(N) is the complexity of iteratively solving a complex matrix of dimension N, D is the number of clock domains, and the frequency spans from 0 to f Hz. Experimental results show that our method is up to 60× faster than HSPICE and can analyze large circuits that are intractable for HSPICE.
Pages: 456-463
Citations: 14
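The idea of sampling a frequency response on a logarithmic grid and reading off a natural frequency can be illustrated on a single lumped RLC model: a package inductance with series resistance feeding an on-die decoupling capacitance. This is a toy stand-in for a full power/ground network. Vector fitting, the time-domain conversion, and the worst-case clock gating search are not reproduced, and the function names and component values are arbitrary assumptions.

```python
import math
from typing import List, Tuple


def log_frequencies(f_min: float, f_max: float, points_per_decade: int) -> List[float]:
    """Logarithmically spaced sample frequencies from f_min to f_max (inclusive)."""
    n = int(round(math.log10(f_max / f_min) * points_per_decade))
    return [f_min * 10 ** (k / points_per_decade) for k in range(n + 1)]


def pdn_impedance(f: float, r_pkg: float, l_pkg: float, c_decap: float) -> complex:
    """Impedance seen by the load: (R + sL) in parallel with the decap 1/(sC)."""
    s = 2j * math.pi * f                 # Laplace variable evaluated at s = j*2*pi*f
    z_pkg = r_pkg + s * l_pkg
    return z_pkg / (1 + s * c_decap * z_pkg)


def find_resonance(r_pkg: float, l_pkg: float, c_decap: float,
                   f_min: float = 1e6, f_max: float = 1e10,
                   points_per_decade: int = 200) -> Tuple[float, float]:
    """Return (frequency, |Z|) of the largest impedance sample on the log grid."""
    samples = ((f, abs(pdn_impedance(f, r_pkg, l_pkg, c_decap)))
               for f in log_frequencies(f_min, f_max, points_per_decade))
    return max(samples, key=lambda fz: fz[1])
```

For R = 10 mΩ, L = 1 nH and C = 100 nF, the peak lands near 1/(2π√(LC)) ≈ 15.9 MHz, where the impedance rises to roughly 1 Ω, about a hundred times its low-frequency value; switching activity concentrated near that frequency would produce the largest supply noise in this toy model.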
Amdahl’s figure of merit, SiGe HBT BiCMOS, and 3D chip stacking
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601901
P. Jacob, A. Zia, Okan Erdogan, P. Belemjian, Peng Jin, Jin Woo Kim, M. Chu, R. Kraft, J. McDonald
Forty years ago Gene Amdahl published a figure of merit for parallel computation that proved extremely controversial. The controversy still rages today, although those who have looked closely at this figure of merit conclude that it is correct, but perhaps misinterpreted. In this paper we examine a small variation on that law which suggests that computer designers should take a closer look at two emerging technologies, SiGe HBT BiCMOS and 3D chip stacking. We may be overlooking a way to continue the clock race and, in so doing, achieve better parallelism.
Pages: 202-207
Citations: 3
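For reference, Amdahl's law in its usual form gives the speedup of a program whose fraction p is perfectly parallelizable on n processors as S(n) = 1 / ((1 - p) + p / n). The abstract does not specify the "small variation" the authors study, so the sketch below implements only the standard law; the function names are illustrative.

```python
import math


def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as n grows without limit: 1 / (1 - p)."""
    serial_fraction = 1.0 - parallel_fraction
    return math.inf if serial_fraction == 0 else 1.0 / serial_fraction
```

With p = 0.95, 1024 processors give a speedup of only about 19.6, and no processor count can exceed 1 / (1 - p) = 20. Because the serial fraction caps the benefit of adding processors, faster serial execution (for example, a higher clock rate) remains valuable, which is one way to read the abstract's interest in continuing the clock race.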
A technique for selecting CMOS transistor orders
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601936
T. Chiang, C. Y. Chen, Weiyu Chen
Transistor reordering is known to be effective in reducing circuit delay at nearly zero cost. However, techniques for determining good transistor orders have not been proposed in the literature. Previous work had to resort to running SPICE on every meaningful transistor order and selecting the best one, which is extremely time-consuming. This paper proposes an efficient and accurate technique for determining the best transistor orders without running SPICE simulations. Experimental results from SPICE3 show that the predictions are very accurate.
Pages: 438-443
Citations: 3
Detecting errors in a polynomial basis multiplier using multiple parity bits for both inputs
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601926
Siavash Bayat Sarmadi, M. A. Hasan
This paper investigates the concurrent detection of multiple-bit errors in polynomial basis (PB) multipliers over binary extension fields. To this end, multiple parity bits are considered for both inputs of the multiplier. In the multiplier architecture considered here, the two inputs pass through considerably different sets of circuits, which allows us to use a different number of parity bits for each input. In a bit-parallel implementation of a GF(2^163) PB multiplier with eight parity bits for the first input and three parity bits for the second input, the area overhead and the probability of error detection are approximately 55.59% and 0.997, respectively. Additionally, the average time overhead of the scheme implemented in a bit-parallel fashion is approximately 25%.
Pages: 368-375
Citations: 3
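To make the setting concrete, the sketch below implements a bit-serial polynomial basis multiplier over GF(2^163) and computes multiple parity bits for an operand by splitting its 163 coefficient bits into contiguous groups (8 groups for one input and 3 for the other, matching the configuration quoted in the abstract). The reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 (the NIST B-163 field polynomial) and the contiguous grouping are assumptions, since the abstract states neither. The paper's bit-parallel architecture and its prediction of output parities from input parities are not reproduced, and the function names are illustrative.

```python
from typing import List

M = 163
# Assumed reduction polynomial: x^163 + x^7 + x^6 + x^3 + 1 (NIST B-163).
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf2m_mul(a: int, b: int, m: int = M, f: int = F) -> int:
    """Polynomial basis multiplication in GF(2^m), MSB-first shift-and-add.

    Field elements are ints < 2**m whose bit i is the coefficient of x^i.
    """
    result = 0
    for i in range(m - 1, -1, -1):
        result <<= 1                      # result *= x
        if (result >> m) & 1:
            result ^= f                   # reduce the x^m term modulo f(x)
        if (b >> i) & 1:
            result ^= a                   # add a(x) when coefficient b_i = 1
    return result


def group_parities(x: int, groups: int, m: int = M) -> List[int]:
    """Parity bit of each of `groups` contiguous chunks of the m coefficients."""
    size = -(-m // groups)                # ceiling division
    parities = []
    for g in range(groups):
        chunk = (x >> (g * size)) & ((1 << size) - 1)
        parities.append(bin(chunk).count("1") & 1)
    return parities
```

Because each coefficient bit belongs to exactly one group, any single-bit error on an input flips exactly one of its group parities; more groups let the checker catch more multiple-bit error patterns (an even number of flips inside one group goes unnoticed) at the cost of extra parity logic.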
Dynamically compressible context architecture for low power coarse-grained reconfigurable array
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601930
Yoonjin Kim, R. Mahapatra
Most coarse-grained reconfigurable array architectures (CGRAs) are composed of reconfigurable ALU arrays and a configuration cache (or context memory) to achieve high performance and flexibility. In particular, the configuration cache is the main component of a CGRA, providing its distinctive ability to reconfigure dynamically in every cycle. However, the frequent memory reads required for dynamic reconfiguration consume significant power. Thus, reducing configuration cache power has become critical for CGRAs to be competitive and reliable in embedded systems. In this paper, we propose a dynamically compressible context architecture for saving power in the configuration cache. This power-efficient context architecture does not degrade the performance or flexibility of the CGRA. Experimental results show that the proposed approach saves up to 39.72% of configuration cache power with negligible area overhead.
Pages: 395-400
Citations: 3