2007 25th International Conference on Computer Design: Latest Papers

A novel O(1) parallel deadlock detection algorithm and architecture for multi-unit resource systems

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601942

Xiang Xiao, J. Lee

This paper introduces a novel O(1) parallel deadlock detection approach for multi-unit resource systems-on-a-chip (SoCs), inspired by Kim's method for O(1) detection and Shiu's method for parallel processing. Our contributions are (i) the first O(1) hardware deadlock detection method and (ii) O(min(m, n)) preparation, both for multi-unit resource systems, where m and n are the numbers of processes and resources, respectively. The O(min(m, n)) preparation time, previously O(m × n), is achieved by performing all the searches for sink nodes for every resource in parallel in hardware, over a matrix representing resource allocations as well as other auxiliary matrices. Our experiments demonstrate that deadlock detection always takes two clock cycles.
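For contrast with the constant-time hardware detector, the sketch below shows the classic software detection algorithm for multi-unit resources (graph reduction: repeatedly let any process whose outstanding request fits in the free units finish and release what it holds). It is not the authors' method; the matrix names `available`, `allocation`, and `request` are our own, and the loop needs up to m passes of O(m·n) work each, which is the kind of cost the paper's parallel hardware is meant to eliminate.

```python
from typing import List


def detect_deadlock(available: List[int],
                    allocation: List[List[int]],
                    request: List[List[int]]) -> List[int]:
    """Return the indices of deadlocked processes (empty list if none).

    available[j]     -- free units of resource j
    allocation[i][j] -- units of resource j currently held by process i
    request[i][j]    -- units of resource j that process i is still waiting for
    """
    m = len(allocation)
    work = list(available)
    # A process that holds nothing cannot be part of a deadlock cycle.
    finished = [all(a == 0 for a in row) for row in allocation]
    progress = True
    while progress:
        progress = False
        for i in range(m):
            if not finished[i] and all(r <= w for r, w in zip(request[i], work)):
                # Let process i finish and release its units
                # (this "reduces" the process out of the resource graph).
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progress = True
    return [i for i in range(m) if not finished[i]]


# Two single-unit resources: each process holds one and waits for the other.
print(detect_deadlock([0, 0], [[1, 0], [0, 1]], [[0, 1], [1, 0]]))  # [0, 1]
```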

Citations: 2

Tutorial: Software-defined radio technology

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601887

M. Cummings, T. Cooklev

Software-defined radio (SDR) is one of the most important emerging disruptive technologies shaping the wireless communication and mobile computing industries. The "ideal" software radio consists of a wideband antenna, a wideband ADC and DAC, and a programmable processor. This paper discusses the development of software radios and their applications in different fields of telecommunication.

Citations: 7

Amdahl’s figure of merit, SiGe HBT BiCMOS, and 3D chip stacking

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601901

P. Jacob, A. Zia, Okan Erdogan, P. Belemjian, Peng Jin, Jin Woo Kim, M. Chu, R. Kraft, J. McDonald

Forty years ago, Gene Amdahl published a figure of merit for parallel computation that proved extremely controversial. The controversy still rages today, although those who have looked closely at this figure of merit conclude that it is correct, though perhaps misinterpreted. In this paper, we look at a small variation on that law which suggests that computer designers should take a closer look at two emerging technologies: SiGe HBT BiCMOS and 3D chip stacking. We may be overlooking a way to continue the clock race and, in so doing, achieve better parallelism.
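The figure of merit in question is usually stated as Amdahl's law: with a parallelizable fraction p of the work and n processors, speedup = 1 / ((1 - p) + p / n). The sketch below evaluates only this classic form; the paper's "small variation" is not described in the abstract, so it is not modeled here.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Classic Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Even with 95% parallel work, speedup saturates near 1 / 0.05 = 20,
# because the serial fraction is untouched by adding processors.
for n in (1, 16, 256, 4096):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

The saturation is why faster serial execution (for example, a higher clock rate) remains attractive even in highly parallel machines.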

Citations: 3

Post-layout comparison of high performance 64b static adders in energy-delay space

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601931

Sheng Sun, C. Sechen

Our objective was to determine the most energy-efficient 64b static CMOS adder architecture for a range of high-performance delay targets. We extensively examine carry-lookahead (CLA) and carry-select adders spanning a wide range of tradeoffs in logic levels, fanouts, and wiring complexity. We propose sparse CLA adder architectures based on buffering techniques that reduce logic redundancy and improve energy efficiency. All designs were implemented using an energy-delay layout optimization flow with full RC extraction. Our new 64b adder designs achieve delays as low as 9.9 FO4 (fanout-of-four inverter delays) and promise better scaling to smaller technology nodes. They yield the best energy efficiency over a wide range of delay targets and, at the fastest points, are 30%, 15%, and 7% more energy efficient than full Kogge-Stone, sparse-2 Kogge-Stone, and Han-Carlson adders, respectively. They consume only about one third of the energy of dynamic adders.
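As a reference point for the baselines named above, the following is a bit-level behavioral model of a full Kogge-Stone parallel-prefix adder, one of the comparison designs rather than the authors' sparse CLA. Each prefix level combines (generate, propagate) pairs that are 1, 2, 4, ... bits apart, so a 64-bit adder needs log2(64) = 6 levels. It says nothing about energy, wiring, or layout.

```python
import random


def kogge_stone_add(a: int, b: int, width: int = 64) -> int:
    """Bit-level model of a Kogge-Stone parallel-prefix adder (carry-in = 0).

    Returns (a + b) mod 2**width.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Bitwise generate / propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        # One prefix level: combine each position with the one `dist` bits lower.
        new_G, new_P = G[:], P[:]
        for i in range(dist, width):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    # After log2(width) levels, G[i] is the carry out of bit i.
    total = p[0]
    for i in range(1, width):
        total |= (p[i] ^ G[i - 1]) << i
    return total


random.seed(1)
x, y = random.getrandbits(64), random.getrandbits(64)
assert kogge_stone_add(x, y) == (x + y) % 2**64
```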

Citations: 5

Compiler-assisted architectural support for program code integrity monitoring in application-specific instruction set processors

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601899

Hai Lin, Xuan Guan, Yunsi Fei, Z. Shi

As application-specific instruction set processors (ASIPs) are increasingly used in mobile embedded systems, ubiquitous network connections have exposed these systems to various malicious security attacks that may alter the program code running on them. In addition, soft errors in microprocessors can also change program code and cause system malfunction. At the instruction level, all code modifications manifest as bit flips. In this work, we present a generalized methodology for monitoring code integrity at run time in ASIPs, where both the instruction set architecture (ISA) and the underlying microarchitecture can be customized for a particular application domain. Building on the microoperation-based monitoring architecture presented in our previous work, we propose a compiler-assisted, application-controlled management approach for the monitoring architecture. Experimental results show that, compared with the OS-managed scheme and other compiler-assisted schemes, our approach detects program code integrity compromises with much less performance degradation.
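The general idea of run-time code integrity checking can be sketched in software: a reference signature per basic block is computed ahead of time (the compiler's role), and the fetched block is re-hashed and compared at run time. This is a simplified model under our own assumptions, not the paper's microoperation-based hardware; the use of CRC-32, the 32-bit little-endian instruction words, and the example encodings are all hypothetical choices.

```python
import zlib
from typing import Dict, List


def block_signature(words: List[int]) -> int:
    """CRC-32 over a basic block's 32-bit instruction words (little-endian)."""
    data = b"".join(w.to_bytes(4, "little") for w in words)
    return zlib.crc32(data)


def build_signature_table(blocks: Dict[int, List[int]]) -> Dict[int, int]:
    """'Compile-time' step: one reference signature per basic-block start address."""
    return {addr: block_signature(words) for addr, words in blocks.items()}


def block_is_intact(addr: int, fetched: List[int], table: Dict[int, int]) -> bool:
    """'Run-time' step: recompute the fetched block's signature and compare."""
    return table.get(addr) == block_signature(fetched)


# Hypothetical program: two basic blocks with made-up instruction encodings.
program = {0x1000: [0x00A00513, 0x00B50533], 0x1008: [0x00008067]}
table = build_signature_table(program)
tampered = list(program[0x1000])
tampered[1] ^= 1 << 7  # a single bit flip, e.g. from a soft error
print(block_is_intact(0x1000, program[0x1000], table))  # True
print(block_is_intact(0x1000, tampered, table))         # False
```

CRC-32 is used here because it detects every single-bit error, matching the observation that code modifications appear as bit flips.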

Citations: 5

Detecting errors in a polynomial basis multiplier using multiple parity bits for both inputs

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601926

Siavash Bayat Sarmadi, M. A. Hasan

This paper investigates the concurrent detection of multiple-bit errors in polynomial basis (PB) multipliers over binary extension fields. To this end, multiple parity bits are considered for both inputs of the multiplier. In the multiplier architecture considered here, the two inputs pass through considerably different sets of circuits, which allows us to use a different number of parity bits for each input. In a bit-parallel implementation of a GF(2^163) PB multiplier with eight parity bits for the first input and three parity bits for the second input, the area overhead and the probability of error detection are approximately 55.59% and 0.997, respectively. Additionally, the average time overhead of the scheme, implemented in a bit-parallel fashion, is approximately 25%.
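To make the setting concrete, the sketch below gives a reference (shift-and-add) polynomial basis multiplier for GF(2^163), using the NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 as an assumed field definition, plus a helper that computes k parity bits over an input. The abstract does not say how input bits are grouped, so the interleaved grouping (bit i to group i mod k) is our assumption, and the paper's output parity prediction circuitry is not modeled. The example only shows why several parity bits catch error patterns that a single parity bit misses.

```python
from typing import List

M = 163
# Assumed field polynomial (NIST GF(2^163)): x^163 + x^7 + x^6 + x^3 + 1
F163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def pb_multiply(a: int, b: int, m: int = M, f: int = F163) -> int:
    """Polynomial-basis multiplication in GF(2^m): a * b mod f (shift-and-add)."""
    result = 0
    for i in range(m):
        if (b >> i) & 1:
            result ^= a      # add (XOR) a * x^i mod f
        a <<= 1              # a <- a * x
        if (a >> m) & 1:
            a ^= f           # reduce the x^m term
    return result


def parity_bits(x: int, m: int, k: int) -> List[int]:
    """k parity bits over an m-bit word; bit i is assigned to group i % k."""
    bits = [0] * k
    for i in range(m):
        bits[i % k] ^= (x >> i) & 1
    return bits


a = 0x1234567890ABCDEF
double_error = a ^ 0b11  # flip bits 0 and 1 of the input
print(parity_bits(a, M, 1) == parity_bits(double_error, M, 1))  # True: missed
print(parity_bits(a, M, 8) == parity_bits(double_error, M, 8))  # False: detected
```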

Citations: 3

Accurate modeling and fault simulation of Byzantine resistive bridges

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601923

H. Cheung, S. Gupta

Many recent studies show that a resistive bridging fault may cause intermediate voltages at the bridging fault site. Since the gates in the fanout of the fault site may have distinct logic threshold voltages (VIL and VIH), these gates may interpret the intermediate voltage as logic '1', logic '0', or logically indeterminate. Such fault behavior is described as the bridging fault Byzantine general problem (T. Nanya et al., Nov. 1989). None of the existing bridging fault models used by bridging fault simulators accurately captures the indeterminate logic behavior of such bridges. We present a resistive bridging fault model that captures indeterminate logic values accurately yet efficiently. We also describe an efficient PPSFP (parallel-pattern single-fault propagation) bridging fault simulator and show that all previous approaches seriously overestimate bridging fault coverage.
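The Byzantine effect can be illustrated with a deliberately simple model: treat the driving transistors as linear resistors, compute the divider voltage on the bridged node, and let each fanout gate classify it with its own thresholds. Real transistors are nonlinear and the paper's model is more accurate; the resistances and the (VIL, VIH) pairs below are hypothetical values chosen only to show three gates reading the same node differently.

```python
from typing import List, Tuple


def bridge_voltages(vdd: float, r_pullup: float, r_pulldown: float,
                    r_bridge: float) -> Tuple[float, float]:
    """Linear-resistor model: node A is driven high through r_pullup, node B is
    driven low through r_pulldown, and the defect r_bridge connects them."""
    current = vdd / (r_pullup + r_bridge + r_pulldown)
    v_b = current * r_pulldown
    v_a = v_b + current * r_bridge
    return v_a, v_b


def interpret(v: float, vil: float, vih: float) -> str:
    """How one fanout gate reads a voltage: '0', '1', or 'X' (indeterminate)."""
    if v <= vil:
        return "0"
    if v >= vih:
        return "1"
    return "X"


v_a, v_b = bridge_voltages(vdd=1.0, r_pullup=2000.0, r_pulldown=1000.0, r_bridge=500.0)
# Three hypothetical fanout gates of node A with different (VIL, VIH) thresholds.
fanout_thresholds: List[Tuple[float, float]] = [(0.45, 0.55), (0.35, 0.40), (0.40, 0.50)]
print(round(v_a, 3), [interpret(v_a, vil, vih) for vil, vih in fanout_thresholds])
# 0.429 ['0', '1', 'X']
```

A fault simulator that assigns the bridged node a single logic value would miss exactly this disagreement among fanout gates.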

Citations: 4

A parallel IEEE P754 decimal floating-point multiplier

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601916

Brian J. Hickmann, A. Krioukov, M. Schulte, M. A. Erle

Decimal floating-point multiplication is important in many commercial applications, including banking, tax calculation, currency conversion, and other financial areas. This paper presents a fully parallel decimal floating-point multiplier compliant with the recent draft of the IEEE P754 Standard for Floating-Point Arithmetic (IEEE P754). The novelty of the design is that it is the first parallel decimal floating-point multiplier offering low latency and high throughput. The design is based on a previously published parallel fixed-point decimal multiplier that uses alternate decimal digit encodings to reduce area and delay. The fixed-point design is extended to support floating-point multiplication by adding several components, including exponent generation, rounding, shifting, and exception handling. Area and delay estimates show significant latency and throughput improvements, at the cost of a substantial increase in area, compared to the only published IEEE P754-compliant sequential floating-point multiplier. To the best of our knowledge, this is the first publication to present a fully parallel decimal floating-point multiplier that complies with IEEE P754.
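The steps listed above (fixed-point coefficient product, exponent generation, shifting, rounding) can be sketched behaviorally for the 16-digit decimal64 format. This is a software illustration, not the paper's hardware: only round-half-even is implemented, and exponent overflow/underflow, NaN, and infinity handling are omitted. Python's standard `decimal` module is used as a cross-check.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

P = 16  # coefficient digits in the decimal64 format


def dfp_multiply(a: Decimal, b: Decimal, p: int = P) -> Decimal:
    """Multiply two finite decimals and round the coefficient to p digits
    (round-half-even). Exponent range checks and NaN/Inf are not handled."""
    sa, da, ea = a.as_tuple()
    sb, db, eb = b.as_tuple()
    coeff = int("".join(map(str, da))) * int("".join(map(str, db)))  # fixed-point core
    exp = ea + eb                                                    # exponent generation
    sign = sa ^ sb
    excess = len(str(coeff)) - p
    if excess > 0:                        # shift right, then round
        q, r = divmod(coeff, 10 ** excess)
        half = 5 * 10 ** (excess - 1)
        if r > half or (r == half and q % 2 == 1):
            q += 1
        if q == 10 ** p:                  # rounding carried into a new digit
            q //= 10
            excess += 1
        coeff, exp = q, exp + excess
    return Decimal((sign, tuple(int(d) for d in str(coeff)), exp))


ctx = Context(prec=P, rounding=ROUND_HALF_EVEN, Emax=384, Emin=-383)
for x, y in [("1.23", "4.5"), ("-2.5", "0.4"), ("9999999999999999", "9999999999999999")]:
    ours = dfp_multiply(Decimal(x), Decimal(y))
    ref = ctx.multiply(Decimal(x), Decimal(y))
    print(ours, ours.as_tuple() == ref.as_tuple())
```

Comparing `as_tuple()` rather than numeric values also checks that the result keeps the same coefficient and exponent (the decimal "cohort"), which matters for decimal arithmetic.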

Pages: 296-303

Citations: 66

SCAFFI: An intrachip FPGA asynchronous interface based on hard macros

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601950

Julian J. H. Pontes, R. Soares, Ewerson Carvalho, F. Moraes, Ney Laert Vilar Calazans

Building fully synchronous VLSI circuits is becoming less viable as circuit geometries evolve. However, before purely asynchronous strategies are adopted in VLSI design, globally asynchronous, locally synchronous (GALS) design approaches should take over. The design of circuits using complex field-programmable components such as state-of-the-art FPGAs follows the same trend. In GALS design, a critical step is the definition of asynchronous interfaces between synchronous regions. This paper proposes SCAFFI, a new asynchronous interface for interconnecting modules inside FPGAs. The interface is based on clock stretching techniques to avoid metastability. Unlike other interfaces, it can use both logic levels for stretching and does not require arbiters. In addition, the compactness of the implementation is enhanced by the use of dedicated FPGA hard macros. A GALS implementation of an RSA cryptography core demonstrates the use of SCAFFI.
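A toy discrete-time model helps illustrate what "clock stretching at either logic level" means: the local clock normally toggles every half period, but while a stretch is requested it holds whatever level it currently has, high or low. This is only a behavioral sketch under our own assumptions (integer ticks, a precomputed set of stretched ticks); it cannot capture metastability or the FPGA hard-macro implementation the paper describes.

```python
from typing import List, Set


def stretchable_clock(n_ticks: int, half_period: int, stretch: Set[int]) -> List[int]:
    """Toy discrete-time model of a stretchable (pausible) local clock.

    The clock toggles every `half_period` ticks, except that during ticks in
    `stretch` it simply holds its current level, whether that level is 0 or 1.
    """
    level, count, wave = 0, 0, []
    for t in range(n_ticks):
        if t not in stretch:
            count += 1
            if count == half_period:  # half period elapsed: toggle the clock
                level ^= 1
                count = 0
        wave.append(level)
    return wave


print(stretchable_clock(8, 2, set()))   # [0, 1, 1, 0, 0, 1, 1, 0]
print(stretchable_clock(8, 2, {2, 3}))  # [0, 1, 1, 1, 1, 0, 0, 1]  high level stretched
print(stretchable_clock(8, 2, {0, 1}))  # [0, 0, 0, 1, 1, 0, 0, 1]  low level stretched
```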

Citations: 38

Fast power network analysis with multiple clock domains

2007 25th International Conference on Computer Design

Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601939

Wanping Zhang, Ling Zhang, Rui Shi, He Peng, Zhi Zhu, L. Chua-Eoan, R. Murgai, Toshiyuki Shibuya, N. Ito, Chung-Kuan Cheng

This paper proposes an efficient analysis flow and an algorithm to identify the worst-case noise for power networks with multiple clock domains. First, we apply the Laplace transform to the input current sources to derive an analytical formula. Then, we calculate the circuit frequency response using logarithmically spaced frequency components. The frequency-domain response is approximated by a rational function using vector fitting. The rational function is used to derive the natural frequency of the power/ground networks and can easily be converted back into the time domain. Based on these analysis results, we present a worst-case clock gating pattern algorithm to analyze power networks with multiple clock domains. The most expensive part of the proposed algorithm is the matrix solving, which takes O(F(N) · log f · D) time, where F(N) is the complexity of iteratively solving a complex matrix of dimension N, D is the number of clock domains, and the frequency spans from 0 to f Hz. Experimental results show that our method is up to 60X faster than HSPICE and can analyze large circuits that are beyond HSPICE's capacity.
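The frequency-sweep step can be illustrated on a much smaller stand-in: a lumped series R-L branch (package/grid) in parallel with a decoupling capacitor, evaluated on a logarithmic frequency grid to locate the impedance peak near the natural frequency 1 / (2π√(LC)). The lumped values are hypothetical, and this skips the paper's full grid solve and vector fitting. Note that on a logarithmic grid the number of points grows with the number of decades covered, i.e. with log f.

```python
import math
from typing import List


def log_frequencies(f_min: float, f_max: float, points_per_decade: int) -> List[float]:
    """Logarithmically spaced frequency points from f_min to f_max."""
    n = int(round(math.log10(f_max / f_min) * points_per_decade))
    return [f_min * 10 ** (k / points_per_decade) for k in range(n + 1)]


def pdn_impedance(f: float, res: float, ind: float, cap: float) -> complex:
    """Impedance of a series R-L branch in parallel with a decoupling capacitor."""
    s = 2j * math.pi * f
    z_rl = res + s * ind
    z_c = 1 / (s * cap)
    return z_rl * z_c / (z_rl + z_c)


R, L, C = 0.01, 1e-9, 1e-9  # hypothetical lumped values (ohm, H, F)
freqs = log_frequencies(1e6, 1e10, 1000)
f_peak = max(freqs, key=lambda f: abs(pdn_impedance(f, R, L, C)))
f_natural = 1 / (2 * math.pi * math.sqrt(L * C))
print(len(freqs), f"{f_peak:.4g} Hz", f"{f_natural:.4g} Hz")
```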

Pages: 456-463

Citations: 14
