
2011 24th International Conference on VLSI Design: Latest Publications

Multi-CoDec Configurations for Low Power and High Quality Scan Test
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.15
A. Jain, S. Subramanian, R. Parekhji, S. Ravi
Scan compression techniques are widely used to contain test application time and test data volume. Smart techniques exist to match the scan compression CoDec (compactor-decompressor) module with the DUT (design under test), to realize high levels of compression with no loss of coverage. DUT partitioning is often desirable for ease of implementing sub-chips and integrating them into an SOC (system-on-chip). This paper presents various multi-CoDec configurations for partitioned DUTs to enable efficient scan testing, which address the requirements of reduced test mode power with no compromise in test quality. Different configurations are examined, tradeoffs discussed, and the most suitable one amongst them identified. It is shown how the preferred configuration can be architected with low implementation overhead (with no new requirements for bounding when creating the individual partitions), and how the different CoDec – DUT partitions can be operated together to meet dual goals of high quality and low power, with no increase in test time. Experimental data is presented on industrial circuits to illustrate the benefits.
Citations: 3
Interconnected Tile Standing Wave Resonant Oscillator Based Clock Distribution Circuits
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.70
Ayan Mandal, V. Karkala, S. Khatri, R. Mahapatra
Standing wave oscillators (SWOs) are attractive because their resonant nature lets them sustain extremely high oscillation frequencies with very low power consumption. In this paper, we present a technique to design a high-frequency SWO that covers a large area on an IC. We achieve this by combining two techniques. The first increases the area coverage of an individual SWO by ensuring that it sustains an odd number (greater than one) of standing waves along the ring. The second further increases the area coverage by tiling multiple SWOs side by side and connecting them so that they oscillate at the same high frequency and phase. The combined approach is simulated for a 3×3 array of tiles using 3D, skin-effect-adjusted RLC parasitic extraction. Our simulations, performed in a 90nm process, indicate that this tiled structure can oscillate at about 7.25 GHz with low power (about 68 mW per SWO tile) and low jitter (about 3.1% of the nominal clock period).
Citations: 10
Variation-Conscious Formal Timing Verification in RTL
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.48
Jayanand Asok Kumar, Shobha Vasudevan
Variations in timing can arise from multiple sources on a chip. Many circuit-level statistical techniques are used to analyze timing in the presence of these sources of variation. It is desirable to have "variation awareness" at the Register Transfer Level (RTL) and to estimate block-level delay distributions early in the design cycle, so that design choices can be evaluated quickly and post-synthesis simulation costs minimized. We introduce SHARPE, a rigorous, systematic methodology to verify design correctness in RTL in the presence of variations. In this paper, we describe SHARPE in the context of computing statistical delay invariants in the presence of input variations. We treat the RTL source code as a program and use static program analysis techniques to compute probabilities. We model the probabilistic RTL modules as Discrete Time Markov Chains (DTMCs), which are then formally checked for probabilistic invariants using PRISM, a probabilistic model checker. Our technique is illustrated on the RTL description of the data path of OR1200, an open-source embedded processor. We demonstrate the enhanced scalability of SHARPE by applying compositional reasoning to probabilistic model checking.
Citations: 6
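The abstract's core step, modeling an RTL module as a Discrete Time Markov Chain and asking how likely it is to finish within a given number of cycles, can be illustrated with plain transient analysis. This is a minimal sketch, not SHARPE or PRISM: the three-state chain, its transition probabilities, and the function name `delay_cdf` are hypothetical, chosen only to show how a cycle-count (delay) distribution falls out of a DTMC.

```python
import math


def delay_cdf(P: list[list[float]], start: int, done: int, max_steps: int) -> list[float]:
    """Return P(reached `done` within k steps) for k = 1..max_steps.

    P is a row-stochastic transition matrix; `done` is assumed absorbing,
    so the probability mass in `done` after k steps equals the CDF at k.
    """
    n = len(P)
    for row in P:
        if not math.isclose(sum(row), 1.0):
            raise ValueError("each row of P must sum to 1")
    dist = [0.0] * n
    dist[start] = 1.0
    cdf = []
    for _ in range(max_steps):
        # One DTMC step: dist_next[j] = sum_i dist[i] * P[i][j]
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        cdf.append(dist[done])
    return cdf


# Hypothetical module: from state 0 it finishes in one cycle with probability 0.75;
# otherwise it passes through a slow state 1 and finishes one cycle later.
P = [
    [0.0, 0.25, 0.75],  # state 0: start
    [0.0, 0.0, 1.0],    # state 1: slow path
    [0.0, 0.0, 1.0],    # state 2: done (absorbing)
]
print(delay_cdf(P, start=0, done=2, max_steps=2))  # [0.75, 1.0]
```

A probabilistic invariant such as "the module finishes within one cycle with probability at least 0.7" then reduces to checking the first CDF entry; a model checker such as PRISM evaluates this kind of bounded-reachability property symbolically and at far larger scale.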
A 1.8GHz Digital PLL in 65nm CMOS
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.32
B. Chattopadhyay, Anant S. Kamath, G. Nayak
A 1.8GHz high-accuracy, ring-oscillator-based Digital Phase-Locked Loop (DPLL), suitable for Serializer-Deserializer (SERDES) applications such as HDMI, eSATA and USB 2.0, is presented here. Sigma-Delta (ΣΔ) dithering followed by passive filtering, along with temperature compensation, is used to ensure frequency accuracy and low accumulated jitter over a large temperature range. A recirculating-delay-line-based Time-to-Digital Converter (T2D) handles large phase differences between the reference and feedback clocks. The DPLL is built in 65nm technology and provides up to 1.8GHz output, with a phase noise of −87 dBc/Hz at 1 MHz offset and a frequency accuracy of ±100 ppm. It supports input frequencies from 0.7 MHz to 50 MHz, occupies a core area of 0.11 mm², and requires no external components.
Citations: 4
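Sigma-Delta dithering lets an oscillator whose control word is an integer reach a fractional frequency setting on average. The sketch below is a first-order (accumulator-overflow) modulator, assuming a control word split into an integer part and a `frac_bits`-bit fractional part; the abstract does not state the modulator order or word widths, and `sigma_delta_dither` is a hypothetical name.

```python
def sigma_delta_dither(int_code: int, frac_code: int, frac_bits: int, n_cycles: int) -> list[int]:
    """First-order sigma-delta: alternate between int_code and int_code + 1 so that
    the long-run average equals int_code + frac_code / 2**frac_bits."""
    modulus = 1 << frac_bits
    if not 0 <= frac_code < modulus:
        raise ValueError("frac_code must lie in [0, 2**frac_bits)")
    acc = 0
    codes = []
    for _ in range(n_cycles):
        acc += frac_code
        carry = acc >> frac_bits  # 1 when the accumulator overflows, else 0
        acc &= modulus - 1        # keep only the fractional bits (the running quantization error)
        codes.append(int_code + carry)
    return codes


# Target code 100.25 with 8 fractional bits: 0.25 * 256 = 64.
codes = sigma_delta_dither(100, 64, 8, 256)
print(codes[:4])                # [100, 100, 100, 101]
print(sum(codes) / len(codes))  # 100.25
```

A first-order modulator like this pushes its quantization error toward high frequencies, which is why dithering is generally paired with low-pass (here, passive) filtering before the control value reaches the oscillator.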
Trace Buffer-Based Silicon Debug with Lossless Compression
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.31
S. Prabhakar, R. Sethuram, M. Hsiao
The capacity of the available on-chip trace buffer is limited. To increase its effective capacity, we propose real-time compression of the trace data via novel source transformation functions, namely real-time difference vector computation, an efficient interconnect network, and real-time alternate vector reversal, which reduce the entropy of the trace data. The proposed compression technique is implemented in hardware and operates in real time to capture debug data. Experimental results on sequential benchmark circuits show that the proposed method achieves a better compression percentage than prior work. Compared with dictionary-based codes, our trace compressor has up to 20X lower area overhead and yields up to a 4X improvement in compression ratio.
Citations: 8
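Of the transformations listed, difference vector computation is the simplest to illustrate. This sketch assumes the difference vector is the bitwise XOR of consecutive trace samples (the abstract does not define it) and omits the interconnect network and alternate vector reversal. When signals change rarely, most difference vectors are all-zero or sparse, which lowers a simple per-bit entropy estimate, and the transform is lossless because a running XOR restores the trace. All names are hypothetical.

```python
import math


def difference_vectors(trace: list[int]) -> list[int]:
    """XOR each trace sample with the previous one (the first sample with 0)."""
    diffs, prev = [], 0
    for sample in trace:
        diffs.append(sample ^ prev)
        prev = sample
    return diffs


def restore(diffs: list[int]) -> list[int]:
    """Invert difference_vectors: a running XOR recovers the original samples."""
    trace, prev = [], 0
    for d in diffs:
        prev ^= d
        trace.append(prev)
    return trace


def bit_entropy(words: list[int], width: int) -> float:
    """Zeroth-order entropy per bit, treating every bit as an independent draw."""
    ones = sum(bin(w).count("1") for w in words)
    p = ones / (len(words) * width)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


trace = [0x3C] * 4 + [0x3D] * 4  # an 8-bit signal that changes once
diffs = difference_vectors(trace)
print([hex(d) for d in diffs])   # ['0x3c', '0x0', '0x0', '0x0', '0x1', '0x0', '0x0', '0x0']
print(bit_entropy(trace, 8), bit_entropy(diffs, 8))
```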
An Automated Approach for Minimum Jitter Buffered H-Tree Construction
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.69
Ayan Mandal, N. Jayakumar, Kalyana C. Bollapalli, S. Khatri, R. Mahapatra
In recent fabrication technologies, buffered clock distribution networks have become increasingly popular due to increasing on-chip wiring delays. Traditionally, clock distribution networks have been optimized to minimize the end-to-end skew of the distribution network. However, since most ICs have an on-chip PLL, we argue that minimizing end-to-end jitter is a more relevant design goal. In this paper, we present a dynamic programming based approach to synthesize a minimum-cost buffered H-tree clock distribution network. We use two cost functions: a weighted sum of power and jitter, and a weighted sum of power and end-to-end delay of the distribution network. Our approach is based on precharacterizing the delay, jitter and power of buffered segments of different lengths, topologies, buffer sizes and wire-codes. Using this information, a dynamic programming (DP) engine automatically generates the optimal H-tree that minimizes the chosen cost function. Compared to a manually constructed buffered H-tree network, our approaches reduce jitter by as much as 28% and power by as much as 46%. When optimizing for minimum jitter, the DP engine generates an H-tree with lower jitter than when optimizing for minimum delay, thereby validating our approach and demonstrating its usefulness.
Citations: 2
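The DP idea in this abstract can be sketched under simplifying assumptions that are ours, not the paper's: each H-tree level picks one precharacterized segment option; a level's power and jitter depend only on its own option and on the buffer driving it from the level above; power is multiplied by the number of segments at that level (assumed to double per level); and per-level jitter adds linearly along a root-to-leaf path. The table values, option names, and `optimize_htree` are hypothetical. Swapping the jitter column for delay would give the second (power plus delay) cost function.

```python
from typing import Optional

# Hypothetical precharacterization table:
# (option driving this level, or None at the root; option chosen at this level)
#   -> (power per segment, jitter added along one root-to-leaf path)
Char = dict[tuple[Optional[str], str], tuple[float, float]]


def optimize_htree(n_levels: int, options: list[str], char: Char,
                   w_power: float, w_jitter: float) -> tuple[float, list[str]]:
    """Choose one option per level minimizing w_power * total power + w_jitter * path jitter.

    Because a stage's cost depends only on its own option and the previous
    level's option, the DP state is just the last option chosen.
    """
    best: dict[Optional[str], tuple[float, list[str]]] = {None: (0.0, [])}
    for level in range(n_levels):
        n_segments = 2 ** level  # assumption: each level doubles the segment count
        nxt: dict[Optional[str], tuple[float, list[str]]] = {}
        for prev, (cost, path) in best.items():
            for opt in options:
                power, jitter = char[(prev, opt)]
                c = cost + w_power * power * n_segments + w_jitter * jitter
                if opt not in nxt or c < nxt[opt][0]:
                    nxt[opt] = (c, path + [opt])
        best = nxt
    return min(best.values(), key=lambda entry: entry[0])


options = ["small", "large"]
char: Char = {
    (None, "small"): (1.0, 5.0), (None, "large"): (2.0, 2.0),
    ("small", "small"): (1.0, 6.0), ("small", "large"): (2.0, 3.0),
    ("large", "small"): (1.0, 4.0), ("large", "large"): (2.0, 2.0),
}
print(optimize_htree(2, options, char, w_power=1.0, w_jitter=2.0))  # (14.0, ['large', 'large'])
print(optimize_htree(2, options, char, w_power=4.0, w_jitter=1.0))  # (22.0, ['large', 'small'])
```

Keeping only the best partial tree per last option makes the search grow as levels × options² rather than options^levels, and the two calls show how shifting weight toward power changes the chosen tree.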
Hardware Implementation of Real-Time Speech Recognition System Using TMS320C6713 DSP
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.12
J. Manikandan, B. Venkataramani, K. Girish, H. Karthic, V. Siddharth
Continuous, real-time speech recognition is required for various mobile and hands-free applications. In this paper, a hardware implementation of a real-time speech recognition system is proposed using two approaches, and their performance is evaluated. The first approach uses Mel filter banks with Mel Frequency Cepstral Coefficients (MFCC) as the feature input, and the second uses cochlear filter banks with zero-crossings (ZC) as the feature input for recognition. The features extracted from the input speech are fed to a multi-class Support Vector Machine (SVM) classifier for recognition. The proposed recognition systems are implemented on a Texas Instruments TMS320C6713 floating-point digital signal processor to recognize isolated digits (0-9), and their performance is compared. The program memory required for MFCC feature extraction is 44.42% higher than that required for feature extraction using cochlear filters. Recognition accuracies of 93.33% and 98.67% are achieved with feature inputs from the Mel filter banks and the cochlear filter banks, respectively. The computational complexity of feature extraction using cochlear filters is 1.53 times that of MFCC feature extraction. Recognition performance is also studied for different combinations of test and training utterances; training with 15 utterances of each digit gives the best recognition accuracy. The techniques proposed here can be adapted to various other hands-free consumer applications, such as washing machines and hands-free cordless devices.
Citations: 29
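The zero-crossing feature used in the second approach can be sketched per frame as follows. The cochlear filter bank that would produce each band's signal is omitted, and the frame length is a hypothetical parameter; the abstract does not say how crossings are framed or normalized. For a band dominated by a single frequency f, a frame of duration T contains roughly 2fT crossings, which is why zero-crossing counts carry frequency information.

```python
import math


def zero_crossings_per_frame(samples: list[float], frame_len: int) -> list[int]:
    """Count sign changes inside each non-overlapping frame (zero counts as positive)."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    counts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        counts.append(sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)))
    return counts


# Two 4-sample frames: an alternating one and a constant one.
print(zero_crossings_per_frame([1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0], 4))  # [3, 0]

# A 100 Hz tone sampled at 8 kHz for 0.1 s (half-sample phase offset avoids exact zeros).
tone = [math.sin(2 * math.pi * 100 * (n + 0.5) / 8000) for n in range(800)]
print(zero_crossings_per_frame(tone, 800))  # [19], close to 2 * 100 Hz * 0.1 s = 20
```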
A Reconfigurable Processor for Phylogenetic Inference
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.74
Pei Liu, A. Hemani, K. Paul
A reconfigurable processor tailored to accelerating phylogenetic inference is proposed. The programmable and scalable architectural platform instantiates an array of coarse-grained, lightweight processing elements; it allows arbitrary partitioning and scheduling schemes, can solve the complete Maximum Likelihood algorithm, and can handle arbitrarily large sequences. The key difference between the proposed CGRA (coarse-grained reconfigurable array) based solution and FPGA- or GPU-based solutions is a much better match between architecture and algorithm, both for the core computational needs and for the system-level architectural needs. For the same degree of parallelism, the design achieves a 2.27X speed-up over an FPGA with the same amount of core logic and an 81.87X speed-up over a GPU with the same silicon area.
Citations: 2
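Maximum Likelihood phylogenetic inference spends most of its time evaluating per-site tree likelihoods, typically with Felsenstein's pruning algorithm. The sketch below shows that kernel for one alignment column, assuming the Jukes-Cantor (JC69) substitution model and a uniform base distribution at the root; the abstract does not say which model or kernel the processor implements, and the tree encoding is hypothetical.

```python
import math
from typing import Union

BASES = "ACGT"
# Hypothetical tree encoding: a leaf is its observed base; an internal node is a
# list of (child, branch_length) pairs.
Node = Union[str, list]


def jc69(t: float) -> list[list[float]]:
    """Jukes-Cantor transition probabilities for branch length t (substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial_likelihood(node: Node) -> list[float]:
    """Likelihood of the data below `node`, conditioned on each possible base at `node`."""
    if isinstance(node, str):
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial_likelihood(child)
        P = jc69(t)
        for s in range(4):
            # Sum over the unknown base x at the child, weighted by P(s -> x over t).
            result[s] *= sum(P[s][x] * child_l[x] for x in range(4))
    return result


def site_likelihood(root: Node) -> float:
    """Average the root's conditional likelihoods over a uniform base distribution."""
    return sum(0.25 * v for v in partial_likelihood(root))


tree = [([("A", 0.1), ("A", 0.1)], 0.05), ("C", 0.2)]
print(site_likelihood(tree))
```

For a full alignment, the log-likelihood is the sum of the log site likelihoods; under this model the columns are independent, so they can be evaluated in parallel, the kind of regularity an array of processing elements can exploit.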
A SPICE Macromodel for the Analysis of Lossy Dispersive Coupled GaAs Interconnect Line System
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.11
Bhaskar Gopalan
A SPICE macromodel for the transient analysis of lossy, dispersive, coupled GaAs interconnect line systems is presented. The model is based on the finite Fourier integral transform in the spatial domain and is used to study the transient behavior of signals, signal delays, distortion and crosstalk in the interconnects of digital integrated circuits. An equivalent circuit model is derived from the resulting nonlinear differential equations and implemented as a macromodel in the general-purpose circuit simulator SPICE. The model provides an easy way to include the skin effect and the dispersion of the lines. This macromodel is an alternative to lumped-element modeling of distributed systems with multiple PI or Tee sections. Its simulation time and accuracy compare well with those of reduced-order PI-section lumped-element models.
Citations: 0
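For context, the lumped-element baseline that this macromodel is compared against splits a line into N cascaded PI sections. The sketch below generates SPICE element lines for a single uncoupled RLC line under that scheme (series R and L per section, with half the section capacitance shunted to ground at each end). The per-unit-length values, node names, and `pi_ladder_netlist` are hypothetical, and coupling and conductance are omitted. A fixed R per section cannot by itself capture frequency-dependent skin-effect resistance, which is part of what the abstract's macromodel is meant to include.

```python
def pi_ladder_netlist(r: float, l: float, c: float, length: float, n_sections: int,
                      in_node: str = "in", out_node: str = "out") -> list[str]:
    """SPICE element lines for an RLC line split into n_sections PI sections.

    r, l, c are per-unit-length resistance, inductance and capacitance.
    """
    if n_sections < 1:
        raise ValueError("need at least one section")
    dr, dl, dc = (x * length / n_sections for x in (r, l, c))
    nodes = [in_node] + [f"n{k}" for k in range(1, n_sections)] + [out_node]
    lines = []
    for k in range(n_sections):
        a, b, mid = nodes[k], nodes[k + 1], f"m{k + 1}"
        lines.append(f"R{k + 1} {a} {mid} {dr:.6g}")   # series resistance
        lines.append(f"L{k + 1} {mid} {b} {dl:.6g}")   # series inductance
        lines.append(f"CA{k + 1} {a} 0 {dc / 2:.6g}")  # C/2 shunt at the section input
        lines.append(f"CB{k + 1} {b} 0 {dc / 2:.6g}")  # C/2 shunt at the section output
    return lines


# Hypothetical line: 10 ohm, 1 nH and 1 pF per unit length, split into 4 sections.
print("\n".join(pi_ladder_netlist(10.0, 1e-9, 1e-12, 1.0, 4)))
```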
Intra-Flit Skew Reduction for Asynchronous Bypass Channel in NoCs
Pub Date : 2011-01-02 DOI: 10.1109/VLSID.2011.73
Reeshav Kumar, Yoon Seok Yang, G. Choi
Various novel NoC designs attempt to improve network throughput and latency by leveraging asynchronous bypass and specialized clock routing. The performance of such architectures is limited by the skewing of signal transitions on the bit-lines of a link due to crosstalk noise. This work proposes a two-step technique, TransSync-RecSync, to eliminate packet errors resulting from inter-bit-line transition skew. TransSync preemptively adds delay to the bits of a flit before they are transmitted, to counteract the skewing of transitions on the link, while RecSync de-skews the bits at the receiving end by delaying all transitions by the same amount as the maximum skew on the bus. The approach adds minimally to router complexity and involves no wire overhead. When employed to augment a NoC design with an asynchronous bypass channel, the proposed scheme was found to improve the average network latency by 38%.
Citations: 5