Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398017
T. English, Maurice Keller, K. L. Man, E. Popovici, M. Schellekens, W. Marnane
We report on the implementation of an IP core for pairing-based cryptography. The core performs an elliptic curve cryptographic operation, the Tate pairing, over the field GF(2^251). We describe the implementation of the design in TSMC 65 nm GP CMOS standard cells and the optimisations made for low-power operation. The resulting core computes the pairing in 1.5 ms and consumes less than 4 mW.
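The basic arithmetic underneath a pairing over a binary field is multiplication in GF(2^m). The following is a minimal software sketch of that operation, not the authors' hardware design: the function name `gf2m_mul` is hypothetical, and the demonstration uses the small field GF(2^8) with the well-known AES reduction polynomial, because the abstract does not state the reduction polynomial used for GF(2^251).

```python
def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Multiply two elements of GF(2^m) held as integers (bit i = coefficient of x^i).

    `poly` is the reduction polynomial including its x^m term.
    Both inputs are assumed to be already reduced (less than 2**m).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce when degree reaches m
    return result


# Toy field (an assumption for illustration): GF(2^8), x^8 + x^4 + x^3 + x + 1.
AES_POLY = 0x11B
product = gf2m_mul(0x57, 0x83, 8, AES_POLY)
print(hex(product))
```

A hardware core would typically process several bits of one operand per clock cycle rather than one, but the shift, conditional XOR, and reduce steps are the same.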
{"title":"A low-power pairing-based cryptographic accelerator for embedded security applications","authors":"T. English, Maurice Keller, K. L. Man, E. Popovici, M. Schellekens, W. Marnane","doi":"10.1109/SOCCON.2009.5398017","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398017","url":null,"abstract":"We report on the implementation of an IP core for Pairing-based cryptography. The core performs an elliptic curve cryptographic operation called the Tate Pairing over the field GF(2251). In this paper, we describe the implementation of the design in TSMC 65nm GP CMOS standard cells and the optimisations made for low-power operation. The resulting core computes the pairing in 1.5ms and consumes less than 4mW.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129991036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398103
S. Demirsoy, Kellie Marks
An SoC framework is presented, comprising a plug-and-play infrastructure in which system communication is abstracted from the processing elements. A software scheduler is used with a hardware modelling environment for latency analysis. Using the framework, an LTE uplink data channel (PUSCH) receiver design is shown to meet its stringent latency targets.
{"title":"SoC framework for FPGA: A case study of LTE PUSCH receiver","authors":"S. Demirsoy, Kellie Marks","doi":"10.1109/SOCCON.2009.5398103","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398103","url":null,"abstract":"An SoC framework is presented, comprising of a plug-and-play infrastructure where the system communication is abstracted from the processing elements. A software scheduler is used with a hardware modelling environment for latency analysis. Using the framework, an LTE uplink data channel (PUSCH) receiver design is shown to meet the stringent latency targets.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"479 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127560474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398064
Seungwon Lee, Tae-Ho Kim, Jae-Wook Yoo, Jin-Ku Kang
This paper describes a clock and data recovery (CDR) circuit that supports the dual data rates of 2.7 Gbps and 1.62 Gbps defined by the DisplayPort standard. The proposed CDR has a dual-mode voltage-controlled oscillator (VCO) that changes its operating frequency with a “Mode” switch control. The chip has been implemented in a 0.18 μm CMOS process. Measured results show peak-to-peak jitter of 37 ps (at 2.7 Gbps) and 27 ps (at 1.62 Gbps) in the recovered data. The power dissipation is 80 mW at the 2.7 Gbps rate from a 1.8 V supply.
{"title":"A 2.7Gbps & 1.62Gbps dual-mode clock and data recovery for DisplayPort in 0.18μm CMOS","authors":"Seungwon Lee, Tae-Ho Kim, Jae-Wook Yoo, Jin-Ku Kang","doi":"10.1109/SOCCON.2009.5398064","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398064","url":null,"abstract":"This paper describes a clock and data recovery (CDR) circuit that support dual data rates of 2.7Gbps and 1.62Gbps for DisplayPort standard. The proposed CDR has a dual mode voltage-controlled oscillator (VCO) that changes the operating frequency with a “Mode” switch control. The chip has been implemented using 0.18μm CMOS process. Measured results show the circuit exhibits peak-to-peak jitters of 37ps(@2.7Gbps) and 27ps(@1.62Gbps) in the recovered data. The power dissipation is 80mW at 2.7Gbps rate from a 1.8V supply.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121370872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398071
G. Vikas, J. Kuri, Kuruvilla Varghese
A large part of today's multi-core chips is interconnect. Increasing communication complexity has made new interconnect strategies, such as Network on Chip, essential. Power dissipation in interconnects has become a substantial part of total power dissipation, so techniques to reduce interconnect power have become a necessity. In this paper, we present a design methodology that gives the bus width of the interconnect links and the operating frequency of the routers in a Network on Chip such that the required throughput is satisfied and switching power is minimal. We develop closed-form analytical expressions for the power dissipation, with bus width and frequency as variables, and then use the Lagrange multiplier method to arrive at the optimal values. We present a 4-port router in a 90 nm technology library as a case study and discuss the results obtained from the analysis.
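The optimisation described above can be written generically as follows. This is a sketch under stated assumptions rather than the paper's own formulation: the symbols $w$ (bus width), $f$ (frequency), $T$ (required throughput) and $P(w,f)$ (switching power) are introduced here for illustration, and the throughput constraint is assumed to take the simple form $wf = T$. The abstract does not give the closed-form expression for $P$, so it is left unspecified.

```latex
\begin{aligned}
\min_{w,\,f}\; & P(w,f) \quad \text{subject to} \quad w f = T, \\
\mathcal{L}(w,f,\lambda) &= P(w,f) + \lambda\,(T - w f), \\
\frac{\partial \mathcal{L}}{\partial w} &= \frac{\partial P}{\partial w} - \lambda f = 0, \\
\frac{\partial \mathcal{L}}{\partial f} &= \frac{\partial P}{\partial f} - \lambda w = 0, \\
\frac{\partial \mathcal{L}}{\partial \lambda} &= T - w f = 0 .
\end{aligned}
```

Eliminating $\lambda$ from the two stationarity conditions gives $w\,\partial P/\partial w = f\,\partial P/\partial f$, which together with $wf = T$ determines the candidate optimum once a concrete power model is substituted.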
{"title":"Power optimal Network-on-Chip interconnect design","authors":"G. Vikas, J. Kuri, Kuruvilla Varghese","doi":"10.1109/SOCCON.2009.5398071","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398071","url":null,"abstract":"A large part of today's multi-core chips is interconnect. Increasing communication complexity has made essential new strategies for interconnects, such as Network on Chip. Power dissipation in interconnects has become a substantial part of the total power dissipation. Techniques to reduce interconnect power have thus become a necessity. In this paper, we present a design methodology that gives values of bus width for interconnect links, frequency of operation for routers, in Network on Chip scenario that satisfy required throughput and dissipate minimal switching power. We develop closed form analytical expressions for the power dissipation, with bus width and frequency as variables and then use Lagrange multiplier method to arrive at the optimal values. We present a 4 port router in 90 nm technology library as case study. The results obtained from analysis are discussed.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114647231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398049
Chun-Fei Hsu, Mong-Kai Ku, Li-Yen Liu
This paper presents a video shot boundary detection system based on the support vector machine (SVM) classification method. A fully parallel digital hardware SVM classifier is used to detect shot boundaries in a continuous video stream. Throughput is increased by employing a pipelined architecture in the feature extraction stage. The hardware SVM can detect both cuts and gradual transitions in the video stream. Random pseudo-sampling techniques are employed to solve the class imbalance problem in SVM training. The internal wordlength is optimized for performance and hardware complexity. The threshold method in the postprocessing stage merges small subshots to reduce false alarms. The complete system is demonstrated on a Xilinx Virtex IV XC4VSX35 FPGA platform and achieves 256 frames per second.
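The postprocessing idea of merging small subshots can be illustrated with a short sketch. The abstract does not describe the exact rule, so the version below is an assumption: a detected boundary is discarded when it would create a shot shorter than a minimum length. The function name `merge_short_shots` and the parameter `min_len` are hypothetical.

```python
def merge_short_shots(boundaries, min_len):
    """Drop boundaries that would create a shot shorter than `min_len` frames.

    `boundaries` is a sorted list of frame indices at which the classifier
    reported a new shot. Frame 0 is treated as the start of the first shot.
    """
    kept = []
    last_start = 0
    for frame in boundaries:
        if frame - last_start >= min_len:
            kept.append(frame)      # long enough: accept as a real boundary
            last_start = frame
        # otherwise the short subshot is merged into the current shot
    return kept


detected = [100, 103, 250, 252, 400]
print(merge_short_shots(detected, min_len=10))
```

With these inputs, the boundaries at frames 103 and 252 are treated as false alarms because each follows an accepted boundary by fewer than 10 frames.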
{"title":"Support vector machine FPGA implementation for video shot boundary detection application","authors":"Chun-Fei Hsu, Mong-Kai Ku, Li-Yen Liu","doi":"10.1109/SOCCON.2009.5398049","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398049","url":null,"abstract":"This paper presents a video shot boundary detection system based on support vector machine (SVM) classification method. A hardware fully-parallel digital Support Vector Machine (SVM) classifier is used to detect the shot boundary in a continuous video stream. The throughput is increased by employing a pipelined architecture in the feature extraction stage. Hardware SVM can detect both cut and gradual transition in the video stream. Random pseudo-sampling techniques are employed to solve the class imbalance problem in SVM training. The internal wordlength is optimized for performance and hardware complexity. The threshold method in the postprocessing stage merges small subshots to reduce false alarms. The complete system is demonstrated on Xilinx Virtex IV XC4VSX35 FPGA platform to achieve 256 frames per second.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124820945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398076
K. Xuan, K. Tsang, Shu‐Chuen Lee, W. Lee
A high-gain, low-noise mixer based on a current bleeding topology is implemented. The high performance is attributed to the effect of current injection and local oscillator (LO) amplification. The conversion gain of the mixer is 17.5 dB at −14 dBm LO power, and the noise figure is 10.5 dB. The proposed topology dramatically relaxes the typically high LO power requirement. The mixer is implemented in a 0.18-μm CMOS process. The operating frequency is 2.4 GHz with a 10 MHz intermediate frequency. The circuit draws 12 mA from a 1.5 V supply.
{"title":"A current bleeding mixer based on Gilbert-cell featuring LO amplification","authors":"K. Xuan, K. Tsang, Shu‐Chuen Lee, W. Lee","doi":"10.1109/SOCCON.2009.5398076","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398076","url":null,"abstract":"A high gain and low noise mixer based on current bleeding topology is implemented. The high performance is attributed to the effect of current injection and local oscillator (LO) amplification. The conversion gain of the mixer is 17.5 dB at −14 dBm LO power and the noise figure is 10.5 dB. The proposed topology dramatically relieves the typically high power requirement of LO. The mixer is implemented by a 0.18-μm CMOS process. The operating frequency is 2.4 GHz with 10 MHz intermediate frequency. The circuit drains 12 mA current from a 1.5 V supply voltage.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123446384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398069
Mohamed A. Abd El-Ghany, M. El-Moursy, M. Ismail
A High Throughput Chip-Level Integration of Communicating Heterogeneous Elements (CLICHÉ) architecture is proposed to achieve high-performance Networks on Chip (NoC). The architecture increases the throughput of the network by 40% while preserving the average latency. The area of the High Throughput CLICHÉ switch is 18% smaller than that of the CLICHÉ switch. The total metal resources required to implement the High Throughput CLICHÉ design are 7% greater than those required for the CLICHÉ design. The extra power consumption required by the proposed architecture is 8% of the total power consumption of the CLICHÉ architecture.
{"title":"High throughput architecture for CLICHÉ Network on Chip","authors":"Mohamed A. Abd El-Ghany, M. El-Moursy, M. Ismail","doi":"10.1109/SOCCON.2009.5398069","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398069","url":null,"abstract":"High Throughput Chip-Level Integration of Communicating Heterogeneous Elements (CLICHÉ) architecture to achieve high performance Networks on Chip (NoC) is proposed. The architecture increases the throughput of the network by 40% while preserving the average latency. The area of High Throughput CLICHÉ switch is decreased by 18% as compared to CLICHÉ switch. The total metal resources required to implement High Throughput CLICHÉ design is increased by 7% as compared to the total metal resources required to implement CLICHÉ design. The extra power consumption required to achieve the proposed architecture is 8% of the total power consumption of the CLICHÉ architecture.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127250400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398086
Jihi-Yu Lin, Ming-Hsien Tu, Ming-Chien Tsai, S. Jou, C. Chuang
In this paper, asymmetrical Write-assist cell virtual ground biasing and positive feedback sensing keeper schemes are proposed to improve the Read Static Noise Margin (RSNM), Write Margin (WM), and operation speed of a single-ended Read/Write 8T SRAM cell. A 4 Kbit SRAM implemented in 90 nm CMOS technology achieves 1 μW/bit average power consumption at 6 MHz, a Vmin of 410 mV at 6 MHz, and a 234 MHz maximum operation frequency at 600 mV.
{"title":"Asymmetrical Write-assist for single-ended SRAM operation","authors":"Jihi-Yu Lin, Ming-Hsien Tu, Ming-Chien Tsai, S. Jou, C. Chuang","doi":"10.1109/SOCCON.2009.5398086","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398086","url":null,"abstract":"In this paper, asymmetrical Write-assist cell virtual ground biasing and positive feedback sensing keeper schemes are proposed to improve the Read Static Noise Margin (RSNM), Write Margin (WM), and operation speed of a single-ended Read/Write 8T SRAM cell. A 4Kbit SRAM implemented in 90nm CMOS technology achieves 1uW/bit average power consumption at 6MHz, Vmin of 410mV at 6MHz, and 234MHz maximum operation frequency at 600mV.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"9 36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133516441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398029
Tariq E. L. Motassadeq, V. Sarathi, Syed Thameem, Mohamed Nijam
With shrinking geometries and increasing design complexity, SPICE simulation is a must for accurate timing analysis of critical paths. It also improves signoff confidence in the design. In this process, however, designers may discover a miscorrelation between Static Timing Analysis (STA) and SPICE. There are articles that provide in-depth descriptions of STA-SPICE correlation flows [3]. This paper addresses key challenges and offers useful tips for timing and noise correlation.
{"title":"SPICE versus STA tools: Challenges and tips for better correlation","authors":"Tariq E. L. Motassadeq, V. Sarathi, Syed Thameem, Mohamed Nijam","doi":"10.1109/SOCCON.2009.5398029","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398029","url":null,"abstract":"With shrinking geometries and increasing complexity of the designs, the use of SPICE simulator (SPICE) is a must to perform accurate timing analysis of the critical paths. This also improves the signoff confidence of the design. However, in this process designers may discover a miscorrelation between Static Timing Analysis (STA) and SPICE. There are articles that provide in-depth descriptions of STA-SPICE correlation flows [3]. This paper addresses key challenges and offers useful tips in timing and noise correlation.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"24 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132513035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2009-09-01 DOI: 10.1109/SOCCON.2009.5398028
Shinyu Ninomiya, M. Hashimoto
Statistical timing analysis for manufacturing variability requires modeling of spatially-correlated variation. Common grid-based modeling of spatially-correlated variability involves a trade-off between accuracy and computational cost, especially for PCA (principal component analysis). This paper proposes spatially interpolating the variation coefficients to improve accuracy instead of refining the spatial grid. Experimental results show that the spatial interpolation realizes a continuous expression of spatial correlation and reduces the maximum error of timing estimates that originates from sparse spatial grids. To attain the same accuracy, the proposed interpolation reduced CPU time for PCA by 97.7% in a test case.
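The idea of interpolating coefficients between grid points, rather than using a finer grid, can be sketched as follows. The abstract does not say which interpolation scheme the authors use, so bilinear interpolation is an assumption here, and the function name `interpolate_coefficient` and the unit grid spacing are hypothetical.

```python
def interpolate_coefficient(grid, x, y):
    """Bilinearly interpolate a per-grid-point coefficient at position (x, y).

    `grid[iy][ix]` holds the coefficient at grid point (ix, iy) with unit
    spacing. The grid must be at least 2 x 2, and (x, y) is expected to lie
    within the grid extent.
    """
    rows, cols = len(grid), len(grid[0])
    # Index of the lower-left corner of the enclosing cell, clamped to the grid.
    ix = min(max(int(x), 0), cols - 2)
    iy = min(max(int(y), 0), rows - 2)
    tx = x - ix
    ty = y - iy
    # Interpolate along x on the two bounding rows, then along y between them.
    row_low = (1 - tx) * grid[iy][ix] + tx * grid[iy][ix + 1]
    row_high = (1 - tx) * grid[iy + 1][ix] + tx * grid[iy + 1][ix + 1]
    return (1 - ty) * row_low + ty * row_high


coarse = [[0.0, 1.0],
          [2.0, 3.0]]
print(interpolate_coefficient(coarse, 0.5, 0.5))
```

A gate placed between grid points then receives a coefficient that varies continuously with its position, instead of jumping at cell borders as it would with a piecewise-constant grid.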
{"title":"Enhancement of grid-based spatially-correlated variability modeling for improving SSTA accuracy","authors":"Shinyu Ninomiya, M. Hashimoto","doi":"10.1109/SOCCON.2009.5398028","DOIUrl":"https://doi.org/10.1109/SOCCON.2009.5398028","url":null,"abstract":"Statistical timing analysis for manufacturing variability requires modeling of spatially-correlated variation. Common grid-based modeling for spatially-correlated variability involves a trade-off between accuracy and computational cost, especially for PCA (principal component analysis). This paper proposes to spatially interpolate variation coefficients for improving accuracy instead of fining spatial grids. Experimental results show that the spatial interpolation realizes a continuous expression of spatial correlation, and reduces the maximum error of timing estimates that originates from sparse spatial grids For attaining the same accuracy, the proposed interpolation reduced CPU time for PCA by 97.7% in a test case.","PeriodicalId":303505,"journal":{"name":"2009 IEEE International SOC Conference (SOCC)","volume":"9 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132724756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}