Pub Date : 2008-03-21DOI: 10.1109/ASPDAC.2008.4484061
G. Martin
We all know that Moore's law is good for at least a few more generations of silicon process, and this will give rise to many integrated circuits having billions of transistors on them. The leading 45 nm processors being announced are getting close to a billion transistors as of 2007. But how can we best use these devices in the future? Integrating more and more features and functions onto SoCs may not be the optimal use for all of these billions of resources. Indeed, to even have a working device at 45, 32, 22 and 16 nm may require new architectures and new structures to be incorporated. Among the many ideas that can be advanced to best use the 'billions and billions served' are: (1) multicore and multiprocessor systems (2) yet more memory, to hold the embedded software and data required by multiprocessor architectures (3) more and more elaborate on-chip interconnect and network structures (4) redundant structures for defect tolerance (5) structures and architectures for dynamic error recovery (6) a variety of schemes to allow lower and lower power and energy consumption At the same time, billions of transistors on a chip will pose increasing challenges to our design methodologies, integration approaches and design tools. How can we best conceive of, architect, design, integrate, verify and manufacture such devices? This panel draws on several academic and industry experts who will discuss their views on the best things to integrate into future ICs, and the best ways to do that integration. It will give an excellent opportunity to the audience to challenge and discuss these ideas and to advocate their own views. As well as considering the 'best' ways to use these resources, the panel will also be a good opportunity to discuss the 'worst' ways to proceed. What architectural dead-ends should be avoided as we move through each silicon process generation?
{"title":"Panel: Best ways to use billions of devices on a chip","authors":"G. Martin","doi":"10.1109/ASPDAC.2008.4484061","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4484061","url":null,"abstract":"We all know that Moore's law is good for at least a few more generations of silicon process, and this will give rise to many integrated circuits having billions of transistors on them. The leading 45 nm processors being announced are getting close to a billion transistors as of 2007. But how can we best use these devices in the future? Integrating more and more features and functions onto SoCs may not be the optimal use for all of these billions of resources. Indeed, to even have a working device at 45, 32, 22 and 16 nm may require new architectures and new structures to be incorporated. Among the many ideas that can be advanced to best use the 'billions and billions served' are: (1) multicore and multiprocessor systems (2) yet more memory, to hold the embedded software and data required by multiprocessor architectures (3) more and more elaborate on-chip interconnect and network structures (4) redundant structures for defect tolerance (5) structures and architectures for dynamic error recovery (6) a variety of schemes to allow lower and lower power and energy consumption At the same time, billions of transistors on a chip will pose increasing challenges to our design methodologies, integration approaches and design tools. How can we best conceive of, architect, design, integrate, verify and manufacture such devices? This panel draws on several academic and industry experts who will discuss their views on the best things to integrate into future ICs, and the best ways to do that integration. It will give an excellent opportunity to the audience to challenge and discuss these ideas and to advocate their own views. As well as considering the 'best' ways to use these resources, the panel will also be a good opportunity to discuss the 'worst' ways to proceed. What architectural dead-ends should be avoided as we move through each silicon process generation?","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121807645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-01-21DOI: 10.1109/ASPDAC.2008.4484027
Shuai Li, Jin Shi, Yici Cai, Xianlong Hong
In multi-layered power/ground (P/G) networks, to connect the whole network together, vertical vias are usually placed at intersections between metal wires of adjoining layers. In this paper, a deep study about the design of vertical vias is presented. First we present an efficient heuristic algorithm based on sensitivity analysis to optimize via allocation in early design stage. Compared with even allocation, averagely our algorithm is capable of reducing worst voltage drop by 8.43% while using the same or even less number of vias. Also, adjoint network method is utilized and significantly improves the efficiency of our algorithm. Next, we demonstrate that by linking metal wires of nonadjacent layers, cross-layer vias are powerful in eliminating "hot" areas which suffer from large voltage drop on bottom layer. A similar heuristic algorithm is also developed for the addition of cross-layer vias.
{"title":"Vertical via design techniques for multi-layered P/G networks","authors":"Shuai Li, Jin Shi, Yici Cai, Xianlong Hong","doi":"10.1109/ASPDAC.2008.4484027","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4484027","url":null,"abstract":"In multi-layered power/ground (P/G) networks, to connect the whole network together, vertical vias are usually placed at intersections between metal wires of adjoining layers. In this paper, a deep study about the design of vertical vias is presented. First we present an efficient heuristic algorithm based on sensitivity analysis to optimize via allocation in early design stage. Compared with even allocation, averagely our algorithm is capable of reducing worst voltage drop by 8.43% while using the same or even less number of vias. Also, adjoint network method is utilized and significantly improves the efficiency of our algorithm. Next, we demonstrate that by linking metal wires of nonadjacent layers, cross-layer vias are powerful in eliminating \"hot\" areas which suffer from large voltage drop on bottom layer. A similar heuristic algorithm is also developed for the addition of cross-layer vias.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115439186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-01-21DOI: 10.1109/ASPDAC.2008.4484041
Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, K. Shimizu, Youhua Shi, N. Togawa, M. Yanagisawa, T. Ohtsuki
Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used as a technique for the modular multiplication and required scalability since the bit length of operands varies depending on the security levels. Also, ECC is performed in GF(P) or GF(2n), and unified architectures for GF(P) and GF(2n) multiplier are needed. However, in previous works, changing frequency or dual-radix architecture is necessary to deal with delay-time difference between GF(P) and GF(2n) circuits of the multiplier because the critical path of GF(P) circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2n). The proposed architecture unifies 4 parallel radix-216 multipliers in GF(P) and a radix-264 multiplier in GF(2n) into a single unit. Applying lower radix to GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency so that clock dividers to deal with the delay-time difference are not required. Moreover, parallel architecture in GF(P) reduces the clock cycles increased by dual-radix approach. Consequently, the proposed architecture achieves to compute GF(P) 256-bit Montgomery multiplication in 0.23 mus.
{"title":"Scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2n)","authors":"Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, K. Shimizu, Youhua Shi, N. Togawa, M. Yanagisawa, T. Ohtsuki","doi":"10.1109/ASPDAC.2008.4484041","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4484041","url":null,"abstract":"Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used as a technique for the modular multiplication and required scalability since the bit length of operands varies depending on the security levels. Also, ECC is performed in GF(P) or GF(2n), and unified architectures for GF(P) and GF(2n) multiplier are needed. However, in previous works, changing frequency or dual-radix architecture is necessary to deal with delay-time difference between GF(P) and GF(2n) circuits of the multiplier because the critical path of GF(P) circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2n). The proposed architecture unifies 4 parallel radix-216 multipliers in GF(P) and a radix-264 multiplier in GF(2n) into a single unit. Applying lower radix to GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency so that clock dividers to deal with the delay-time difference are not required. Moreover, parallel architecture in GF(P) reduces the clock cycles increased by dual-radix approach. Consequently, the proposed architecture achieves to compute GF(P) 256-bit Montgomery multiplication in 0.23 mus.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123151926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-01-21DOI: 10.1109/ASPDAC.2008.4484069
A. Mineyama, Hiroyuki Ito, T. Ishii, K. Okada, K. Masu
This paper demonstrates a low voltage differential signaling (LVDS)-type on-chip transmission line (TL) interconnect to solve delay issues on global interconnects. The proposed on-chip TL interconnect can achieve 10.5 Gbps signaling and has smaller delay, smaller delay variation and better power efficiency than conventional on-chip interconnects at high-frequencies.
{"title":"LVDS-type on-chip transmision line interconnect with passive equalizers in 90nm CMOS process","authors":"A. Mineyama, Hiroyuki Ito, T. Ishii, K. Okada, K. Masu","doi":"10.1109/ASPDAC.2008.4484069","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4484069","url":null,"abstract":"This paper demonstrates a low voltage differential signaling (LVDS)-type on-chip transmission line (TL) interconnect to solve delay issues on global interconnects. The proposed on-chip TL interconnect can achieve 10.5 Gbps signaling and has smaller delay, smaller delay variation and better power efficiency than conventional on-chip interconnects at high-frequencies.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117117929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-01-21DOI: 10.1109/ASPDAC.2008.4483919
C. Hsieh, J. Cong, Zhiru Zhang, Shih-Chieh Chang
In this paper we discuss optimizing the interconnect power of designs implemented in FPGA platforms. In particular, we reduce the glitch power on interconnects associated with the output of functional units in a design. The idea is to activate unused flip-flops to block the propagation of glitches, which takes advantage of the abundant flip-flops in modern FPGA structures. Since the activation of additional flip-flops may cause data hazard problems, we develop several effective behavioral synthesis techniques to prevent such data hazards. We also study the optimality of our techniques. The experimental results show that on average, our methods lead to a 28% reduction in dynamic power in the Xilinx Virtex-II platform.
{"title":"Behavioral synthesis with activating unused flip-flops for reducing glitch power in FPGA","authors":"C. Hsieh, J. Cong, Zhiru Zhang, Shih-Chieh Chang","doi":"10.1109/ASPDAC.2008.4483919","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483919","url":null,"abstract":"In this paper we discuss optimizing the interconnect power of designs implemented in FPGA platforms. In particular, we reduce the glitch power on interconnects associated with the output of functional units in a design. The idea is to activate unused flip-flops to block the propagation of glitches, which takes advantage of the abundant flip-flops in modern FPGA structures. Since the activation of additional flip-flops may cause data hazard problems, we develop several effective behavioral synthesis techniques to prevent such data hazards. We also study the optimality of our techniques. The experimental results show that on average, our methods lead to a 28% reduction in dynamic power in the Xilinx Virtex-II platform.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117148159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-01-21DOI: 10.1109/ASPDAC.2008.4483970
A. K. Verma, P. Brisk, P. Ienne
Nowadays many customised embedded processors offer the possibility of speeding up an application by implementing it using application-specific functional units (AFUs). However, the AFUs must satisfy certain constraints in terms of read and write ports between AFU and processor register file. Due to these restrictions the size and complexity of AFUs remain small. However, in recent some work has been done on relaxing the register file port constraints by serialising register file access (i.e., by allowing multi cycle read and write). This makes the problem of selecting best AFU significantly more complex. Most previous approaches use a two staged process to solve this problem, i.e., first selecting AFUs under some higher I/O constraints and then serialise them under the actual register file port constraints. Not only these methods are complex but also lead to suboptimal solutions. In this paper we formulate the AFU selection problem as an integer linear programming and solve it optimally. We show experimentally that our methodology produces significantly better results compared to state of art techniques.
{"title":"Fast, quasi-optimal, and pipelined instruction-set extensions","authors":"A. K. Verma, P. Brisk, P. Ienne","doi":"10.1109/ASPDAC.2008.4483970","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483970","url":null,"abstract":"Nowadays many customised embedded processors offer the possibility of speeding up an application by implementing it using application-specific functional units (AFUs). However, the AFUs must satisfy certain constraints in terms of read and write ports between AFU and processor register file. Due to these restrictions the size and complexity of AFUs remain small. However, in recent some work has been done on relaxing the register file port constraints by serialising register file access (i.e., by allowing multi cycle read and write). This makes the problem of selecting best AFU significantly more complex. Most previous approaches use a two staged process to solve this problem, i.e., first selecting AFUs under some higher I/O constraints and then serialise them under the actual register file port constraints. Not only these methods are complex but also lead to suboptimal solutions. In this paper we formulate the AFU selection problem as an integer linear programming and solve it optimally. We show experimentally that our methodology produces significantly better results compared to state of art techniques.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"6 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120935849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-01-21DOI: 10.1109/ASPDAC.2008.4483972
T. Luo, D. Pan
Nowadays a placement problem often involves multi-million objects and excessive fixed blockages. We present a new global placement algorithm that scales well to the modern large-scale circuit placement problems. We simulate the natural diffusion process to spread cells smoothly over the placement region, and use both analytical and discrete techniques to improve the wire length. Although any analytical wire length technique can be used in our new framework, by using the quadratic wire length model, the hessian of our formulation is extremely sparse compared with conventional formulations, which brings 24x speed up on quadratic solver. We also propose a wire linearization technique that transform quadratic star model into HPWL exactly. The overall runtime of our tool is close to the fastest placement tool in existing literature and significantly better than others. And meanwhile, we obtain competitive wire length results to the best known ones. The average total wire length is 2.2% higher than mPL6, 0.2%, 3.1%, and 9.1% better than FastPlace3.0, APlace2.0, and Capol0.2 respectively.
{"title":"DPlace2.0: A stable and efficient analytical placement based on diffusion","authors":"T. Luo, D. Pan","doi":"10.1109/ASPDAC.2008.4483972","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483972","url":null,"abstract":"Nowadays a placement problem often involves multi-million objects and excessive fixed blockages. We present a new global placement algorithm that scales well to the modern large-scale circuit placement problems. We simulate the natural diffusion process to spread cells smoothly over the placement region, and use both analytical and discrete techniques to improve the wire length. Although any analytical wire length technique can be used in our new framework, by using the quadratic wire length model, the hessian of our formulation is extremely sparse compared with conventional formulations, which brings 24x speed up on quadratic solver. We also propose a wire linearization technique that transform quadratic star model into HPWL exactly. The overall runtime of our tool is close to the fastest placement tool in existing literature and significantly better than others. And meanwhile, we obtain competitive wire length results to the best known ones. The average total wire length is 2.2% higher than mPL6, 0.2%, 3.1%, and 9.1% better than FastPlace3.0, APlace2.0, and Capol0.2 respectively.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124935287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-01-21DOI: 10.1109/ASPDAC.2008.4484032
Jia Li, Q. Xu, Yu Hu, Xiaowei Li
Power consumption in scan-based testing is a major concern nowadays. In this paper, we present a new X-fllling technique to reduce both shift power and capture power during scan tests, namely LSC-filling. The basic idea is to use as few as possible X-bits to keep the capture power under the peak power limit of the circuit under test (CUT), while using the remaining X-bits to reduce the shift power to cut down the CUT's average power consumption during scan tests as much as possible. In addition, by carefully selecting the X-filling order, our X-filling technique is able to achieve lower capture power when compared to existing methods. Experimental results on ISCAS'89 benchmark circuits show the effectiveness of the proposed methodology.
{"title":"On reducing both shift and capture power for scan-based testing","authors":"Jia Li, Q. Xu, Yu Hu, Xiaowei Li","doi":"10.1109/ASPDAC.2008.4484032","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4484032","url":null,"abstract":"Power consumption in scan-based testing is a major concern nowadays. In this paper, we present a new X-fllling technique to reduce both shift power and capture power during scan tests, namely LSC-filling. The basic idea is to use as few as possible X-bits to keep the capture power under the peak power limit of the circuit under test (CUT), while using the remaining X-bits to reduce the shift power to cut down the CUT's average power consumption during scan tests as much as possible. In addition, by carefully selecting the X-filling order, our X-filling technique is able to achieve lower capture power when compared to existing methods. Experimental results on ISCAS'89 benchmark circuits show the effectiveness of the proposed methodology.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124998224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-01-21DOI: 10.1109/ASPDAC.2008.4483938
Tilen Ma, Evangeline F. Y. Young
In this paper, the problem of bus driven floor-planning is addressed. Given a set of modules and bus specifications, a floorplan solution including the bus routes will be generated with the floorplan area and total bus area minimized. Some previous works have addressed this problem with restricted bus shapes of 0-bend, 1-bend or 2-bend (Law, 2005). However, in this paper, we address this bus driven floorplanning without any limitations on the shapes of the buses. We solve this problem by a simulated annealing based floorplanner using the transitive closure graph (TCG) representation (Lin, 2001). Experimental results show that we can improve over (Law, 2005) significantly in terms of both run time and quality, since there are more flexibilities in routing the buses and complex shape validation steps are not needed. For data sets with buses connecting a large number of blocks, our approach can still generate high quality solutions effectively, while the approach (Law, 2005) of restricting to 2-bend buses often cannot give any feasible solutions.
{"title":"TCG-based multi-bend bus driven floorplanning","authors":"Tilen Ma, Evangeline F. Y. Young","doi":"10.1109/ASPDAC.2008.4483938","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483938","url":null,"abstract":"In this paper, the problem of bus driven floor-planning is addressed. Given a set of modules and bus specifications, a floorplan solution including the bus routes will be generated with the floorplan area and total bus area minimized. Some previous works have addressed this problem with restricted bus shapes of 0-bend, 1-bend or 2-bend (Law, 2005). However, in this paper, we address this bus driven floorplanning without any limitations on the shapes of the buses. We solve this problem by a simulated annealing based floorplanner using the transitive closure graph (TCG) representation (Lin, 2001). Experimental results show that we can improve over (Law, 2005) significantly in terms of both run time and quality, since there are more flexibilities in routing the buses and complex shape validation steps are not needed. For data sets with buses connecting a large number of blocks, our approach can still generate high quality solutions effectively, while the approach (Law, 2005) of restricting to 2-bend buses often cannot give any feasible solutions.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125742697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2008-01-21DOI: 10.1109/ASPDAC.2008.4483986
Shan Tang, Qiang Xu
Existing SoC debug techniques mainly target bus-based systems. They are not readily applicable to the emerging system that use network-on-chip (NoC) as on-chip communication scheme. In this paper, we present the detailed design of a novel debug probe (DP) inserted between the core under debug (CUD) and the NoC. With embedded configurable triggers, delay control and timestamping mechanism, the proposed DP is very effective for inter-core transaction analysis as well as controlling embedded cores' debug processes. Experimental results show the functionalities of the proposed DP and its area overhead.
{"title":"A debug probe for concurrently debugging multiple embedded cores and inter-core transactions in NoC-based systems","authors":"Shan Tang, Qiang Xu","doi":"10.1109/ASPDAC.2008.4483986","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483986","url":null,"abstract":"Existing SoC debug techniques mainly target bus-based systems. They are not readily applicable to the emerging system that use network-on-chip (NoC) as on-chip communication scheme. In this paper, we present the detailed design of a novel debug probe (DP) inserted between the core under debug (CUD) and the NoC. With embedded configurable triggers, delay control and timestamping mechanism, the proposed DP is very effective for inter-core transaction analysis as well as controlling embedded cores' debug processes. Experimental results show the functionalities of the proposed DP and its area overhead.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125333757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}