This paper describes an accelerating technique for SAT based ATPG (automatic test pattern generation). The main idea of the proposed algorithm is representing more than one test generation problems as one CNF formula with introducing control variables, which reduces CNF generation time. Furthermore, learnt clauses of previously solved problems are effectively shared for other problems solving, so that the SAT solving time is also reduced. Experimental results show that the proposed algorithm runs more than 3 times faster than the original SAT-based ATPG algorithm.
{"title":"An Accelerating Technique for SAT-based ATPG","authors":"Y. Matsunaga","doi":"10.2197/ipsjtsldm.10.39","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.10.39","url":null,"abstract":"This paper describes an accelerating technique for SAT based ATPG (automatic test pattern generation). The main idea of the proposed algorithm is representing more than one test generation problems as one CNF formula with introducing control variables, which reduces CNF generation time. Furthermore, learnt clauses of previously solved problems are effectively shared for other problems solving, so that the SAT solving time is also reduced. Experimental results show that the proposed algorithm runs more than 3 times faster than the original SAT-based ATPG algorithm.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89839650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrated circuits (ICs) are at the heart of modern electronics, which rely heavily on the state-of-the-art semiconductor manufacturing technology. The key to pushing forward semiconductor technology is IC feature-size miniaturization. However, this brings ever-increasing design complexities and manufacturing challenges to the $350 billion semiconductor industry. The manufacturing of two-dimensional layout on high-density metal layers depends on complex design-for-manufacturing techniques and sophisticated empirical optimizations, which introduces huge amounts of turnaround time and yield loss in advanced technology nodes. Our study reveals that unidirectional layout design can significantly reduce the manufacturing complexities and improve the yield, which is becoming increasingly adopted in semiconductor industry [1, 2]. Despite the manufacturing benefits, unidirectional layout leads to more restrictive solution space and brings significant impacts on the IC design automation flow for routing closure. Notably, unidirectional routing limits the standard cell pin accessibility, which further exacerbates the resource competitions during routing. Moreover, for post-routing optimization, traditional redundant-via insertion has become obsolete under unidirectional routing style, which makes the yield enhancement task extremely challenging. Our research objective is to invent novel CAD algorithms and methodologies for fast and high-quality unidirectional routing closure, which ultimately reduces the design cycle and manufacturing cost of IC design in advanced technology nodes.
{"title":"Toward Unidirectional Routing Closure in Advanced Technology Nodes","authors":"Xiaoqing Xu, D. Pan","doi":"10.2197/ipsjtsldm.10.2","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.10.2","url":null,"abstract":"Integrated circuits (ICs) are at the heart of modern electronics, which rely heavily on the state-of-the-art semiconductor manufacturing technology. The key to pushing forward semiconductor technology is IC feature-size miniaturization. However, this brings ever-increasing design complexities and manufacturing challenges to the $350 billion semiconductor industry. The manufacturing of two-dimensional layout on high-density metal layers depends on complex design-for-manufacturing techniques and sophisticated empirical optimizations, which introduces huge amounts of turnaround time and yield loss in advanced technology nodes. Our study reveals that unidirectional layout design can significantly reduce the manufacturing complexities and improve the yield, which is becoming increasingly adopted in semiconductor industry [1, 2]. Despite the manufacturing benefits, unidirectional layout leads to more restrictive solution space and brings significant impacts on the IC design automation flow for routing closure. Notably, unidirectional routing limits the standard cell pin accessibility, which further exacerbates the resource competitions during routing. Moreover, for post-routing optimization, traditional redundant-via insertion has become obsolete under unidirectional routing style, which makes the yield enhancement task extremely challenging. Our research objective is to invent novel CAD algorithms and methodologies for fast and high-quality unidirectional routing closure, which ultimately reduces the design cycle and manufacturing cost of IC design in advanced technology nodes.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72632016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optical Proximity Correction (OPC) is still nominated as a main stream in printing Sub-16 nm technology nodes in optical micro-lithography. However, long computation time is required to generate mask solutions with acceptable wafer image quality. Intensity Difference Map (IDM) has been recently proposed as a fast methodology to shorten OPC computation time with preserving acceptable wafer image quality. However, IDM has been evaluated only under a relatively relaxed Edge Placement Error (EPE) constraint of the final mask solution. Such an evaluation does not provide a satisfactory confirmation of the effectiveness of IDM if strict EPE constraints are imposed. In this paper, the accuracy of IDM is deeply analyzed to confirm its validity in terms of wafer image estimation accuracy along with its efficiency in shortening computation time. Thereafter, the stability of IDM accuracy against the increase in pattern area/density is confirmed. Finally, the regions suffering from lack of accuracy are analyzed for further enhancement. Experimental results show that congestion in the mask pattern forms a cardinal source of the lack of accuracy which is compensated through optimized selection of the kernels included in IDM.
{"title":"Intensity Difference Map (IDM) Accuracy Analysis for OPC Efficiency Verification and Further Enhancement","authors":"Ahmed Awad, A. Takahashi, S. Tanaka, C. Kodama","doi":"10.2197/ipsjtsldm.10.28","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.10.28","url":null,"abstract":"Optical Proximity Correction (OPC) is still nominated as a main stream in printing Sub-16 nm technology nodes in optical micro-lithography. However, long computation time is required to generate mask solutions with acceptable wafer image quality. Intensity Difference Map (IDM) has been recently proposed as a fast methodology to shorten OPC computation time with preserving acceptable wafer image quality. However, IDM has been evaluated only under a relatively relaxed Edge Placement Error (EPE) constraint of the final mask solution. Such an evaluation does not provide a satisfactory confirmation of the effectiveness of IDM if strict EPE constraints are imposed. In this paper, the accuracy of IDM is deeply analyzed to confirm its validity in terms of wafer image estimation accuracy along with its efficiency in shortening computation time. Thereafter, the stability of IDM accuracy against the increase in pattern area/density is confirmed. Finally, the regions suffering from lack of accuracy are analyzed for further enhancement. Experimental results show that congestion in the mask pattern forms a cardinal source of the lack of accuracy which is compensated through optimized selection of the kernels included in IDM.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84192884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qian Zhao, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi
Field-programmable gate array (FPGA) is a promising technology for the implementing of highperformance and power-efficient cloud computing by serving dedicated hardware as co-processor to accelerate loads on CPUs. However, developing an FPGA-based system is challenging because the complexity of the hardware and software co-design. In this paper, we propose a platform named hCODE to simplify the design, share, and deployment of FPGA accelerators. First, we adopt a shell-and-IP design pattern to improve the reusability and the portability of accelerator designs. Second, we implement an open accelerator repository to bridge hardware development and software development on one platform. On the hCODE platform, hardware developers can provide designs that follow hCODE specifications, which allowing software engineers to easily search, download, and integrate accelerators in their applications without caring about the hardware details.
{"title":"Towards Open-HW: A Platform to Design, Share and Deploy FPGA Accelerators in Low Cost","authors":"Qian Zhao, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi","doi":"10.2197/ipsjtsldm.10.63","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.10.63","url":null,"abstract":"Field-programmable gate array (FPGA) is a promising technology for the implementing of highperformance and power-efficient cloud computing by serving dedicated hardware as co-processor to accelerate loads on CPUs. However, developing an FPGA-based system is challenging because the complexity of the hardware and software co-design. In this paper, we propose a platform named hCODE to simplify the design, share, and deployment of FPGA accelerators. First, we adopt a shell-and-IP design pattern to improve the reusability and the portability of accelerator designs. Second, we implement an open accelerator repository to bridge hardware development and software development on one platform. On the hCODE platform, hardware developers can provide designs that follow hCODE specifications, which allowing software engineers to easily search, download, and integrate accelerators in their applications without caring about the hardware details.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86133559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate and fast performance estimation methods for modern and future multi-core systems are the focal point of much research due to the complexity associated with such architectures. The communication architecture of such systems has a huge impact on the performance and power of the whole system. Architects need to explore many design possibilities by using performance estimation techniques at early stages of design to make design decisions earlier in the design cycle. While software developers need to develop and test applications for the target architecture and gather performance measurements as early in the design cycle as possible. Full system simulation techniques provide accurate performance values but are extremely time consuming. Static analysis techniques are fast but cannot capture the dynamic behavior associated with shared resource contention and arbitration. Moreover, synthetic traffic patterns have been used to analyze the communication architecture however, such patterns are not realistic enough. We propose a statistical based model to predict the dynamic cost of bus arbitration on the performance of a bus architecture. The proposed model uses workload trace of the actual applications and benchmarks to capture the real application traffic behavior. Statistics on the traffic patterns are collected and input to the analytical model which calculates performance values for the communication architecture under consideration. By knowing the performance measures, designers can avoid over and under-design of the communication architecture. This paper builds up on a previously developed performance estimation model. The previous work modeled single and burst bus-transfers, however only one interfering bus master at a time for each blocked bus request was considered. The proposed, improved accuracy model considers multiple interfering masters for each blocked request hence improving the estimation accuracy especially for traffic intensive applications and many PE architectures. Experiments are performed for two different architectures i.e., 4 processing elements connected via a shared bus and 8 processing elements connected via a shared bus. Results show no significant difference in accuracy compared to previously developed model, for low traffic applications SPARSE and ROBOT however notable accuracy improvement for traffic intensive applications. Maximum estimation error is reduced from 1.75% to 0.6% for FPPPP and from maximum 13.91% to 8.8% for FFT on the 4PE architecture. On the 8PE architecture, maximum estimation error is reduced from 11.8% to 2.7% for the FPPP benchmark. Moreover simulation speed-up for the proposed technique over simulation method is reported.
{"title":"An Accurate and Fast Trace-aware Performance Estimation Model For Prioritized MPSoC Bus With Multiple Interfering Bus-Masters","authors":"F. Shafiq, T. Isshiki, Dongju Li","doi":"10.2197/ipsjtsldm.10.13","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.10.13","url":null,"abstract":"Accurate and fast performance estimation methods for modern and future multi-core systems are the focal point of much research due to the complexity associated with such architectures. The communication architecture of such systems has a huge impact on the performance and power of the whole system. Architects need to explore many design possibilities by using performance estimation techniques at early stages of design to make design decisions earlier in the design cycle. While software developers need to develop and test applications for the target architecture and gather performance measurements as early in the design cycle as possible. Full system simulation techniques provide accurate performance values but are extremely time consuming. Static analysis techniques are fast but cannot capture the dynamic behavior associated with shared resource contention and arbitration. Moreover, synthetic traffic patterns have been used to analyze the communication architecture however, such patterns are not realistic enough. We propose a statistical based model to predict the dynamic cost of bus arbitration on the performance of a bus architecture. The proposed model uses workload trace of the actual applications and benchmarks to capture the real application traffic behavior. Statistics on the traffic patterns are collected and input to the analytical model which calculates performance values for the communication architecture under consideration. By knowing the performance measures, designers can avoid over and under-design of the communication architecture. This paper builds up on a previously developed performance estimation model. The previous work modeled single and burst bus-transfers, however only one interfering bus master at a time for each blocked bus request was considered. The proposed, improved accuracy model considers multiple interfering masters for each blocked request hence improving the estimation accuracy especially for traffic intensive applications and many PE architectures. Experiments are performed for two different architectures i.e., 4 processing elements connected via a shared bus and 8 processing elements connected via a shared bus. Results show no significant difference in accuracy compared to previously developed model, for low traffic applications SPARSE and ROBOT however notable accuracy improvement for traffic intensive applications. Maximum estimation error is reduced from 1.75% to 0.6% for FPPPP and from maximum 13.91% to 8.8% for FFT on the 4PE architecture. On the 8PE architecture, maximum estimation error is reduced from 11.8% to 2.7% for the FPPP benchmark. Moreover simulation speed-up for the proposed technique over simulation method is reported.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82507579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Knechtel, O. Sinanoglu, I. Elfadel, J. Lienig, C. Sze
{"title":"Large-Scale 3D Chips: Challenges and Solutions for Design Automation, Testing, and Trustworthy Integration","authors":"J. Knechtel, O. Sinanoglu, I. Elfadel, J. Lienig, C. Sze","doi":"10.2197/IPSJTSLDM.10.45","DOIUrl":"https://doi.org/10.2197/IPSJTSLDM.10.45","url":null,"abstract":"","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79963914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the Editor-in-Chief","authors":"N. Togawa","doi":"10.2197/ipsjtsldm.10.1","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.10.1","url":null,"abstract":"","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81309081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amro Awad, Ganesh Balakrishnan, Yipeng Wang, Yan Solihin
While customizing the memory system design or picking the most fitting design for applications is very critical, many software vendors refrain from releasing their software for several reasons. First, many applications are proprietary, hence releasing them to hardware architects or vendors is not desired. Second, applications such as defense and nuclear simulations are very sensitive, hence accessing them is very restricted. Nonetheless, customizing the hardware for such applications is still important and highly desired. Workload cloning is the technique of generating synthetic clones from the original workload. The clones mimic the memory access behavior of the original workloads, hence enable exploring the design space with high level of accuracy. In this article, we survey the state-of-art cloning techniques of the memory access behavior and their uses.
{"title":"Accurate Cloning of the Memory Access Behavior","authors":"Amro Awad, Ganesh Balakrishnan, Yipeng Wang, Yan Solihin","doi":"10.2197/ipsjtsldm.9.49","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.9.49","url":null,"abstract":"While customizing the memory system design or picking the most fitting design for applications is very critical, many software vendors refrain from releasing their software for several reasons. First, many applications are proprietary, hence releasing them to hardware architects or vendors is not desired. Second, applications such as defense and nuclear simulations are very sensitive, hence accessing them is very restricted. Nonetheless, customizing the hardware for such applications is still important and highly desired. Workload cloning is the technique of generating synthetic clones from the original workload. The clones mimic the memory access behavior of the original workloads, hence enable exploring the design space with high level of accuracy. In this article, we survey the state-of-art cloning techniques of the memory access behavior and their uses.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80636635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y. Higami, Senling Wang, Hiroshi Takahashi, Shin-ya Kobayashi, K. Saluja
For the purpose of analyzing the cause of delay in modern digital circuits, efficient diagnosis methods for delay faults need to be developed. This paper presents diagnosis methods for gate delay faults by using a fault dictionary approach. Although a fault dictionary is created by fault simulation and for a specific amount of delay, the proposed method using it can deduce candidate faults successfully even when the amount of delay in a circuit under diagnosis is different from that of the delay assumed during the fault simulation. In this paper, we target diagnosing the presence of single gate delay faults and double gate delay faults. Experimental results for benchmark circuits demonstrate the effectiveness of the proposed methods.
{"title":"Diagnosis Methods for Gate Delay Faults with Various Amounts of Delays","authors":"Y. Higami, Senling Wang, Hiroshi Takahashi, Shin-ya Kobayashi, K. Saluja","doi":"10.2197/ipsjtsldm.9.13","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.9.13","url":null,"abstract":"For the purpose of analyzing the cause of delay in modern digital circuits, efficient diagnosis methods for delay faults need to be developed. This paper presents diagnosis methods for gate delay faults by using a fault dictionary approach. Although a fault dictionary is created by fault simulation and for a specific amount of delay, the proposed method using it can deduce candidate faults successfully even when the amount of delay in a circuit under diagnosis is different from that of the delay assumed during the fault simulation. In this paper, we target diagnosing the presence of single gate delay faults and double gate delay faults. Experimental results for benchmark circuits demonstrate the effectiveness of the proposed methods.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86583277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the Editor-in-Chief","authors":"N. Togawa","doi":"10.2197/ipsjtsldm.9.1","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.9.1","url":null,"abstract":"","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86949883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}