Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378652
Greg Chadwick, S. Moore
In this paper we describe Mamba, an architecture designed for multi-core systems. Mamba has two major aims: (i) make on-chip communication explicit to the programmer so they can optimize for it and (ii) support many threads and supply very lightweight communication and synchronization primitives for them. These aims are based on the observations that: (i) as feature sizes shrink, on-chip communication becomes relatively more expensive than computation and (ii) as we go increasingly multi-core we need highly scalable approaches to inter-thread communication and synchronization. We employ a network of processors where a given memory access will always go to the same cache, removing the need for a coherence protocol and allowing the program explicit control over all communication. A presence bit associated with each word provides a very lightweight, finegrained synchronization primitive. We demonstrate an FPGA implementation with micro-benchmarks of standard spinlock and FIFO implementations and show that presence bit based implementations provide more efficient locking, and lower latency FIFO communications compared to a conventional shared memory implementation whilst also requiring fewer memory accesses. We also show that Mamba performance is insensitive to total thread count, allowing the use of as many threads as desired.
{"title":"Mamba: A scalable communication centric multi-threaded processor architecture","authors":"Greg Chadwick, S. Moore","doi":"10.1109/ICCD.2012.6378652","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378652","url":null,"abstract":"In this paper we describe Mamba, an architecture designed for multi-core systems. Mamba has two major aims: (i) make on-chip communication explicit to the programmer so they can optimize for it and (ii) support many threads and supply very lightweight communication and synchronization primitives for them. These aims are based on the observations that: (i) as feature sizes shrink, on-chip communication becomes relatively more expensive than computation and (ii) as we go increasingly multi-core we need highly scalable approaches to inter-thread communication and synchronization. We employ a network of processors where a given memory access will always go to the same cache, removing the need for a coherence protocol and allowing the program explicit control over all communication. A presence bit associated with each word provides a very lightweight, finegrained synchronization primitive. We demonstrate an FPGA implementation with micro-benchmarks of standard spinlock and FIFO implementations and show that presence bit based implementations provide more efficient locking, and lower latency FIFO communications compared to a conventional shared memory implementation whilst also requiring fewer memory accesses. We also show that Mamba performance is insensitive to total thread count, allowing the use of as many threads as desired.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132944411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378639
Yi-Ling Hsieh, Tsung-Yi Ho, K. Chakrabarty
Recent advances in digital microfluidic biochips have led to a promising future for miniaturized laboratories, with the associated advantages of high sensitivity and reconfigurability. As one of the front-end operations on digital microfluidic biochips, sample preparation plays an important role in biochemical assays and applications. For fast and high-throughput biochemical applications, it is critical to develop an automated design methodology for sample preparation. Prior work in this area does not provide solutions to the problem of design automation for sample preparation. Moreover, it is critical to ensure the correctness of droplets and recover from errors efficiently during sample preparation. Published work on error recovery is inefficient and impractical for sample preparation. Therefore, in this paper, we present an automated design methodology for sample preparation, including architectural synthesis, layout synthesis, and dynamic error recovery. The proposed algorithm is evaluated on real-life biochemical applications to demonstrate its effectiveness and efficiency. Compared to prior work, the proposed algorithm can achieve up to 48.39% reduction in sample preparation time.
{"title":"Design methodology for sample preparation on digital microfluidic biochips","authors":"Yi-Ling Hsieh, Tsung-Yi Ho, K. Chakrabarty","doi":"10.1109/ICCD.2012.6378639","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378639","url":null,"abstract":"Recent advances in digital microfluidic biochips have led to a promising future for miniaturized laboratories, with the associated advantages of high sensitivity and reconfigurability. As one of the front-end operations on digital microfluidic biochips, sample preparation plays an important role in biochemical assays and applications. For fast and high-throughput biochemical applications, it is critical to develop an automated design methodology for sample preparation. Prior work in this area does not provide solutions to the problem of design automation for sample preparation. Moreover, it is critical to ensure the correctness of droplets and recover from errors efficiently during sample preparation. Published work on error recovery is inefficient and impractical for sample preparation. Therefore, in this paper, we present an automated design methodology for sample preparation, including architectural synthesis, layout synthesis, and dynamic error recovery. The proposed algorithm is evaluated on real-life biochemical applications to demonstrate its effectiveness and efficiency. Compared to prior work, the proposed algorithm can achieve up to 48.39% reduction in sample preparation time.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115667227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378676
A. Azevedo, B. Vermeulen, K. Goossens
In this paper, we describe and analyze the architecture of the proposed Debug Event Distribution Interconnect (EDI). The EDI transmits debug events, which are 1-bit signals, between debug entities in different areas of the Network-on-Chip based Multi-Processor System-on-Chip. The EDI replicates the NoC topology with an EDI node instantiated for each underlying NoC data module. Contention in the EDI node is handled by replicating the EDI in layers. The EDI generation is automatic, and uses as input the cross-triggering patterns that are not required to follow the communication patterns in the NoC. The generation and routing tool is also presented in this paper. The EDI is evaluated with four different implementations varying complexity and handling of contention. The area of a single EDI Layer is around 0.9% of the area occupied by the tested NoCs, using the lower area implementation. These results show that the proposed implementation of the EDI incurs low cost on the overall system.
{"title":"Architecture and design flow for a debug event distribution interconnect","authors":"A. Azevedo, B. Vermeulen, K. Goossens","doi":"10.1109/ICCD.2012.6378676","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378676","url":null,"abstract":"In this paper, we describe and analyze the architecture of the proposed Debug Event Distribution Interconnect (EDI). The EDI transmits debug events, which are 1-bit signals, between debug entities in different areas of the Network-on-Chip based Multi-Processor System-on-Chip. The EDI replicates the NoC topology with an EDI node instantiated for each underlying NoC data module. Contention in the EDI node is handled by replicating the EDI in layers. The EDI generation is automatic, and uses as input the cross-triggering patterns that are not required to follow the communication patterns in the NoC. The generation and routing tool is also presented in this paper. The EDI is evaluated with four different implementations varying complexity and handling of contention. The area of a single EDI Layer is around 0.9% of the area occupied by the tested NoCs, using the lower area implementation. These results show that the proposed implementation of the EDI incurs low cost on the overall system.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122934992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378695
Jin-Tai Yan, Zhi-Wei Chen
Based on the equilibrium concept of inserting load in a physical balance, the insertion of redundant wires can be used to minimize the clock skew in an OPE-predicted clock tree. For five tested benchmarks, the experimental results show that our proposed algorithm only increases 2.8% of the total load on the average for the insertion of OPE-predicted redundant wires and decreases 30.85 ps of the clock skew on the average to obtain the near zero-skew result in reasonable CPU time.
{"title":"Post-layout OPE-predicted redundant wire insertion for clock skew minimization","authors":"Jin-Tai Yan, Zhi-Wei Chen","doi":"10.1109/ICCD.2012.6378695","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378695","url":null,"abstract":"Based on the equilibrium concept of inserting load in a physical balance, the insertion of redundant wires can be used to minimize the clock skew in an OPE-predicted clock tree. For five tested benchmarks, the experimental results show that our proposed algorithm only increases 2.8% of the total load on the average for the insertion of OPE-predicted redundant wires and decreases 30.85 ps of the clock skew on the average to obtain the near zero-skew result in reasonable CPU time.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126038018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378696
Qiong Zhao, Jiang Hu
Track assignment is a critical step between global routing and detailed routing in modern VLSI chip designs. Crosstalk, which is largely decided by wire adjacency, has significant impact on interconnect delay and circuit performance. Therefore, the amount of crosstalk should be restrained in order to satisfy timing constraints. In this work, a novel track assignment algorithm is proposed to reduce crosstalk-induced performance degradation. The problem is formulated as a Traveling Salesman Problem (TSP) and solved by a graph-based heuristic. Experimental results on the ISPD2011 benchmark circuits show that the violations on crosstalk bounds can be reduced by up to 99.56% compared to the conventional non-constraint-based heuristics.
{"title":"Track assignment considering crosstalk-induced performance degradation","authors":"Qiong Zhao, Jiang Hu","doi":"10.1109/ICCD.2012.6378696","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378696","url":null,"abstract":"Track assignment is a critical step between global routing and detailed routing in modern VLSI chip designs. Crosstalk, which is largely decided by wire adjacency, has significant impact on interconnect delay and circuit performance. Therefore, the amount of crosstalk should be restrained in order to satisfy timing constraints. In this work, a novel track assignment algorithm is proposed to reduce crosstalk-induced performance degradation. The problem is formulated as a Traveling Salesman Problem (TSP) and solved by a graph-based heuristic. Experimental results on the ISPD2011 benchmark circuits show that the violations on crosstalk bounds can be reduced by up to 99.56% compared to the conventional non-constraint-based heuristics.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121855916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378653
Tosiron Adegbija, A. Gordon-Ross, Arslan Munir
Phase-based tuning specializes a system's tunable parameters to the varying runtime requirements of an application's different phases of execution to meet optimization goals. Since the design space for tunable systems can be very large, one of the major challenges in phase-based tuning is determining the best configuration for each phase without incurring significant tuning overhead (e.g., energy and/or performance) during design space exploration. In this paper, we propose phase distance mapping, which directly determines the best configuration for a phase, thereby eliminating design space exploration. Phase distance mapping applies the correlation between a known phase's characteristics and best configuration to determine a new phase's best configuration based on the new phase's characteristics. Experimental results verify that our phase distance mapping approach determines configurations within 3% of the optimal configurations on average and yields an energy delay product savings of 26% on average.
{"title":"Dynamic phase-based tuning for embedded systems using phase distance mapping","authors":"Tosiron Adegbija, A. Gordon-Ross, Arslan Munir","doi":"10.1109/ICCD.2012.6378653","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378653","url":null,"abstract":"Phase-based tuning specializes a system's tunable parameters to the varying runtime requirements of an application's different phases of execution to meet optimization goals. Since the design space for tunable systems can be very large, one of the major challenges in phase-based tuning is determining the best configuration for each phase without incurring significant tuning overhead (e.g., energy and/or performance) during design space exploration. In this paper, we propose phase distance mapping, which directly determines the best configuration for a phase, thereby eliminating design space exploration. Phase distance mapping applies the correlation between a known phase's characteristics and best configuration to determine a new phase's best configuration based on the new phase's characteristics. Experimental results verify that our phase distance mapping approach determines configurations within 3% of the optimal configurations on average and yields an energy delay product savings of 26% on average.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114288113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378661
Hanbin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael A. Harding, O. Mutlu
Phase change memory (PCM) is a promising technology that can offer higher capacity than DRAM. Unfortunately, PCM's access latency and energy are higher than DRAM's and its endurance is lower. Many DRAM-PCM hybrid memory systems use DRAM as a cache to PCM, to achieve the low access latency and energy, and high endurance of DRAM, while taking advantage of PCM's large capacity. A key question is what data to cache in DRAM to best exploit the advantages of each technology while avoiding its disadvantages as much as possible. We propose a new caching policy that improves hybrid memory performance and energy efficiency. Our observation is that both DRAM and PCM banks employ row buffers that act as a cache for the most recently accessed memory row. Accesses that are row buffer hits incur similar latencies (and energy consumption) in DRAM and PCM, whereas accesses that are row buffer misses incur longer latencies (and higher energy consumption) in PCM. To exploit this, we devise a policy that avoids accessing in PCM data that frequently causes row buffer misses because such accesses are costly in terms of both latency and energy. Our policy tracks the row buffer miss counts of recently used rows in PCM, and caches in DRAM the rows that are predicted to incur frequent row buffer misses. Our proposed caching policy also takes into account the high write latencies of PCM, in addition to row buffer locality. Compared to a conventional DRAM-PCM hybrid memory system, our row buffer locality-aware caching policy improves system performance by 14% and energy efficiency by 10% on data-intensive server and cloud-type workloads. The proposed policy achieves 31% performance gain over an all-PCM memory system, and comes within 29% of the performance of an allDRAM memory system (not taking PCM's capacity benefit into account) on evaluated workloads.
{"title":"Row buffer locality aware caching policies for hybrid memories","authors":"Hanbin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael A. Harding, O. Mutlu","doi":"10.1109/ICCD.2012.6378661","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378661","url":null,"abstract":"Phase change memory (PCM) is a promising technology that can offer higher capacity than DRAM. Unfortunately, PCM's access latency and energy are higher than DRAM's and its endurance is lower. Many DRAM-PCM hybrid memory systems use DRAM as a cache to PCM, to achieve the low access latency and energy, and high endurance of DRAM, while taking advantage of PCM's large capacity. A key question is what data to cache in DRAM to best exploit the advantages of each technology while avoiding its disadvantages as much as possible. We propose a new caching policy that improves hybrid memory performance and energy efficiency. Our observation is that both DRAM and PCM banks employ row buffers that act as a cache for the most recently accessed memory row. Accesses that are row buffer hits incur similar latencies (and energy consumption) in DRAM and PCM, whereas accesses that are row buffer misses incur longer latencies (and higher energy consumption) in PCM. To exploit this, we devise a policy that avoids accessing in PCM data that frequently causes row buffer misses because such accesses are costly in terms of both latency and energy. Our policy tracks the row buffer miss counts of recently used rows in PCM, and caches in DRAM the rows that are predicted to incur frequent row buffer misses. Our proposed caching policy also takes into account the high write latencies of PCM, in addition to row buffer locality. Compared to a conventional DRAM-PCM hybrid memory system, our row buffer locality-aware caching policy improves system performance by 14% and energy efficiency by 10% on data-intensive server and cloud-type workloads. The proposed policy achieves 31% performance gain over an all-PCM memory system, and comes within 29% of the performance of an allDRAM memory system (not taking PCM's capacity benefit into account) on evaluated workloads.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122160000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378688
Alena Simalatsar, Liangpeng Guo, M. Bozga, R. Passerone
Design correctness and performance are major issues which are usually considered separately, and with different emphasis, by traditional system design flows. In this paper we show that one can meaningfully connect and benefit from the advantages of two design frameworks, with different design goals. We consider BIP for high-level rigorous design and correct-by-construction implementation, and metroII, for low-level platform-based design and performance evaluation.
{"title":"Integration of correct-by-construction BIP models into the MetroII design space exploration flow","authors":"Alena Simalatsar, Liangpeng Guo, M. Bozga, R. Passerone","doi":"10.1109/ICCD.2012.6378688","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378688","url":null,"abstract":"Design correctness and performance are major issues which are usually considered separately, and with different emphasis, by traditional system design flows. In this paper we show that one can meaningfully connect and benefit from the advantages of two design frameworks, with different design goals. We consider BIP for high-level rigorous design and correct-by-construction implementation, and metroII, for low-level platform-based design and performance evaluation.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114251520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378654
Zhen-Hao Zhang, Dong Tong, Xiaoyin Wang, Jiangfang Yi, Keyi Wang
Conventional superscalar processors usually contain large CAM-based LSQ (load/store queue) with poor scalability and high energy consumption. Recently proposals only focus on improving the LSQ scalability to increase the in-flight instruction capacity, but with poor performance improvement and energy efficiency. This paper presents a novel speculative store-load forwarding mechanism, named SOLE (speculative one-cycle load execution)1. Firstly, SOLE uses address identifiers to determine the memory disambiguation, rather than the exact memory addresses as the traditional LSQ does. Since the address identifier is just simple hash from the address base and offset, the speculative store-load forwarding could be advanced earlier to reduce the load execution latency and avoid unnecessary energy consumption by filtering unnecessary accesses to the data cache. Secondly, SOLE enlarges the forwarding communication range by using SSN (store sequential number) to determine the age order between stores, which further improves the performance. Finally, the implementation of SOLE all uses set-associative structures that avoid the non-scalable problem of CAM-based LSQ. Experiments show that performance of SOLE outperforms the traditional LSQ by 13.57% in terms of performance, with only 75.2% execution energy consumption of the loads and stores.
{"title":"SOLE: Speculative one-cycle load execution with scalability, high-performance and energy-efficiency","authors":"Zhen-Hao Zhang, Dong Tong, Xiaoyin Wang, Jiangfang Yi, Keyi Wang","doi":"10.1109/ICCD.2012.6378654","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378654","url":null,"abstract":"Conventional superscalar processors usually contain large CAM-based LSQ (load/store queue) with poor scalability and high energy consumption. Recently proposals only focus on improving the LSQ scalability to increase the in-flight instruction capacity, but with poor performance improvement and energy efficiency. This paper presents a novel speculative store-load forwarding mechanism, named SOLE (speculative one-cycle load execution)1. Firstly, SOLE uses address identifiers to determine the memory disambiguation, rather than the exact memory addresses as the traditional LSQ does. Since the address identifier is just simple hash from the address base and offset, the speculative store-load forwarding could be advanced earlier to reduce the load execution latency and avoid unnecessary energy consumption by filtering unnecessary accesses to the data cache. Secondly, SOLE enlarges the forwarding communication range by using SSN (store sequential number) to determine the age order between stores, which further improves the performance. Finally, the implementation of SOLE all uses set-associative structures that avoid the non-scalable problem of CAM-based LSQ. Experiments show that performance of SOLE outperforms the traditional LSQ by 13.57% in terms of performance, with only 75.2% execution energy consumption of the loads and stores.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126938621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378645
Pavlos M. Mattheakis, C. Sotiriou, P. Beerel
FSM and PTnet control models are pertinent in both software and hardware applications as both specification and implementation models. The state-based, monolithic FSM model is directly implementable in software or hardware, but cannot model concurrency without state explosion. Interacting FSM models have so far lacked the formal rigor for expressing the synchronising interactions between different FSMs. The event-based, PTnet model is able to model both concurrency and choice within the same model, however lacks a polynomial time flow to implementation, as current methods of exposing the event state space require a potentially exponential number of states. In this work, we present a polynomial complexity flow for transforming a Free-Choice PTnet into a new formalism for Interacting FSMs, i.e Multiple, Synchronised FSMs (MSFSMs), a compact Interacting FSMs model, potentially implementable using any existing monolithic FSM implementation method. We believe that such a flow can in the long term bridge the event and state-based models. We present execution time and state space results of exercising our flow on 25 large PTnet specifications, describing asynchronous control circuits, and contrast our results to the popular Petrify tool for PTnet state space exploration and circuit implementation. Our results indicate a very significant reduction in both state space size and execution time.
{"title":"A polynomial time flow for implementing free-choice Petri-nets","authors":"Pavlos M. Mattheakis, C. Sotiriou, P. Beerel","doi":"10.1109/ICCD.2012.6378645","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378645","url":null,"abstract":"FSM and PTnet control models are pertinent in both software and hardware applications as both specification and implementation models. The state-based, monolithic FSM model is directly implementable in software or hardware, but cannot model concurrency without state explosion. Interacting FSM models have so far lacked the formal rigor for expressing the synchronising interactions between different FSMs. The event-based, PTnet model is able to model both concurrency and choice within the same model, however lacks a polynomial time flow to implementation, as current methods of exposing the event state space require a potentially exponential number of states. In this work, we present a polynomial complexity flow for transforming a Free-Choice PTnet into a new formalism for Interacting FSMs, i.e Multiple, Synchronised FSMs (MSFSMs), a compact Interacting FSMs model, potentially implementable using any existing monolithic FSM implementation method. We believe that such a flow can in the long term bridge the event and state-based models. We present execution time and state space results of exercising our flow on 25 large PTnet specifications, describing asynchronous control circuits, and contrast our results to the popular Petrify tool for PTnet state space exploration and circuit implementation. Our results indicate a very significant reduction in both state space size and execution time.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124508465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}