StarSS is a parallel programming model that eases the task of the programmer. He or she only has to identify the tasks that can potentially be executed in parallel and the inputs and outputs of these tasks, while the runtime system takes care of the difficult issues of determining inter-task dependencies, synchronization, load balancing, scheduling to optimize data locality, etc. Given these issues, however, the runtime system might become a bottleneck that limits the scalability of the system. The contribution of this paper is two-fold. First, we analyze the scalability of the current software runtime system for several synthetic benchmarks with different dependency patterns and task sizes. We show that for fine-grained tasks the system does not scale beyond five cores. Furthermore, we identify the main scalability bottlenecks of the runtime system. Second, we present the design of Nexus, a hardware support system for StarSS applications that greatly reduces the task management overhead.
{"title":"A Case for Hardware Task Management Support for the StarSS Programming Model","authors":"C. Meenderinck, B. Juurlink","doi":"10.1109/DSD.2010.63","DOIUrl":"https://doi.org/10.1109/DSD.2010.63","url":null,"abstract":"StarSS is a parallel programming model that eases the task of the programmer. He or she has to identify the tasks that can potentially be executed in parallel and the inputs and outputs of these tasks, while the runtime system takes care of the difficult issues of determining inter task dependencies, synchronization, load balancing, scheduling to optimize data locality, etc. Given these issues, however, the runtime system might become a bottleneck that limits the scalability of the system. The contribution of this paper is two-fold. First, we analyze the scalability of the current software runtime system for several synthetic benchmarks with different dependency patterns and task sizes. We show that for fine-grained tasks the system does not scale beyond five cores. Furthermore, we identify the main scalability bottlenecks of the runtime system. Second, we present the design of Nexus, a hardware support system for StarSS applications, that greatly reduces the task management overhead.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116554513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-power techniques for an FPGA implementation of the Luffa hash function are presented in this paper. This hash function is under consideration for adoption as a standard. Two major gate-level techniques are introduced in order to reduce the power consumption, namely the pipeline technique (with some variants) and the use of embedded RAM blocks instead of general-purpose logic elements. Power consumption reductions from 1.2 to 8.7 times are achieved by means of the proposed techniques compared with an implementation without any low-power measures.
{"title":"Low Power FPGA Implementations of 256-bit Luffa Hash Function","authors":"P. Kitsos, N. Sklavos, A. Skodras","doi":"10.1109/DSD.2010.19","DOIUrl":"https://doi.org/10.1109/DSD.2010.19","url":null,"abstract":"Low power techniques in a FPGA implementation of the hash function called Luffa are presented in this paper. This hash function is under consideration for adoption as standard. Two major gate level techniques are introduced in order to reduce the power consumption, namely the pipeline technique (with some variants) and the use of embedded RAM blocks instead of general purpose logic elements. Power consumption reduction from 1.2 to 8.7 times is achieved by means of the proposed techniques compared with the implementation without any low power issue.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121624968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Despite their huge potential, value predictors have not been used in modern processors. This is partially due to the complex structures associated with such predictors. In this paper we study value predictors and investigate solutions to reduce storage requirements while imposing negligible coverage cost. Our solutions build on the observation that conventional value predictors do not utilize storage efficiently, as they allocate too much space for small and frequently appearing values. We measure data width requirements and entropy in a subset of predictor resources and show that values stored in predictors have limited sizes and very small entropy. We exploit this behavior and suggest different bit-sharing solutions for predictors storing single-byte values.
{"title":"Storage-Aware Value Prediction","authors":"M. Salehi, A. Baniasadi","doi":"10.1109/DSD.2010.70","DOIUrl":"https://doi.org/10.1109/DSD.2010.70","url":null,"abstract":"Despite the huge potential, value predictors have not been used in modern processors. This is partially due to the complex structures associated with such predictors. In this paper we study value predictors and investigate solutions to reduce storage requirements while imposing negligible coverage cost. Our solutions build on the observation that conventional value predictors do not utilize storage efficiently as they allocate too much space for small and frequently appearing values. We measure data width requirement and entropy in a subset of predictor resources and show that values stored in predictors show limited sizes and very small entropy. We exploit this behavior and suggest different bit sharing solutions for predictors storing single byte values.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121721063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application-dependent FPGA testing can reduce time and memory requirements compared with tests that exercise the complete FPGA structure. This paper describes a methodology of FPGA testing that does not require reconfiguration of the tested hardware and thus preserves the conditions that caused erroneous behavior of the FPGA during its operation. We show that the tested part of the FPGA can be efficiently tested by deterministic test patterns even if we have no precise information about the internal FPGA structure. It is too hardware-consuming to store uncompressed deterministic test patterns on the FPGA. For this reason we propose to compress the deterministic test patterns with the help of COMPAS – a compression system that uses scan chains for pattern decompression. COMPAS is well suited for current FPGAs as they can store the scan chain content in LUT-based shift registers. The COMPAS test compression system is based on test pattern overlapping; we propose an improved version of it. Application of overlapped test patterns requires additional shift registers for saving test patterns during test response recording into the internal scan chains. The neighborhood of the tested part of the FPGA can be dynamically reconfigured into shift registers and an output response analyzer (ORA). The shift registers contain the compressed test sequence and allow fast test pattern decompression. Experimental results given in the paper demonstrate the efficiency of the proposed FPGA testing method.
{"title":"Application Dependent FPGA Testing Method","authors":"M. Rozkovec, Jiri Jenícek, O. Novák","doi":"10.1109/DSD.2010.65","DOIUrl":"https://doi.org/10.1109/DSD.2010.65","url":null,"abstract":"Application dependent FPGA testing can reduce time and memory requirements comparing with the tests that exercise complete FPGA structure. This paper describes a methodology of FPGA testing that does not require reconfiguration of the tested hardware and thus it preserves conditions that caused erroneous behavior of the FPGA during its function. We show that the tested part of the FPGA can be efficiently tested by deterministic test patters even in case if we have no precise information about the internal FPGA structure. It is too hardware consuming to store uncompressed deterministic test patterns on the FPGA. From this reason we propose to compress the deterministic test patterns with the help of COMPAS – a compression system that uses scan chains for pattern decompression. COMPAS is well suited for current FPGAs as they can store the scan chain content in the LUT based shift registers. The COMPAS test compression system is based on test pattern overlapping, we propose an improved version of it. Application of overlapped test patterns requires additional shift registers for saving test patterns during test response recording into the internal scan chains. The neighborhood of the tested part of the FPGA can be dynamically reconfigured into shift registers and ORA. The shift registers contain compressed test sequence and allow fast test pattern decompression. 
Experimental results given in the paper demonstrate efficiency of the proposed FPGA tetste testing method.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128060111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
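The test pattern overlapping principle on which COMPAS builds can be sketched as follows. This is an illustrative toy, not COMPAS itself or its encoding: a new pattern is appended to the scan sequence so that its prefix reuses compatible trailing symbols, with 'X' denoting a don't-care bit.

```python
# Toy illustration of test-pattern overlapping (the principle behind COMPAS,
# not its actual encoding): consecutive patterns share compatible bits, so
# only the non-overlapping tail of each pattern has to be stored and shifted in.
# A real tool would also resolve 'X's in the stored sequence against the
# patterns overlapping them; this sketch simply keeps them.
def compatible(a, b):
    return a == 'X' or b == 'X' or a == b

def best_overlap(sequence, pattern):
    """Longest k such that the last k symbols of sequence fit pattern's first k."""
    for k in range(min(len(sequence), len(pattern)), -1, -1):
        tail = sequence[len(sequence) - k:]
        if all(compatible(t, p) for t, p in zip(tail, pattern[:k])):
            return k
    return 0

def overlap_compress(patterns):
    sequence = patterns[0]
    for pat in patterns[1:]:
        k = best_overlap(sequence, pat)
        sequence += pat[k:]          # only the new tail is appended
    return sequence

patterns = ["10X1", "X110", "1001"]   # hypothetical scan patterns, 'X' = don't care
seq = overlap_compress(patterns)
print(seq, len(seq), "vs", sum(map(len, patterns)))  # 10X1001 7 vs 12
```

Even on three 4-bit patterns the overlapped sequence is shorter than the concatenation, which is the storage saving the scan-chain decompressor exploits.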
Sameer D. Sahasrabuddhe, S. Subramanian, Kunal P. Ghosh, K. Arya, M. Desai
We present a high-level synthesis flow for mapping an algorithm description (in C) to a provably equivalent register transfer level (RTL) description of hardware. This flow uses an intermediate representation which is an orthogonal factorization of the program behavior into control, data and memory aspects, and is suitable for the description of large systems. We show that optimizations such as arbiter-less resource sharing can be efficiently computed on this representation. We apply the flow to a wide range of examples ranging from stream ciphers to database and linear algebra applications. The resulting RTL is then put through a standard ASIC tool chain (synthesis followed by automatic place-and-route), and the performance and power dissipation of the resulting layout are computed. We observe that the energy consumption (per completed task) of each resulting circuit is considerably lower than that of an equivalent executable running on a low-power processor, indicating that this C-to-RTL flow offers an energy-efficient alternative to the use of embedded processors in mapping algorithms to digital VLSI systems.
{"title":"A C-to-RTL Flow as an Energy Efficient Alternative to Embedded Processors in Digital Systems","authors":"Sameer D. Sahasrabuddhe, S. Subramanian, Kunal P. Ghosh, K. Arya, M. Desai","doi":"10.1109/DSD.2010.52","DOIUrl":"https://doi.org/10.1109/DSD.2010.52","url":null,"abstract":"We present a high-level synthesis flow for mapping an algorithm description (in C) to a provably equivalent register transfer level (RTL) description of hardware. This flow uses an intermediate representation which is an orthogonal factorization of the program behavior into control, data and memory aspects, and is suitable for the description of large systems. We show that optimizations such as arbiter-less resource sharing can be efficiently computed on this representation. We apply the flow to a wide range of examples ranging from stream ciphers to database and linear algebra applications. The resulting RTL is then put through a standard ASIC tool chain (synthesis followed by automatic place-and-route), and the performance and power dissipation of the resulting layout is computed. 
We observe that the energy consumption (per completed task) of each resulting circuit is considerably lower than that of an equivalent executable running on a low-power processor, indicating that this C-to-RTL flow offers an energy efficient alternative to the use of embedded processors in mapping algorithms to digital VLSI systems.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127047891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Delay faults in content addressable memories (CAMs) are a major concern in many applications, such as network routers, IP filters, longest prefix matching (LPM) search engines and cache tags, where high-speed data search is significant. This creates the need for analyzing critical paths and detecting the associated faults using a minimum number of test patterns. This paper proposes a test method to detect critical path delay faults in CAM systems using a newly proposed low-power TCAM cell structure. The proposed complement bit walk (CBW) algorithms have low time complexity, requiring 3m+n and 2m+2n operations. Fault simulation of the given TCAM system provides 100% fault coverage for the write, search and pseudo logic faults.
{"title":"Path-Delay Fault Testing in Embedded Content Addressable Memories","authors":"P. Manikandan, Bjørn B. Larsen, E. Aas","doi":"10.1109/DSD.2010.48","DOIUrl":"https://doi.org/10.1109/DSD.2010.48","url":null,"abstract":"Delay faults in content addressable memories (CAMs) is a major concern in many applications such as network routers, IP filters, longest prefix matching (LPM) search engines and cache tags where high speed data search is significant. It creates the need for analysis of critical paths and detecting associated faults using a minimum number of test patterns. This paper proposes a test method to detect critical path delay faults in CAM systems using a newly proposed low power TCAM cell structure. The proposed complement bit walk (CBW) algorithms are using low time complexity such as 3m+n and 2m+2n operations. The fault simulation of the given TCAM system provides 100% fault coverage for the write, search and pseudo logic faults.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126034672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chip Multiprocessor (CMP) systems have become the reference architecture for designing microprocessors, thanks to the improvements in semiconductor nanotechnology that have continuously provided a growing number of faster and smaller per-chip transistors. Interest in CMPs grew because classical techniques for boosting performance, e.g. increasing the clock frequency and the amount of work performed at each clock cycle, can no longer deliver significant improvements due to energy constraints and wire delay effects. CMP systems generally adopt a large last-level cache (LLC) (typically L2 or L3) shared among all cores, and private L1 caches. As the miss resolution time for private caches depends on the response time of the LLC, which is wire-delay dominated, performance is affected by wire delay. NUCA caches have been proposed for single- and multi-core systems as a mechanism for tolerating wire-delay effects on overall performance. In this paper, we introduce a novel NUCA architecture, called Re-NUCA, specifically suited for (but not limited to) CMPs in which cores are placed at different sides of the shared cache. The idea is to allow shared blocks to be replicated inside the shared cache, in order to avoid the limitations to performance improvements that arise in classical D-NUCA caches due to the conflict hit problem. Our results show that Re-NUCA outperforms D-NUCA by more than 5% on average, and for those applications that strongly suffer from the conflict hit problem we observe performance improvements of up to 15%.
{"title":"Re-NUCA: Boosting CMP Performance Through Block Replication","authors":"P. Foglia, C. Prete, M. Solinas, Giovanna Monni","doi":"10.1109/DSD.2010.41","DOIUrl":"https://doi.org/10.1109/DSD.2010.41","url":null,"abstract":"Chip Multiprocessor (CMP) systems have become the reference architecture for designing micro-processors, thanks to the improvements in semiconductor nanotechnology that have continuously provided a crescent number of faster and smaller per-chip transistors. The interests for CMPs grew up since classical techniques for boosting performance, e.g. the increase of clock frequency and the amount of work performed at each clock cycle, can no longer deliver to significant improvement due to energy constrains and wire delay effects. CMP systems generally adopt a large last-level-cache (LLC) (typically, L2 or L3) shared among all cores, and private L1 caches. As the miss resolution time for private caches depends on the response time of the LLC, which is wire-delay dominated, performance are affected by wire delay. NUCA caches have been proposed for single and multi core systems as a mechanism for tolerating wire-delay effects on the overall performance. In this paper, we introduce a novel NUCA architecture, called Re-NUCA, specifically suited for (but not limited to) CMPs in which cores are placed at different sides of the shared cache. The idea is to allow shared blocks to be replicated inside the shared cache, in order to avoid the limitations to performance improvements that arise in classical D-NUCA caches due to the conflict hit problem. 
Our results show that Re-NUCA outperforms D-NUCA of more then 5% on average, but for those applications that strongly suffer from the conflict hit problem we observe performance improvements up to 15%.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127292852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Rutgers, P. T. Wolkotte, P. Hölzenspies, J. Kuper, G. Smit
This paper presents an approximate Maximum Common Subgraph (MCS) algorithm, specifically for directed, cyclic graphs representing digital circuits. Because of the application domain, the graphs have nice properties: they are very sparse, have many different labels, and most vertices have only one predecessor. The algorithm iterates over all vertices once and uses heuristics to find the MCS. It is linear in computational complexity with respect to the size of the graph. Experiments show that very large common subgraphs were found in graphs of up to 200,000 vertices within a few minutes, when a quarter or less of the graphs differ. The variation in run-time and quality of the result is low.
{"title":"An Approximate Maximum Common Subgraph Algorithm for Large Digital Circuits","authors":"J. Rutgers, P. T. Wolkotte, P. Hölzenspies, J. Kuper, G. Smit","doi":"10.1109/DSD.2010.29","DOIUrl":"https://doi.org/10.1109/DSD.2010.29","url":null,"abstract":"This paper presents an approximate Maximum Common Sub graph (MCS) algorithm, specifically for directed, cyclic graphs representing digital circuits. Because of the application domain, the graphs have nice properties: they are very sparse, have many different labels, and most vertices have only one predecessor. The algorithm iterates over all vertices once and uses heuristics to find the MCS. It is linear in computational complexity with respect to the size of the graph. Experiments show that very large common sub graphs were found in graphs of up to 200,000 vertices within a few minutes, when a quarter or less of the graphs differ. The variation in run-time and quality of the result is low.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121316926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Cornelius, Philipp Gorski, S. Kubisch, D. Timmermann
Several variants of mesh-type topologies have been published for use in Networks-on-Chip. Due to their regularity, mesh-type topologies often serve as a foundation to investigate new ideas or to customize the topology to application-specific needs. This paper analyzes existing mesh-type topologies and compares their characteristics in terms of communication and implementation costs. Furthermore, this paper proposes BEAM (Border-Enhanced Mesh) − a mesh-type topology for Networks-on-Chip. BEAM uses concentration while requiring only low-radix routers. To this end, additional resources are connected to the outer boundaries of a conventional mesh. As a result, overall bandwidth is traded off against hardware overhead. In conclusion, simulation and synthesis results show that the conventional mesh stands out due to its communication performance, whereas clustered and concentrated topologies offer the least hardware overhead. BEAM ranges in between and is an option for balancing hardware costs and communication performance.
{"title":"Trading Hardware Overhead for Communication Performance in Mesh-Type Topologies","authors":"C. Cornelius, Philipp Gorski, S. Kubisch, D. Timmermann","doi":"10.1109/DSD.2010.67","DOIUrl":"https://doi.org/10.1109/DSD.2010.67","url":null,"abstract":"Several alternatives of mesh-type topologies have been published for the use in Networks-on-Chip. Due to their regularity, mesh-type topologies often serve as a foundation to investigate new ideas or to customize the topology to application-specific needs. This paper analyzes existing mesh-type topologies and compares their characteristics in terms of communication and implementation costs. Furthermore, this paper proposes BEAM (Border-Enhanced Mesh) − a mesh-type topology for Networks-on-Chip. BEAM uses concentration while necessitating only low-radix routers. Thereto, additional resources are connected to the outer boundaries of a conventional mesh. As a result, overall bandwidth is traded off against hardware overhead. In conclusion, simulation and synthesis results show that the conventional mesh stands out due to its communication performance, whereas clustered and concentrated topologies offer the least hardware overhead. BEAM ranges in between and is an option to balance hardware costs and communication performance.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126169875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper addresses the topic of hardware/software system co-design and develops two points of view. First, it provides a system-theoretical layout of the problem of designing hardware/software systems. This layout enables the designer to proceed systematically in optimizing the tradeoff between the desired functionality, available resources and operating conditions. Second, the paper describes an application of some of the theoretical principles to the design of an embedded automotive system built on a low-cost FPGA.
{"title":"A Design Process for Hardware/Software System Co-design and its Application to Designing a Reconfigurable FPGA","authors":"F. Moreno, I. López, R. Sanz","doi":"10.1109/DSD.2010.43","DOIUrl":"https://doi.org/10.1109/DSD.2010.43","url":null,"abstract":"This paper is going to address the topic of hardware/software systems co-design. The paper will develop two points of view. First, it provides a system-theoretical layout on the problem of designing hardware-software systems. This layout will enable the designer to proceed systematically in optimizing the tradeoff between the desired functionality, available resources and operating conditions. Second, the paper will describe an application of some of the theoretical principles to the design of an embedded automotive system built on a low-cost FPGA.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128447527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}