Power efficient instruction cache for wide-issue processors
A.-M. Badulescu, A. Veidenbaum
Pub Date: 2001-01-18 | DOI: 10.1109/IWIA.2001.955192
The paper focuses on reducing power in the instruction cache by eliminating the fetching of instructions from a cache line that are not needed. We propose a mechanism that predicts which instructions will be used out of a cache line before that line is fetched into the instruction buffer. The average instruction cache power saving obtained with our fetch predictor is 22% over the SPEC95 benchmark suite.
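The abstract does not describe the predictor's organization; the sketch below is only a minimal illustration of a fetch-mask predictor of this general kind, with the direct-mapped table, 8-instruction line, and last-use update policy all chosen for illustration rather than taken from the paper.

```c
/* Minimal sketch of a per-line fetch-mask predictor (illustrative only;
 * table size, line geometry, and update policy are assumptions, not the
 * mechanism described in the paper). */
#include <stdio.h>
#include <stdint.h>

#define WORDS_PER_LINE 8          /* instructions per cache line          */
#define TABLE_ENTRIES  1024       /* direct-mapped predictor table        */

static uint8_t fetch_mask[TABLE_ENTRIES]; /* one bit per word in the line */

static unsigned table_index(uint32_t line_addr) {
    return line_addr % TABLE_ENTRIES;
}

/* Predict which words of the line to read from the I-cache data array. */
static uint8_t predict(uint32_t line_addr) {
    uint8_t m = fetch_mask[table_index(line_addr)];
    return m ? m : 0xFF;          /* no history: fetch the whole line     */
}

/* After the line is consumed, record which words were actually used. */
static void update(uint32_t line_addr, uint8_t used_mask) {
    fetch_mask[table_index(line_addr)] = used_mask;
}

int main(void) {
    uint32_t line = 0x40be2;      /* arbitrary line address               */
    update(line, 0x0F);           /* last visit: only words 0-3 were used */
    uint8_t m = predict(line);
    int fetched = 0;
    for (int w = 0; w < WORDS_PER_LINE; w++)
        fetched += (m >> w) & 1;
    printf("fetching %d of %d words\n", fetched, WORDS_PER_LINE);
    return 0;
}
```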
{"title":"Power efficient instruction cache for wide-issue processors","authors":"A.-M. Badulescu, A. Veidenbaum","doi":"10.1109/IWIA.2001.955192","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955192","url":null,"abstract":"The paper focuses on reducing power in instruction cache by eliminating the fetching of instructions that are not needed from a cache line. We propose a mechanism that predicts which instructions are going to be used out of a cache line before that line is fetched into the instruction buffer. The average instruction cache power savings obtained by using our fetch predictor is 22% for SPEC95 benchmark suite.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121327867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient algorithm for pointer-to-array access conversion for compiling and optimizing DSP applications
Robert A. van Engelen, K. Gallivan
DOI: 10.1109/IWIA.2001.955200
The complexity of Digital Signal Processing (DSP) applications has been steadily increasing due to advances in hardware design for embedded processors. To meet critical power consumption and timing constraints, many DSP applications are hand-coded in assembly. Because the cost of hand-coding is becoming prohibitive for developing an embedded system, there is a trend toward the use of high-level programming languages, particularly C, and of optimizing compilers for software development. Consequently, more than ever there is a need for compilers to optimize DSP applications to make effective use of the available hardware resources. Existing DSP codes are often riddled with pointer-based data accesses, because DSP programmers mistakenly believe that a compiler will always generate better target code from them. The extensive use of pointer arithmetic makes analysis and optimization difficult in compilers for modern DSPs with regular architectures and large homogeneous register sets. In this paper, we present a novel algorithm for converting pointer-based code to code with explicit array accesses. The conversion enables a compiler to perform data-flow analysis and loop optimizations on DSP codes.
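The paper's conversion algorithm is not reproduced in the abstract; the before/after pair below only illustrates the kind of rewrite such a conversion performs, on a hypothetical dot-product kernel.

```c
/* Illustration of the kind of rewrite the paper targets (the conversion
 * algorithm itself is not reproduced here). */
#include <stdio.h>

#define N 8

/* Pointer-based form, common in hand-written DSP code: induction on p and q
 * obscures the access pattern from the compiler. */
static int dot_ptr(const short *p, const short *q, int n) {
    int acc = 0;
    while (n-- > 0)
        acc += (int)(*p++) * (int)(*q++);
    return acc;
}

/* Converted form with explicit array accesses: subscripts are affine in the
 * loop counter, so data-flow analysis and loop optimizations apply directly. */
static int dot_arr(const short a[], const short b[], int n) {
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int)a[i] * (int)b[i];
    return acc;
}

int main(void) {
    short a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    short b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    printf("%d %d\n", dot_ptr(a, b, N), dot_arr(a, b, N));
    return 0;
}
```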
{"title":"An efficient algorithm for pointer-to-array access conversion for compiling and optimizing DSP applications","authors":"Robert A. van Engelen, K. Gallivan","doi":"10.1109/IWIA.2001.955200","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955200","url":null,"abstract":"The complexity of Digital Signal Processing (DSP) applications has been steadily increasing due to advances in hardware design for embedded processors. To meet critical power consumption and timing constraints, many DSP applications are hand-coded in assembly. Because the cost of hand-coding is becoming prohibitive for developing an embedded system, there is a trend toward the use of high-level programming languages, particularly C, and the use of optimizing compilers for software development. Consequently, more than ever there is a need for compilers to optimize DSP application to make effective use of the available hardware resources. Existing DSP codes are often riddled with pointer-based data accesses, because DSP programmers have the mistaken belief that a compiler will always generate better target code. The use of extensive pointer arithmetic makes analysis and optimization difficult for compilers for modern DSPs with regular architectures and large homogeneous registers sets. In this paper, we present a novel algorithm for converting pointer-based code to code with explicit array accesses. The conversion enables a compiler to perform data flow analysis and loop optimizations on DSP codes.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128224906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Present status of development of the Earth Simulator
M. Yokokawa
DOI: 10.1109/IWIA.2001.955201
The Earth Simulator is an ultra-high-speed supercomputer. Its research and development was initiated in 1997 as part of the Earth Simulator project, which aims to promote research and development for understanding and predicting global environmental changes. The Earth Simulator is a distributed-memory parallel system consisting of 640 processor nodes connected by a single-stage full crossbar switch. Each processor node is a shared-memory system composed of eight vector processors. The total peak performance and main memory capacity are 40 Tflop/s and 10 TB, respectively. In this paper, the concept of the Earth Simulator and an outline of the system are described.
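As a quick consistency check of the figures quoted above (the per-processor rate is derived here, not stated in the abstract), 640 nodes of eight vector processors at roughly 8 Gflop/s each give the quoted aggregate peak of about 40 Tflop/s:

```c
/* Back-of-the-envelope check of the abstract's figures; the 8 Gflop/s
 * per-processor rate is derived from the totals, not quoted in the text. */
#include <stdio.h>

int main(void) {
    int nodes = 640, procs_per_node = 8;
    double gflops_per_proc = 8.0;                 /* derived: 40960 / 5120  */
    double total = nodes * procs_per_node * gflops_per_proc;
    printf("peak = %.2f Tflop/s\n", total / 1000.0);
    return 0;
}
```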
{"title":"Present status of development of the Earth Simulator","authors":"M. Yokokawa","doi":"10.1109/IWIA.2001.955201","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955201","url":null,"abstract":"The Earth Simulator is an ultra high-speed supercomputer. The research and development of the Earth Simulator was initiated in 1997 as one of the approaches in the Earth Simulator project which aims at promotion of research and development for understanding and prediction of global environmental changes. The Earth Simulator is a distributed memory parallel system which consists of 640 processor nodes connected by a single-stage full crossbar switch. Each processor node is a shared memory system composed of eight vector processors. The total peak performance and main memory capacity are 40Tflop/s and 10TB, respectively. In this paper, a concept of the Earth Simulator and an outline of the Earth Simulator system are described.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130340797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characteristics of loop unrolling effect: software pipelining and memory latency hiding
S. Hiroyuki, Y. Teruhiko
DOI: 10.1109/IWIA.2001.955198
Recently, loop unrolling has been viewed in a new light from the superscalar architectural point of view. In this paper, we show that, in addition to the superscalar and scalar-replacement effects, loop unrolling can hide memory latency, and that the combination of these effects improves the performance of unrolled loops. A major contribution of this paper is that the analysis is done symbolically and quantitatively. Although these effects have been known as major factors in the performance of loop unrolling, no quantitative approach has been attempted. Our analysis clarifies the behaviour of superscalar execution and memory latency hiding in loop unrolling.
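The paper's symbolic analysis is not reproduced in the abstract; the toy kernel below merely illustrates the three effects it combines, with the unroll factor and kernel chosen for illustration.

```c
/* Illustrative only: unrolling by 4 exposes independent loads that a
 * superscalar core can overlap, so part of the memory latency is hidden
 * behind the adds, and the running sums stay in scalars (registers)
 * rather than memory. */
#include <stdio.h>

#define N 1024

static double sum_rolled(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

static double sum_unrolled4(const double *a, int n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;   /* scalar replacement */
    int i;
    for (i = 0; i + 4 <= n; i += 4) {                 /* four independent   */
        s0 += a[i];                                   /* loads per iteration*/
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)                                /* remainder loop     */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}

int main(void) {
    static double a[N];
    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;
    printf("%.1f %.1f\n", sum_rolled(a, N), sum_unrolled4(a, N));
    return 0;
}
```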
{"title":"Characteristics of loop unrolling effect: software pipelining and memory latency hiding","authors":"S. Hiroyuki, Y. Teruhiko","doi":"10.1109/IWIA.2001.955198","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955198","url":null,"abstract":"Recently loop unrolling has been shown in a new light from the superscalar architectural point of view. In this paper, we show that in addition to superscalar effect and scalar replacement effect, loop unrolling can hide memory latency, and that the combination of those effects improve the performance of loop unrolling. A major contribution of this paper is that the analysis is done symbolically and quantitatively. Although they have been known as major reasons that affect the performance of loop unrolling, no quantitative approach has not been tried. Our analysis can make clear the behaviour of superscalar functions and memory latency hiding in loop unrolling.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133865980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wrapped system call in communication and execution fusion OS: CEFOS
Hiroshi Nakayama, Takuya Tanabayashi, Makoto Amamiya
DOI: 10.1109/IWIA.2001.955199
This paper proposes an operating system, CEFOS (Communication and Execution Fusion OS), which fuses inter-processor communication and intra-processor computation. The fusion of communication and internal execution is achieved both in execution and in the function interfaces of CEFOS. CEFOS is based on a fine-grain multithreading approach. In fine-grain thread control, one of the major problems is how to reduce the frequency of context switching and communication between a user process and the CEFOS kernel. The key to resolving this problem is to design an environment that supports efficient cooperation between the user process and the CEFOS kernel. We propose a Display Request and Data (DRD) function and a Wrapped System Call (WSC) mechanism built on the DRD function, which together provide efficient cooperation between user processes and the CEFOS kernel. DRD and WSC reduce the number of invocations from user-process threads to the OS kernel and achieve high-speed thread switching.
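The abstract does not define the DRD or WSC interfaces; the sketch below is only a generic illustration of the idea they embody, in which user-level threads deposit requests in a shared area and a single kernel crossing services the whole batch. All structure and function names here are hypothetical.

```c
/* Generic sketch of request batching to reduce user/kernel crossings.
 * Not the CEFOS interfaces; all names are hypothetical. */
#include <stdio.h>

#define MAX_REQ 16

struct request { int opcode; int arg; int result; };

struct request_area {            /* shared between user process and kernel */
    int count;
    struct request req[MAX_REQ];
};

/* User side: queue a request without entering the kernel. */
static void post_request(struct request_area *ra, int opcode, int arg) {
    if (ra->count < MAX_REQ) {
        ra->req[ra->count].opcode = opcode;
        ra->req[ra->count].arg = arg;
        ra->count++;
    }
}

/* "Wrapped" call: one kernel crossing handles every queued request. */
static void wrapped_syscall(struct request_area *ra) {
    for (int i = 0; i < ra->count; i++)
        ra->req[i].result = ra->req[i].arg * 2;  /* stand-in for kernel work */
    ra->count = 0;
}

int main(void) {
    struct request_area ra = { 0 };
    for (int t = 0; t < 5; t++)
        post_request(&ra, 1, t);   /* five threads queue five requests      */
    int n = 5;
    wrapped_syscall(&ra);          /* a single kernel entry serves them all */
    for (int i = 0; i < n; i++)
        printf("result[%d] = %d\n", i, ra.req[i].result);
    return 0;
}
```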
{"title":"Wrapped system call in communication and execution fusion OS: CEFOS","authors":"Hiroshi Nakayama, Takuya Tanabayashi, Makoto Amamiya","doi":"10.1109/IWIA.2001.955199","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955199","url":null,"abstract":"This paper proposes an operating system CEFOS (Communication and Execution Fusion OS), which fuses interprocessor communication and intra processor computation. Fusion of communications and internal executions is achieved both in executions and function interfaces in CEFOS. CEFOS is based on a fine-grain multi-threading approach. In the fine-grain thread control, one of the major problems is how to reduce the frequency of context switching and communication between user process and CEFOS Kernel. The important point to resolve this problem is to design the environment which supports an efficient cooperation environment between user process and CEFOS kernel. We propose a Display Request and Data (DRD) function and Wrapped System Call (WSC) mechanism with DRD function, which provide efficient cooperation between user process and CEFOS Kernel. DRD and WSC reduce the number of invocations from user process threads to the OS kernel and achieve high speeds thread switching.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122754439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cache-In-Memory
J. T. Zawodny, P. Kogge
DOI: 10.1109/IWIA.2001.955191
The new technology of Processing-In-Memory now allows relatively large DRAM memory macros to be placed on the same die as processing logic. Despite the high bandwidth and low latency such macros make possible, more of both is always better. Classical techniques such as caching are typically used for such performance gains, but at the cost of high power. The paper summarizes recent work investigating the potential of using structures within such memory macros as cache substitutes, and the conditions under which power savings may result.
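The specific in-macro structures studied in the paper are not named in the abstract; the toy model below only illustrates the general idea of treating an already open DRAM row (the sense-amplifier row buffer) as a cache substitute, with the geometry values being illustrative assumptions.

```c
/* Illustrative model of row-buffer reuse: an access that falls in the
 * currently open row of its bank is a "row hit" and needs no new
 * activation. Bank count, row count, and row size are assumptions. */
#include <stdio.h>
#include <stdint.h>

#define NUM_BANKS 4
#define ROW_BITS  12              /* 4096 rows per bank (assumed)          */
#define COL_BYTES 2048            /* bytes per row (assumed)               */

static int32_t open_row[NUM_BANKS] = { -1, -1, -1, -1 };

static int access_dram(uint32_t addr, int *row_hits) {
    uint32_t bank = (addr / COL_BYTES) % NUM_BANKS;
    uint32_t row  = (addr / COL_BYTES / NUM_BANKS) & ((1u << ROW_BITS) - 1);
    if (open_row[bank] == (int32_t)row) {      /* hit in the open row       */
        (*row_hits)++;
        return 1;
    }
    open_row[bank] = (int32_t)row;             /* activate a new row        */
    return 0;
}

int main(void) {
    int hits = 0, accesses = 0;
    for (uint32_t a = 0; a < 64 * 1024; a += 64) {   /* sequential stream   */
        access_dram(a, &hits);
        accesses++;
    }
    printf("row-buffer hits: %d of %d accesses\n", hits, accesses);
    return 0;
}
```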
{"title":"Cache-In-Memory","authors":"J. T. Zawodny, P. Kogge","doi":"10.1109/IWIA.2001.955191","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955191","url":null,"abstract":"The new technology of Processing-In-Memory now allows relatively large DRAM memory macros to be positioned on the same die with processing logic. Despite the high bandwidth and low latency possible with such macros, more of both is always better. Classical techniques such as caching are typically used for such performance gains, but at the cost of high power. The paper summarizes some recent work into the potential of utilizing structures within such memory macros as cache substitutes, and under what conditions power savings may result.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116524904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An architecture of on-chip-memory multi-threading processor
T. Matsuzaki, H. Tomiyasu, M. Amamiya
DOI: 10.1109/IWIA.2001.955202
This paper proposes an on-chip-memory processor architecture: FUCE. FUCE stands for Fusion of Communication and Execution. The goal of the FUCE processor project is to fuse intra-processor execution and inter-processor communication. To achieve this goal, the FUCE processor integrates processor units, memory units and communication units on a single chip. The FUCE processor provides a next-generation memory system architecture. In this architecture, no data cache is required, since memory access latency can be hidden by the simultaneous multithreading mechanism and the on-chip memory system with its broad-bandwidth, low-latency internal bus. This approach can reduce the performance gap between instruction execution and memory and network accesses.
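The FUCE hardware mechanisms are not specified in the abstract; the small calculation below only illustrates the principle the design relies on, with the latency and per-thread work figures being assumptions.

```c
/* Rule-of-thumb behind multithreaded latency hiding: if each thread does
 * R cycles of work between memory accesses and an access takes L cycles,
 * roughly 1 + ceil(L/R) ready threads keep the core busy without a data
 * cache. L and R below are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    int L = 40;                       /* on-chip memory latency, cycles (assumed)   */
    int R = 8;                        /* work per thread between accesses (assumed) */
    int threads_needed = 1 + (L + R - 1) / R;
    printf("threads needed to hide %d-cycle latency: %d\n", L, threads_needed);
    return 0;
}
```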
{"title":"An architecture of on-chip-memory multi-threading processor","authors":"T. Matsuzaki, H. Tomiyasu, M. Amamiya","doi":"10.1109/IWIA.2001.955202","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955202","url":null,"abstract":"This paper proposes an on-chip-memory processor architecture: FUCE. FUCE means Fusion of Communication and Execution. The goal of the FUCE processor project is fusing the intra processor execution and inter processor communication. In order to achieve this goal, the FUCE processor integrates the processor units, memory units and communication units into a chip. FUCE Processor provides a next generation memory system architecture. In this architecture, no data cache memory is required, since memory access latency can be hidden due to the simultaneous multithreading mechanism and the on-chip-memory system with broad-bandwidth low latency internal bus of FUCE Processor. This approach can reduce the performance gap between instruction execution, and memory and network accesses.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127144391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An approach towards an analytical characterization of locality and its portability
G. Bilardi, E. Peserico
DOI: 10.1109/IWIA.2001.955195
The evolution of computing technology towards the ultimate physical limits makes communication the dominant cost of computing. It would then be desirable to have a framework for the study of locality, which we define as the property of an algorithm that enables implementations with reduced communication overheads. We discuss the issue of useful characterizations of the locality of an algorithm with reference to both single machines and classes of machines. We then consider the question of portability of locality. We illustrate the proposed approach with its application to the study of temporal locality, the property of an algorithm that enables efficient implementations on machines where memory accesses have a variable latency, depending on the location being accessed. We discuss how, for a fixed operation schedule, temporal locality can be characterized for interesting classes of uniform hierarchical machines by a set of metrics, the width lengths of the schedule. Moreover, a portable memory management of any schedule can be obtained for such classes of machines. The situation becomes more complex when the schedule is a degree of freedom of the implementation. Then, while some computations do admit a single schedule, optimal across many machines, this is not always the case. Thus, in general, only the less stringent notion of portability based on parametrized schedules can be pursued. Correspondingly, a concise characterization of temporal locality becomes harder to achieve and still remains an open problem.
{"title":"An approach towards an analytical characterization of locality and its portability","authors":"G. Bilardi, E. Peserico","doi":"10.1109/IWIA.2001.955195","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955195","url":null,"abstract":"The evolution of computing technology towards the ultimate physical limits makes communication the dominant cost of computing. It would then be desirable to have a framework for the study of locality, which we define as the property of an algorithm that enables implementations with reduced communication overheads. We discuss the issue of useful characterizations of the locality of an algorithm with reference to both single machines and classes of machines. We then consider the question of portability of locality. We illustrate the proposed approach with its application to the study of temporal locality, the property of an algorithm that enables efficient implementations on machines where memory accesses have a variable latency, depending on the location being accessed. We discuss how, for a fixed operation schedule, temporal locality can be characterized for interesting classes of uniform hierarchical machines by a set of metrics, the width lengths of the schedule. Moreover, a portable memory management of any schedule can be obtained for such classes of machines. The situation becomes more complex when the schedule is a degree of freedom of the implementation. Then, while some computations do admit a single schedule, optimal across many machines, this is not always the case. Thus, in general, only the less stringent notion of portability based on parametrized schedules can be pursued. Correspondingly, a concise characterization of temporal locality becomes harder to achieve and still remains an open problem.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121158874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pipelined memory hierarchies: scalable organizations and application performance
G. Bilardi, K. Ekanadham, P. Pattnaik
DOI: 10.1109/IWIA.2001.955196
The time to perform a random access to main memory has been increasing for decades relative to processor speed and is currently of the order of a few hundred cycles. To alleviate this problem, one resorts to memory organizations that are hierarchical, to exploit locality of the computation, and pipelinable, to exploit parallelism. The goal of this study is to begin a systematic exploration of the performance advantages of such memories, achieving scalability even when the underlying principles are pushed to the limits permitted by physical laws. First, we propose memory organizations able to accept requests at a constant rate without significantly affecting the latency of individual requests, which remains within a constant factor of the minimum achievable under fundamental physical constraints. Second, we discuss how the pipeline capability can be exploited effectively by memory management techniques to reduce application execution time. We conclude by outlining the issues that require further work in order to pursue systematically the potential of pipelined hierarchical memories.
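The paper's memory organizations are not reproduced in the abstract; the back-of-the-envelope model below only illustrates the advantage being claimed, with the latency and request count chosen for illustration.

```c
/* Minimal model of why pipelining the memory helps: serving n independent
 * requests one at a time costs about n*L cycles, while a memory that
 * accepts one request per cycle costs about L + n cycles. L and n are
 * illustrative values, not the paper's parameters. */
#include <stdio.h>

int main(void) {
    long n = 10000;               /* independent memory requests            */
    long L = 300;                 /* round-trip latency in cycles (assumed) */
    long serial    = n * L;       /* one outstanding request at a time      */
    long pipelined = L + n;       /* requests issued back to back           */
    printf("serial: %ld cycles, pipelined: %ld cycles (%.1fx)\n",
           serial, pipelined, (double)serial / pipelined);
    return 0;
}
```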
{"title":"Pipelined memory hierarchies: scalable organizations and application performance","authors":"G. Bilardi, K. Ekanadham, P. Pattnaik","doi":"10.1109/IWIA.2001.955196","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955196","url":null,"abstract":"The time to perform a random access to main memory has been increasing for decades relative to processor speed and is currently of the order of a few hundred cycles. To alleviate this problem, one resorts to memory organizations that are hierarchical to exploit locality of the computation, and pipelinable to exploit parallelism. The goal of the study is to begin a systematic exploration of the performance advantages of such memories, achieving scalability even when the underlying principles are pushed to the limit permitted by physical laws. First, we propose memory organizations with the ability to accept requests at a constant rate without significantly affecting the latency of individual requests, which is within a constant factor of the minimum value achievable under fundamental physical constraints. Second, we discuss how the pipeline capability can be effectively exploited by memory management techniques in order to reduce execution time for applications. We conclude by outlining the issues that require further work in order to pursue systematically the potential of pipelined hierarchical memories.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121684687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Power reduction in superscalar datapaths through dynamic bit-slice activation
Dmitry Ponomarev, Gurhan Kucuk, K. Ghose
DOI: 10.1109/IWIA.2001.955193
We show, by simulating the execution of SPEC 95 benchmarks on a true hardware-level, cycle-by-cycle simulator for a superscalar CPU, that about half of the bytes of the operands flowing on the datapath, particularly the leading bytes, are all zeros. Furthermore, a significant number of the bits within the non-zero part of the data flowing on the various paths within the processor do not change from their prior values. We show how these two facts, attesting to the low entropy of the data streams, can be exploited to reduce power dissipation within all explicit and implicit storage components of a typical superscalar datapath, such as register files, dispatch buffers and reorder buffers, as well as interconnections such as buses and direct links. Our simulation results and SPICE measurements from representative VLSI layouts show power savings of about 25% on average over all SPEC 95 benchmarks.
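The datapath mechanism itself is not reproduced in the abstract; the snippet below only illustrates the statistic being exploited: leading operand bytes that are all zero need not be read, driven or written, which is where the saving comes from. The sample operand values are arbitrary.

```c
/* Count how many leading (most significant) bytes of each operand are zero;
 * those byte slices could be gated off. Sample values are arbitrary. */
#include <stdio.h>
#include <stdint.h>

static int leading_zero_bytes(uint32_t v) {
    int n = 0;
    for (int b = 3; b >= 0; b--) {
        if ((v >> (8 * b)) & 0xFF)
            break;
        n++;
    }
    return n;
}

int main(void) {
    uint32_t operands[] = { 5, 200, 70000, 0, 0xDEADBEEF, 1024, 3, 255 };
    int total_bytes = 0, zero_bytes = 0;
    for (unsigned i = 0; i < sizeof operands / sizeof operands[0]; i++) {
        total_bytes += 4;
        zero_bytes += leading_zero_bytes(operands[i]);
    }
    printf("%d of %d operand bytes are leading zeros (%.0f%%)\n",
           zero_bytes, total_bytes, 100.0 * zero_bytes / total_bytes);
    return 0;
}
```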
{"title":"Power reduction in superscalar datapaths through dynamic bit-slice activation","authors":"Dmitry Ponomarev, Gurhan Kucuk, K. Ghose","doi":"10.1109/IWIA.2001.955193","DOIUrl":"https://doi.org/10.1109/IWIA.2001.955193","url":null,"abstract":"We show by simulating the execution of SPEC 95 benchmarks on a true hardware-level, cycle by cycle simulator for a superscalar CPU that about half of the bytes of operands flowing on the datapath, particularly the leading bytes, are all zeros. Furthermore, a significant number of the bits within the non-zero part of the data flowing on the various paths within the processor do not change from their prior value. We show how these two facts, attesting to the lack of a high level of entropy in the data streams, can be exploited to reduce power dissipation within all explicit and implicit storage components of a typical superscalar datapath such as register files, dispatch buffers, reorder buffers, as well as interconnections such as buses and direct links. Our simulation results and SPICE measurements from representative VLSI layouts show power savings of about 25% on the average over all SPEC 95 benchmarks.","PeriodicalId":388942,"journal":{"name":"2001 Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128067132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}