E. Arnould, F. Bitz, Eric C. Cooper, H. T. Kung, Robert D. Sansom, P. Steenkiste
Nectar is a “network backplane” for use in heterogeneous multicomputers. The initial system consists of a star-shaped fiber-optic network with an aggregate bandwidth of 1.6 gigabits/second and a switching latency of 700 nanoseconds. The system can be scaled up by connecting hundreds of these networks together. The Nectar architecture provides a flexible way to handle heterogeneity and task-level parallelism. A wide variety of machines can be connected as Nectar nodes and the Nectar system software allows applications to communicate at a high level. Protocol processing is off-loaded to powerful communication processors so that nodes do not have to support a suite of network protocols. We have designed and built a prototype Nectar system that has been operational since November 1988. This paper presents the motivation and goals for Nectar and describes its hardware and software. The presentation emphasizes how the goals influenced the design decisions and led to the novel aspects of Nectar.
{"title":"The design of nectar: a network backplane for heterogeneous multicomputers","authors":"E. Arnould, F. Bitz, Eric C. Cooper, H. T. Kung, Robert D. Sansom, P. Steenkiste","doi":"10.1145/70082.68202","DOIUrl":"https://doi.org/10.1145/70082.68202","url":null,"abstract":"Nectar is a “network backplane” for use in heterogeneous multicomputers. The initial system consists of a star-shaped fiber-optic network with an aggregate bandwidth of 1.6 gigabits/second and a switching latency of 700 nanoseconds. The system can be scaled up by connecting hundreds of these networks together.\u0000The Nectar architecture provides a flexible way to handle heterogeneity and task-level parallelism. A wide variety of machines can be connected as Nectar nodes and the Nectar system software allows applications to communicate at a high level. Protocol processing is off-loaded to powerful communication processors so that nodes do not have to support a suite of network protocols.\u0000We have designed and built a prototype Nectar system that has been operational since November 1988. This paper presents the motivation and goals for Nectar and describes its hardware and software. The presentation emphasizes how the goals influenced the design decisions and led to the novel aspects of Nectar.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126182874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an optimization algorithm for reducing instruction cache misses. The algorithm uses profile information to reposition programs in memory so that a direct-mapped cache behaves much like an optimal cache with full associativity and full knowledge of the future. For best results, the cache should have a mechanism for excluding certain instructions designated by the compiler. This paper first presents a reduced form of the algorithm. This form is shown to produce an optimal miss rate for programs without conditionals and with a tree call graph, assuming basic blocks can be reordered at will. If conditionals are allowed, but there are no loops within conditionals, the algorithm does as well as an optimal cache for the worst-case execution of the program consistent with the profile information. Next, the algorithm is extended with heuristics for general programs. The effectiveness of these heuristics is demonstrated with empirical results for a set of 10 programs for various cache sizes. The improvement depends on cache size. For a 512-word cache, miss rates for a direct-mapped instruction cache are halved. For an 8K-word cache, miss rates fall by over 75%. Over a wide range of cache sizes the algorithm is as effective as increasing the cache size by a factor of 3. For 512 words, the algorithm generates only 32% more misses than an optimal cache. Optimized programs on a direct-mapped cache have lower miss rates than unoptimized programs on set-associative caches of the same size.
{"title":"Program optimization for instruction caches","authors":"S. McFarling","doi":"10.1145/70082.68200","DOIUrl":"https://doi.org/10.1145/70082.68200","url":null,"abstract":"This paper presents an optimization algorithm for reducing instruction cache misses. The algorithm uses profile information to reposition programs in memory so that a direct-mapped cache behaves much like an optimal cache with full associativity and full knowledge of the future. For best results, the cache should have a mechanism for excluding certain instructions designated by the compiler. This paper first presents a reduced form of the algorithm. This form is shown to produce an optimal miss rate for programs without conditionals and with a tree call graph, assuming basic blocks can be reordered at will. If conditionals are allowed, but there are no loops within conditionals, the algorithm does as well as an optimal cache for the worst case execution of the program consistent with the profile information. Next, the algorithm is extended with heuristics for general programs. The effectiveness of these heuristics are demonstrated with empirical results for a set of 10 programs for various cache sizes. The improvement depends on cache size. For a 512 word cache, miss rates for a direct-mapped instruction cache are halved. For an 8K word cache, miss rates fall by over 75%. Over a wide range of cache sizes the algorithm is as effective as increasing the cache size by a factor of 3 times. For 512 words, the algorithm generates only 32% more misses than an optimal cache. Optimized programs on a direct-mapped cache have lower miss rates than unoptimized programs on set-associative caches of the same size.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114202069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We address the problem of paged main memory management in the local/remote architecture subclass of shared memory multiprocessors. We consider the case where the operating system has primary responsibility and uses page migration as its main tool. We identify some of the key issues with respect to architectural support (reference history maintenance and page size) and operating system mechanism (duration between daemon passes and number of migration daemons). The experiments were conducted using software-implemented page tables on a 32-node BBN Butterfly Plus™. Several numerical programs with both synthetic and real data were used as the workload. The primary conclusion is that for the cases considered migration was at best marginally effective. On the other hand, practical migration mechanisms were robust and never significantly degraded performance. The specific results include: 1) referenced bits with aging can closely approximate Usage fields, 2) larger page sizes are beneficial except when the page is large enough to include the locality sets of two processes, and 3) multiple migration daemons can be useful. Only small regions of the space of architectural, system, and workload parameters were explored. Further investigation of other parameter combinations is clearly warranted.
{"title":"Reference history, page size, and migration daemons in local/remote architectures","authors":"M. A. Holliday","doi":"10.1145/70082.68192","DOIUrl":"https://doi.org/10.1145/70082.68192","url":null,"abstract":"We address the problem of paged main memory management in the local/remote architecture subclass of shared memory multiprocessors. We consider the case where the operating system has primary responsibility and uses page migration as its main tool. We identify some of the key issues with respect to architectural support (reference history maintenance, and page size), and operating system mechanism (duration between daemon passes, and number of migration daemons).\u0000The experiments were conducted using software implemented page tables on 32-node BBN Butterfly Plus#8482;. Several numeral programs with both synthetic and real data were used as the workload. The primary conclusion is that for the cases considered migration was at best marginally effective. On the other hand, practical migration mechanisms were robust and never significantly degraded performance. The specific results include: 1) Referenced bits with aging can closely approximate Usage fields, 2) larger page sizes are beneficial except when the page is large enough to include locality sets of two processes, and 3) multiple migration daemons can be useful.\u0000Only small regions of the space of architectural, system, and workload parameters were explored. Further investigation of other parameter combinations is clearly warranted.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115932463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An architectural evaluation must be based upon real programs in an actual operating environment. The ubiquitous IBM personal computer running MS DOS represents an excellent test bed for architectural evaluation of Intel 8086 systems. There are many programs and tools available to evaluate the performance of IBM Personal Computers and compatibles; these evaluation tools are intended to relate the performance of one machine to another. Very little data is available on dynamic instruction traces in systems using an 8086. This paper reports on dynamic traces of 8086/88 programs obtained using software tracing tools (described below). The objective of this work is to analyze instruction usage and addressing modes used in actual software. The system used to obtain the dynamic instruction frequencies was a compatible running MS DOS 3.1 with a Softpatch BIOS. To illustrate the RISC argument that only a few instruction types are sufficient, the 8086 results are compared with similar studies on the Motorola 68000 and the Digital Equipment VAX-11.
{"title":"An analysis of 8086 instruction set usage in MS DOS programs","authors":"T. L. Adams, R. E. Zimmerman","doi":"10.1145/70082.68197","DOIUrl":"https://doi.org/10.1145/70082.68197","url":null,"abstract":"1. Introduction An architectural evaluation must be based upon real programs in an actual operating environment. The ubiquitous IBM** personal computer running MS DOS@ represents an excellent test bed for architectural evaluation of Intel@ 8086 systems. There are many programs and tools available to evaluate the performance of IBM Personal Computers and compatibles; these evaluation tools are intended to relate the performance of one machine to another. Very little data is available on dynamic instruction traces in systems using an 8086. This paper reports on dynamic traces of 8086/88 programs obtained using software tracing tools (described below). The objective of this work is to analyze instruction usage and addressing modes used in actual software. The system used to obtain the dynamic instruction frequencies was a compatible running MS DOS* 3.1 with a Softpatch@ BIOS. To illustrate the RISC argument that only a few instruction types are sufficient, the 8086 results are compared with similar studies on the Motorola* 68ooO and the Digital Equipment VAX-1 l@'.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116104001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a new technique to improve the performance of cross-domain calls and returns in a capability-based computer system. Using register optimization information obtained from the compiler, a trusted linker can minimize the number of registers that must be saved, restored, or cleared when changing from one protection domain to another. The size of the performance gain depends on the level of trust between the calling and called protection domains. The paper presents alternate implementations for an extended VAX architecture and for a RISC architecture and reports performance measurements done on a re-microprogrammed VAX-11/730 processor.
{"title":"Using registers to optimize cross-domain call performance","authors":"P. Karger","doi":"10.1145/70082.68201","DOIUrl":"https://doi.org/10.1145/70082.68201","url":null,"abstract":"This paper describes a new technique to improve the performance of cross-domain calls and returns in a capability-based computer system. Using register optimization information obtained from the compiler, a trusted linker can minimize the number of registers that must be saved, restored, or cleared when changing from one protection domain to another. The size of the performance gain depends on the level of trust between the calling and called protection domains. The paper presents alternate implementations for an extended VAX architecture and for a RISC architecture and reports performance measurements done on a re-microprogrammed VAX-11/730 processor.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130097327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bus bandwidth ultimately limits the performance, and therefore the scale, of bus-based, shared memory multiprocessors. Previous studies have extrapolated from uniprocessor measurements and simulations to estimate the performance of these machines. In this study, we use traces of parallel programs to evaluate the cache and bus performance of shared memory multiprocessors, in which coherency is maintained by a write-invalidate protocol. In particular, we analyze the effect of sharing overhead on cache miss ratio and bus utilization. Our studies show that parallel programs incur substantially higher miss ratios and bus utilization than comparable uniprocessor programs. The sharing component of these metrics increases proportionally with both cache and block size, and for some cache configurations determines both their magnitude and trend. The amount of overhead depends on the memory reference pattern to the shared data. Programs that exhibit good per-processor locality perform better than those with fine-grain sharing. This suggests that parallel software writers and better compiler technology can improve program performance through better memory organization of shared data.
{"title":"The effect of sharing on the cache and bus performance of parallel programs","authors":"S. Eggers, R. Katz","doi":"10.1145/70082.68206","DOIUrl":"https://doi.org/10.1145/70082.68206","url":null,"abstract":"Bus bandwidth ultimately limits the performance, and therefore the scale, of bus-based, shared memory multiprocessors. Previous studies have extrapolated from uniprocessor measurements and simulations to estimate the performance of these machines. In this study, we use traces of parallel programs to evaluate the cache and bus performance of shared memory multiprocessors, in which coherency is maintained by a write-invalidate protocol. In particular, we analyze the effect of sharing overhead on cache miss ratio and bus utilization.\u0000Our studies show that parallel programs incur substantially higher miss ratios and bus utilization than comparable uniprocessor programs. The sharing component of these metrics proportionally increases with both cache and block size, and for some cache configurations determines both their magnitude and trend. The amount of overhead depends on the memory reference pattern to the shared data. Programs that exhibit good per-processor-locality perform better than those with fine-grain-sharing. This suggests that parallel software writers and better compiler technology can improve program performance through better memory organization of shared data.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128266235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data-dependency, branch, and memory-access penalties are the main constraints on the performance of high-speed microprocessors. The memory-access penalties include both penalties imposed by external memory (e.g., cache) and penalties caused by underutilization of the local processor memory (e.g., registers). This paper focuses solely on methods of increasing the utilization of data memory local to the processor (registers or register-oriented buffers). A utilization increase of local processor memory is possible by means of compile-time software, run-time hardware, or a combination of both. This paper looks at data buffers that perform solely because of compile-time software (single register sets); those that operate mainly through hardware but with possible software assistance (multiple register sets); and those intended to operate transparently with main memory, implying no software assistance whatsoever (stack buffers). This paper shows that hardware buffering schemes cannot replace compile-time effort, but at most can reduce the complexity of this effort. It shows the utility increase obtained by applying register allocation to multiple register sets. The paper also shows a potential utility decrease inherent to stack buffers. The observation that a single register set, allocated by means of interprocedural allocation, performs competitively with both multiple register sets and stack buffers emphasizes the significance of this conclusion.
{"title":"Data buffering: run-time versus compile-time support","authors":"Hans M. Mulder","doi":"10.1145/70082.68196","DOIUrl":"https://doi.org/10.1145/70082.68196","url":null,"abstract":"Data-dependency, branch, and memory-access penalties are main constraints on the performance of high-speed microprocessors. The memory-access penalties concern both penalties imposed by external memory (e.g. cache) or by under utilization of the local processor memory (e.g. registers). This paper focuses solely on methods of increasing the utilization of data memory, local to the processor (registers or register-oriented buffers).\u0000A utilization increase of local processor memory is possible by means of compile-time software, run-time hardware, or a combination of both. This paper looks at data buffers which perform solely because of the compile-time software (single register sets); those which operate mainly through hardware but with possible software assistance (multiple register sets); and those intended to operate transparently with main memory implying no software assistance whatsoever (stack buffers). This paper shows that hardware buffering schemes cannot replace compile-time effort, but at most can reduce the complexity of this effort. It shows the utility increase of applying register allocation for multiple register sets. The paper also shows a potential utility decrease inherent to stack buffers. The observation that a single register set, allocated by means of interprocedural allocation, performs competitively with both multiple register set and stack buffer emphasizes the significance of the conclusion","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124623636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Garth A. Gibson, L. Hellerstein, R. Karp, R. Katz, D. Patterson
The ever-increasing need for I/O bandwidth will be met with ever larger arrays of disks. These arrays require redundancy to protect against data loss. This paper examines alternative choices for encodings, or codes, that reliably store information in disk arrays. Codes are selected to maximize mean time to data loss or to minimize the number of disks containing redundant data, but are all constrained to minimize performance penalties associated with updating information or recovering from catastrophic disk failures. We also present codes that give highly reliable data storage with low redundant-data overhead for arrays of 1000 information disks.
{"title":"Failure correction techniques for large disk arrays","authors":"Garth A. Gibson, L. Hellerstein, R. Karp, R. Katz, D. Patterson","doi":"10.1145/70082.68194","DOIUrl":"https://doi.org/10.1145/70082.68194","url":null,"abstract":"The ever increasing need for I/O bandwidth will be met with ever larger arrays of disks. These arrays require redundancy to protect against data loss. This paper examines alternative choices for encodings, or codes, that reliably store information in disk arrays. Codes are selected to maximize mean time to data loss or minimize disks containing redundant data, but are all constrained to minimize performance penalties associated with updating information or recovering from catastrophic disk failures. We also codes that give highly reliable data storage with low redundant data overhead for arrays of 1000 information disks.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"32 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131291498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present a unified approach to vector and scalar computation, using a single register file for both scalar operands and vector elements. The goal of this architecture is to yield improved scalar performance while broadening the range of vectorizable applications. For example, reduction operations and recurrences can be expressed in vector form in this architecture. This approach results in greater overall performance for most applications than does the approach of emphasizing peak vector performance. The hardware required to support the enhanced vector capability is insignificant, but allows the execution of two operations per cycle for vectorized code. Moreover, the size of the unified vector/scalar register file required for peak performance is an order of magnitude smaller than traditional vector register files, allowing efficient on-chip VLSI implementation. The results of simulations of the Livermore Loops and Linpack using this architecture are presented.
{"title":"A unified vector/scalar floating-point architecture","authors":"N. Jouppi, J. Bertoni, D. W. Wall","doi":"10.1145/70082.68195","DOIUrl":"https://doi.org/10.1145/70082.68195","url":null,"abstract":"In this paper we present a unified approach to vector and scalar computation, using a single register file for both scalar operands and vector elements. The goal of this architecture is to yield improved scalar performance while broadening the range of vectorizable applications. For example, reduction operations and recurrences can be expressed in vector form in this architecture. This approach results in greater overall performance for most applications than does the approach of emphasizing peak vector performance. The hardware required to support the enhanced vector capability is insignificant, but allows the execution of two operations per cycle for vectorized code. Moreover, the size of the unified vector/scalar register file required for peak performance is an order of magnitude smaller than traditional vector register files, allowing efficient on-chip VLSI implementation. The results of simulations of the Livermore Loops and Linpack using this architecture are presented.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123543729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David L. Black, R. Rashid, D. Golub, C. R. Hill, R. Baron
We discuss the translation lookaside buffer (TLB) consistency problem for multiprocessors, and introduce the Mach shootdown algorithm for maintaining TLB consistency in software. This algorithm has been implemented on several multiprocessors, and is in regular production use. Performance evaluations establish the basic costs of the algorithm and show that it has minimal impact on application performance. As a result, TLB consistency does not pose an insurmountable obstacle to multiprocessors with several hundred processors. We also discuss hardware support options for TLB consistency ranging from a minor interrupt structure modification to complete hardware implementations. Features are identified in current hardware that compound the TLB consistency problem; removal or correction of these features can simplify and/or reduce the overhead of maintaining TLB consistency in software.
{"title":"Translation lookaside buffer consistency: a software approach","authors":"David L. Black, R. Rashid, D. Golub, C. R. Hill, R. Baron","doi":"10.1145/70082.68193","DOIUrl":"https://doi.org/10.1145/70082.68193","url":null,"abstract":"We discuss the translation lookaside buffer (TLB) consistency problem for multiprocessors, and introduce the Mach shootdown algorithm for maintaining TLB consistency in software. This algorithm has been implemented on several multiprocessors, and is in regular production use. Performance evaluations establish the basic costs of the algorithm and show that it has minimal impact on application performance. As a result, TLB consistency does not pose an insurmountable obstacle to multiprocessors with several hundred processors. We also discuss hardware support options for TLB consistency ranging from a minor interrupt structure modification to complete hardware implementations. Features are identified in current hardware that compound the TLB consistency problem; removal or correction of these features can simplify and/or reduce the overhead of maintaining TLB consistency in software.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132004876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}