Architecture and performance of the Hitachi SR2201 massively parallel processor system
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580901
Hiroaki Fujii, Y. Yasuda, Hideya Akashi, Y. Inagami, Makoto Koga, Osamu Ishihara, M. Kashiyama, Hideo Wada, Tsutomu Sumimoto
RISC-based Massively Parallel Processors (MPPs) often show low efficiency in real-world applications because of cache-miss penalties, insufficient memory-system throughput, and poor inter-processor communication performance. Hitachi's SR2201, an MPP scalable up to 2048 processors and a peak performance of 600 GFLOPS, overcomes these problems by introducing three novel features. First, its processor, the 150 MHz HARP-1E, overcomes the cache-miss penalty through "pseudo vector processing" (PVP), in which data is prefetched into a special register bank, bypassing the cache. Second, a multi-bank memory architecture that operates like a pipeline eliminates the memory-system bottleneck. Third, inter-processor communication achieves high performance over a three-dimensional crossbar network, using a "remote DMA transfer" protocol and hardware-based cache coherency. As a result of these improvements, the SR2201 achieved 220.4 GFLOPS with 1024 processors on the LINPACK benchmark, almost 72% of peak performance.
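A rough sketch of the PVP idea, shown on a DAXPY-style loop: operands for a later iteration are requested while the current iteration computes, so the floating-point pipeline does not stall on cache misses. The prefetch intrinsic and the preload distance below are stand-in assumptions; the HARP-1E's preload fills a dedicated register bank rather than the cache.

```cpp
#include <cstddef>

// DAXPY-style loop illustrating pseudo vector processing: data for
// iteration i + kDist is requested while iteration i computes, hiding
// memory latency. __builtin_prefetch (GCC/Clang) only warms the cache;
// the real PVP preload targets a special register bank instead.
void daxpy_pvp_sketch(std::size_t n, double alpha,
                      const double* x, double* y) {
    const std::size_t kDist = 16;  // assumed preload distance
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kDist < n) {
            __builtin_prefetch(&x[i + kDist]);
            __builtin_prefetch(&y[i + kDist]);
        }
        y[i] += alpha * x[i];  // operands were requested kDist iterations ago
    }
}
```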
{"title":"Architecture and performance of the Hitachi SR2201 massively parallel processor system","authors":"Hiroaki Fujii, Y. Yasuda, Hideya Akashi, Y. Inagami, Makoto Koga, Osamu Ishihara, M. Kashiyama, Hideo Wada, Tsutomu Sumimoto","doi":"10.1109/IPPS.1997.580901","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580901","url":null,"abstract":"RISC-based Massively Parallel Processors (MPPs) often show low efficiency in real-world applications because of cache miss penalty, insufficient throughput of the memory system, and poor inter-processor communication performance. Hitachi's SR2201, an MPP scalable up to 2048 processors and 600 GFLOPS peak performance, overcomes these problems by introducing three novel features. First, its processor the 150 MHz HARP-IE, solves the cache miss penalty by \"pseudo vector processing\" (PVP). In PVP, data is loaded by prefetching to a special register bank, bypassing the cache. Second, a multi-bank memory architecture that operates like a pipeline eliminates the memory system bottleneck. Third, the inter-processor communication achieves high performance on the three-dimensional crossbar network, using a \"remote DMA transfer\" protocol and a hardware-based cache coherency. As the result of these improvements, the SR2201 achieved 220.4 GFLOPS with 1024 processors in the LINPACK benchmark, which is almost 72% of the peak performance.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124767918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and implementation of virtual memory-mapped communication on Myrinet
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580931
C. Dubnicki, A. Bilas, Kai Li
Describes the design and implementation of the Virtual Memory-Mapped Communication (VMMC) model on a Myrinet network of PCI-based PCs. VMMC was originally designed and implemented for the SHRIMP multicomputer, where it delivers user-to-user latency and bandwidth close to the limits imposed by the underlying hardware. The goals of this work are: to provide an implementation of VMMC on a commercially available hardware platform; to determine whether the benefits of VMMC can be realized on the new hardware; and to investigate network-interface design tradeoffs by comparing SHRIMP with Myrinet and their respective VMMC implementations. Our Myrinet implementation of VMMC achieves 9.8 μs one-way latency and provides 108.4 MByte/s user-to-user bandwidth. Compared to SHRIMP, the Myrinet implementation of VMMC incurs relatively higher overhead and demands more network-interface resources (LANai processor, on-board SRAM), but requires less operating-system support.
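The VMMC usage pattern, sketched as a toy single-process model (the function names are hypothetical, not the paper's API): a receiver exports a region of its virtual address space, a sender imports it, and each send deposits data directly into the receiver's memory, with no receiver-side copy or per-message system call. The registry below stands in for the import/export tables kept on the network interface.

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>
#include <unordered_map>
#include <vector>

// Toy stand-in for VMMC-style communication (hypothetical names).
// export_buffer publishes a receive region; import_buffer obtains a
// handle to it; vmmc_send writes bytes straight into the receiver's
// memory, which is the essence of the model.
struct Region { std::uint8_t* base; std::size_t len; };

std::unordered_map<int, Region> g_exports;  // stands in for the NIC's table

void export_buffer(int handle, void* base, std::size_t len) {
    g_exports[handle] = Region{static_cast<std::uint8_t*>(base), len};
}

Region import_buffer(int handle) { return g_exports.at(handle); }

void vmmc_send(const Region& dst, const void* src, std::size_t len) {
    if (len <= dst.len) std::memcpy(dst.base, src, len);  // direct deposit
}

int main() {
    std::vector<char> recv_buf(64);
    export_buffer(/*handle=*/1, recv_buf.data(), recv_buf.size());

    Region dst = import_buffer(1);
    const char msg[] = "hello, vmmc";
    vmmc_send(dst, msg, sizeof msg);

    std::cout << recv_buf.data() << '\n';  // prints: hello, vmmc
}
```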
{"title":"Design and implementation of virtual memory-mapped communication on Myrinet","authors":"C. Dubnicki, A. Bilas, Kai Li","doi":"10.1109/IPPS.1997.580931","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580931","url":null,"abstract":"Describes the design and implementation of the Virtual Memory-Mapped Communication (VMMC) model on a Myrinet network of PCI-based PCs. VMMC has been designed and implemented for the SHRIMP multicomputer, where it delivers user-to-user latency and bandwidth close to the limits imposed by the underlying hardware. The goal of this work is: to provide an implementation of VMMC on a commercially available hardware platform; to determine whether the benefits of VMMC can be realized on the new hardware; and to investigate network interface design tradeoffs by comparing SHRIMP with Myrinet and its respective VMMC implementation. Our Myrinet implementation of VMMC achieves 9.8 /spl mu/s one-way latency and provides 108.4 MByte/s user-to-user bandwidth. Compared to SHRIMP, the Myrinet implementation of VMMC incurs relatively higher overhead and demands more network interface resources (LANai processor, on-board SRAM) but requires less operating system support.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125179798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing software DSM for compiler-parallelized applications
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580945
P. Keleher, C. Tseng
Current parallelizing compilers for message-passing machines support only a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful shared-memory parallelizing compilers with software distributed shared-memory (DSM) systems. We demonstrate such a system by combining the SUIF parallelizing compiler and the CVM software DSM. Innovations of the system include compiler-directed techniques that: (1) combine synchronization with the communication of parallelism information at parallel task invocation, (2) employ customized routines for evaluating reduction operations, and (3) select a hybrid update protocol that pre-sends data by flushing updates at barriers. For applications with sufficient granularity of parallelism, these optimizations yield very good eight-processor speedups on an IBM SP-2 and a DEC Alpha cluster, usually matching or exceeding the speedups of equivalent HPF and message-passing versions of each program. Flushing updates, in particular, eliminates almost all nonlocal memory misses and improves performance by 13% on average.
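A minimal sketch of the flush-at-barrier idea, under assumed data structures (this is not CVM's code): before waiting at a barrier, each node pushes its modified shared pages to the nodes recorded as their readers, so post-barrier reads hit locally instead of faulting and fetching remotely.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Toy model of a hybrid update protocol (assumed structures). Each dirty
// page is pre-sent to its recorded readers at the barrier, so the data
// arrives before it is demanded after the barrier.
struct Page {
    int id = 0;
    bool dirty = false;
    std::set<int> readers;            // nodes expected to read this page
    std::vector<std::uint8_t> bytes;  // current contents
};

struct Network {
    // deliveries[node][page] = bytes pushed to that node at the barrier
    std::map<int, std::map<int, std::vector<std::uint8_t>>> deliveries;
    void send_update(int node, const Page& p) { deliveries[node][p.id] = p.bytes; }
    void barrier_wait() { /* ordinary barrier synchronization */ }
};

void barrier_with_flush(std::vector<Page>& pages, Network& net) {
    for (Page& p : pages) {
        if (!p.dirty) continue;
        for (int reader : p.readers)   // flush updates to known readers
            net.send_update(reader, p);
        p.dirty = false;
    }
    net.barrier_wait();  // post-barrier reads now hit locally
}
```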
{"title":"Enhancing software DSM for compiler-parallelized applications","authors":"P. Keleher, C. Tseng","doi":"10.1109/IPPS.1997.580945","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580945","url":null,"abstract":"Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful shared-memory parallelizing compilers with software distributed shared-memory (DSM) systems. We demonstrate such a system by combining the SUIF parallelizing compiler and the CVM software DSM. Innovations of the system include compiler-directed techniques that: (1) combine synchronization and parallelism information communication on parallel task invocation, (2) employ customized routines for evaluating reduction operations, and (3) select a hybrid update protocol that pre-sends data by flushing updates at barriers. For applications with sufficient granularity of parallelism, these optimizations yield very good eight processor speedups on an IBM SP-2 and DEC Alpha cluster usually matching or exceeding the speedup of equivalent HPF and message-passing versions of each program. Flushing updates, in particular, eliminates almost all nonlocal memory misses and improves performance by 13% on average.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125447251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joining forces in solving large-scale quadratic assignment problems in parallel
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580936
Adrian Brüngger, Ambros Marzetta, J. Clausen, Michael Perregaard
Program libraries are one way to make cooperation between specialists from various fields successful: the separation of application-specific knowledge from application-independent tasks ensures portability, maintainability, extensibility, and flexibility. This paper demonstrates the success of combining problem-specific knowledge for the quadratic assignment problem (QAP) with the raw computing power offered by contemporary parallel hardware, using ZRAM, a library of parallel search algorithms. Solutions of 10 previously unsolved large standard test instances of the QAP are presented.
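For reference, the QAP in its standard Koopmans-Beckmann form asks for a permutation (an assignment of facilities to locations) minimizing total flow-weighted distance, with flow matrix A = (a_ij) and distance matrix B = (b_kl):

```latex
\min_{\pi \in S_n} \; \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, b_{\pi(i)\pi(j)}
```

Parallel branch-and-bound of the kind ZRAM provides searches the space of partial assignments, pruning with lower bounds on this objective.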
{"title":"Joining forces in solving large-scale quadratic assignment problems in parallel","authors":"Adrian Brüngger, Ambros Marzetta, J. Clausen, Michael Perregaard","doi":"10.1109/IPPS.1997.580936","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580936","url":null,"abstract":"Program libraries are one way to make the cooperation between specialists from various fields successful: the separation of application-specific knowledge from application independent tasks ensures portability, maintenance, extensibility, and flexibility. This paper demonstrates the success in combining problem-specific knowledge for the quadratic assignment problem (QAP) with the raw computing power offered by contemporary parallel hardware by using the library of parallel search algorithms ZRAM. The solutions of 10 previously unsolved large standard test-instances of the QAP are presented.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130379657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel solutions of indexed recurrence equations
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580935
Gadi Haber, Y. Ben-Asher
A new type of recurrence equation, called an "indexed recurrence" (IR) equation, is defined, in which the common notion of X[i] = op(X[i], X[i-1]), i = 1...n, is generalized to X[g(i)] = op(X[f(i)], X[h(i)]), where f, g, h : {1...n} → {1...m}. This enables us to model sequential loops of the form "for i := 1 to n do X[g(i)] := op(X[f(i)], X[h(i)]);" as IR equations. Thus, a parallel algorithm that solves a set of IR equations is in fact a way to transform sequential loops into parallel ones. Note that the circuit evaluation problem (CVP) can also be expressed as a set of IR equations, so an efficient parallel solution to the general IR problem is unlikely to be found: such a solution would also solve the CVP, showing that P ⊆ NC. In this paper we introduce parallel algorithms for two variants of the IR problem: an O(log n) greedy algorithm using O(n) processors for the case where the g(i) are distinct and h(i) = g(i), and an O(log² n) algorithm using up to O(n²) processors with no restriction on f, g, or h. We also show that for general IR equations, op must be commutative for such a parallel computation to be possible.
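As a concrete sketch of the loop form, with the index maps passed in as arrays: choosing g(i) = f(i) = i and h(i) = i-1 with op = + recovers an ordinary prefix sum, while arbitrary g, f, h give the general problem discussed above.

```cpp
#include <functional>
#include <vector>

// Sequential loop of the IR form X[g(i)] := op(X[f(i)], X[h(i)]).
// Example instance (prefix sum over X[0..n-1], 0-based):
//   g[i] = f[i] = i + 1, h[i] = i for i = 0..n-2, op = std::plus<int>()
void ir_loop(std::vector<int>& X,
             const std::vector<int>& g, const std::vector<int>& f,
             const std::vector<int>& h,
             const std::function<int(int, int)>& op) {
    for (std::size_t i = 0; i < g.size(); ++i)
        X[g[i]] = op(X[f[i]], X[h[i]]);
}
```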
{"title":"Parallel solutions of indexed recurrence equations","authors":"Gadi Haber, Y. Ben-Asher","doi":"10.1109/IPPS.1997.580935","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580935","url":null,"abstract":"A new type of recurrence equations called \"indexed recurrences\" (IR) is defined in which the common notion of X[i]=op(X[i],X[i-1]) i=1...n is generalized to X[g(i)]=op(X[f(i)],X[h(i)]) f,g,h:{1...n}/spl rarr/{1...m}. This enables us to model sequential loops of the form for i=1 to n do begin X[g(i)]:=op(X[f(i)],X[h(i)];) as IR equations. Thus, a parallel algorithm that solves a set of IR equations is in fact a way to transform sequential loops into parallel ones. Note that the circuit evaluation problem (CVP) can also be expressed as a set of IR equations. Therefore an efficient parallel solution to the general IR problem is not likely to be found. As such solution would also solve the CVP, showing that P/spl sube/NC. In this paper we introduce parallel algorithms for two variants of the IR equations problem: An O(log n) greedy algorithm for solving IR equations where g(i) is distinct and h(i)=g(i) using O(n) processors. An O(log/sup 2/ n) algorithm with no restriction on f, g or h, using up to O(n/sup 2/) processors. However we show that for general IR, op must be commutative so that a parallel computation can be used.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129684663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Latency tolerance: a metric for performance analysis of multithreaded architectures
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580899
S. Nemawarkar, G. Gao
Multithreaded multiprocessor systems (MMS) have been proposed to tolerate long communication latencies. This paper provides an analytical framework, based on closed queueing networks, to quantify and analyze the latency tolerance of multithreaded systems. We introduce a new metric, called the tolerance index, which quantifies how close the performance of a system comes to that of an ideal system. We characterize latency tolerance as the architectural and program workload parameters change, and show how an analysis of latency tolerance provides insight into performance optimizations for fine-grain parallel program workloads.
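One illustrative way to formalize such a metric (an assumption for exposition, not necessarily the paper's exact definition) is the ratio of achieved processor utilization to that of an ideal system in which communication costs nothing:

```latex
T = \frac{U_{\text{actual}}}{U_{\text{ideal}}}, \qquad 0 \le T \le 1
```

so that T approaching 1 means the available threads fully hide communication latency.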
{"title":"Latency tolerance: a metric for performance analysis of multithreaded architectures","authors":"S. Nemawarkar, G. Gao","doi":"10.1109/IPPS.1997.580899","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580899","url":null,"abstract":"Multithreaded multiprocessor systems (MMS) have been proposed to tolerate long latencies for communication. This paper provides an analytical framework based on closed queueing networks to quantify and analyze the latency tolerance of multithreaded systems. We introduce a new metric, called the tolerance index, which quantifies the closeness of performance of the system to that of an ideal system. We characterize the latency tolerance with the changes in the architectural and program workload parameters. We show how an analysis of the latency tolerance provides an insight to the performance optimizations of fine grain parallel program workloads.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122619360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A customizable simulator for workstation networks
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580903
Mustafa Uysal, A. Acharya, R. Bennett, J. Saltz
We present a customizable simulator called netsim for high-performance point-to-point workstation networks. It is accurate enough to be used for application-level performance analysis, yet easy to customize for multiple architectures and software configurations. Customization is accomplished without any proprietary information, using only publicly available hardware specifications and information that can be readily determined with a suite of test programs. We customized netsim for two platforms: a 16-node IBM SP-2 with a multistage network and a 10-node DEC Alpha Farm with an ATM switch. We show that netsim successfully models these two architectures with 2-6% error on the SP-2 and less than 10% error on the Alpha Farm for most test cases. It achieves this accuracy at the cost of a 7-36-fold simulation slowdown with respect to the SP-2 and a 3-8-fold slowdown with respect to the Alpha Farm.
{"title":"A customizable simulator for workstation networks","authors":"Mustafa Uysal, A. Acharya, R. Bennett, J. Saltz","doi":"10.1109/IPPS.1997.580903","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580903","url":null,"abstract":"We present a customizable simulator called netsim for high performance point to point workstation networks that is accurate enough to be used for application level performance analysis, yet is easy enough to customize for multiple architectures and software configurations. Customization is accomplished without using any proprietary information, using only publicly available hardware specifications and information that can be readily determined using a suite of test programs. We customized netsim for two platforms: a 16 node IBM SP-2 with a multistage network and a 10 node DEC Alpha Farm with an ATM switch. We show that netsim successfully models these two architectures with a 2-6% error on the SP-2 and less than 10% error on the Alpha Farm for most test cases. It achieves this accuracy at the cost of a 7-36 fold simulation slowdown with respect to the SP-2 and a 3-8 fold slowdown with respect to the Alpha Farm.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115956282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aurora: scoped behaviour for per-context optimized distributed data sharing
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580942
P. Lu
We introduce Aurora, an all-software distributed shared data system based on standard C++. As with related systems, it provides a shared-data abstraction on distributed-memory hardware. An innovation in Aurora is the use of scoped behaviour for per-context data-sharing optimizations, where a context is a portion of source code such as a loop or phase. With scoped behaviour, a new language scope (e.g., nested braces) can be used to optimize the data-sharing behaviour of the selected source code; different scopes and different shared data can be optimized in different ways. Thus, scoped behaviour provides a novel degree of flexibility for incrementally tuning the parallel performance of an application.
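The flavour of scoped behaviour can be sketched with a C++ RAII handle (hypothetical names, not Aurora's actual API): entering a new scope installs an optimized sharing policy for a shared object, and leaving the scope restores the previous one, so each loop or phase can tune sharing independently.

```cpp
#include <iostream>
#include <string>

// Hypothetical sketch of scoped behaviour. A SharedVec normally uses a
// conservative sharing policy; a ScopedPolicy handle switches it to an
// optimized policy for the duration of one language scope.
struct SharedVec {
    std::string policy = "default";  // per-object sharing behaviour
};

class ScopedPolicy {
public:
    ScopedPolicy(SharedVec& v, std::string p) : v_(v), saved_(v.policy) {
        v_.policy = std::move(p);            // active inside the scope
    }
    ~ScopedPolicy() { v_.policy = saved_; }  // restored on scope exit
private:
    SharedVec& v_;
    std::string saved_;
};

int main() {
    SharedVec x;
    {   // new scope: x uses batched updates only for this phase
        ScopedPolicy opt(x, "batched-updates");
        std::cout << x.policy << '\n';  // batched-updates
    }
    std::cout << x.policy << '\n';      // default
}
```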
{"title":"Aurora: scoped behaviour for per-context optimized distributed data sharing","authors":"P. Lu","doi":"10.1109/IPPS.1997.580942","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580942","url":null,"abstract":"We introduce the all-software, standard C++-based Aurora distributed shared data system. As with related systems, it provides a shared data abstraction on distributed memory hardware. An innovation in Aurora is the use of scoped behaviour for per-context data sharing optimizations (i.e., portion of source code, such as a loop or phase). With scoped behaviour a new language scope (e.g., nested braces) can be used to optimize the data sharing behaviour of the selected source code. Different scopes and different shared data can be optimized in different ways. Thus, scoped behaviour provides a novel level of flexibility to incrementally tune the parallel performance of an application.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130914417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A fast scalable universal matrix multiplication algorithm on distributed-memory concurrent computers
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580916
J. Choi
The author presents a fast and scalable matrix multiplication algorithm for distributed-memory concurrent computers whose performance is independent of how data is distributed over processors, called DIMMA (distribution-independent matrix multiplication algorithm). The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine on each processor whether the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon.
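The LCM block concept, in brief: under a block-cyclic distribution on a Pr-by-Pc processor grid, the ownership pattern repeats every lcm(Pr, Pc) blocks in each dimension, so aggregating that many blocks lets each processor issue one large sequential BLAS call regardless of the distribution block size. A minimal illustration (grid and block size are example values):

```cpp
#include <iostream>
#include <numeric>  // std::lcm (C++17)

// With a Pr x Pc grid and block size nb, the ownership pattern of a
// block-cyclically distributed matrix repeats every lcm(Pr, Pc) blocks,
// i.e. every lcm(Pr, Pc) * nb matrix rows/columns. DIMMA exploits this
// period to aggregate per-processor BLAS calls.
int main() {
    int Pr = 4, Pc = 6, nb = 32;        // example grid and block size
    int lcm_blocks = std::lcm(Pr, Pc);  // 12 blocks
    std::cout << "LCM block spans " << lcm_blocks * nb
              << " rows/cols (" << lcm_blocks << " blocks)\n";
}
```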
{"title":"A fast scalable universal matrix multiplication algorithm on distributed-memory concurrent computers","authors":"J. Choi","doi":"10.1109/IPPS.1997.580916","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580916","url":null,"abstract":"The author presents a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and call it DIMMA (distribution-independent matrix multiplication algorithm). The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectively, and exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine in each processor when the block size is too small as well as too large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114571268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Characterization of deadlocks in interconnection networks
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580852
Sugath Warnakulasuriya, T. Pinkston
Deadlock-free routing algorithms have recently been developed without a full understanding of the frequency and characteristics of actual deadlocks. Using a simulator capable of true deadlock detection, we measure a network's susceptibility to deadlock under various design parameters. The effects of bidirectionality, routing adaptivity, virtual channels, buffer size, and node degree on deadlock formation are studied. In the process, we provide insight into the frequency and characteristics of deadlocks, and into the relationships among routing flexibility, blocked messages, resource dependencies, and the degree of correlation needed to form a deadlock.
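True deadlock detection can be viewed as finding a cycle in a channel wait-for graph, where an edge i -> j means the message holding channel i is blocked waiting for channel j. A simplified sketch of that check (the paper's detector handles more general resource dependencies):

```cpp
#include <functional>
#include <iostream>
#include <vector>

// DFS cycle check over a channel wait-for graph: a Gray node reached
// again along the current path is a back edge, the signature of a set
// of messages each waiting on a resource held by the next.
bool has_cycle(const std::vector<std::vector<int>>& g) {
    enum { White, Gray, Black };
    std::vector<int> color(g.size(), White);
    std::function<bool(int)> dfs = [&](int u) {
        color[u] = Gray;  // on the current DFS path
        for (int v : g[u]) {
            if (color[v] == Gray) return true;  // back edge: cycle
            if (color[v] == White && dfs(v)) return true;
        }
        color[u] = Black;
        return false;
    };
    for (std::size_t u = 0; u < g.size(); ++u)
        if (color[u] == White && dfs(static_cast<int>(u))) return true;
    return false;
}

int main() {
    // channels 0 -> 1 -> 2 -> 0: each blocked message waits on the next
    std::vector<std::vector<int>> waits = {{1}, {2}, {0}};
    std::cout << (has_cycle(waits) ? "deadlock" : "no deadlock") << '\n';
}
```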
{"title":"Characterization of deadlocks in interconnection networks","authors":"Sugath Warnakulasuriya, T. Pinkston","doi":"10.1109/IPPS.1997.580852","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580852","url":null,"abstract":"Deadlock-free routing algorithms have been developed recently without fully understanding the frequency and characteristics of deadlocks. Using a simulator capable of true deadlock detection, we measure a network's susceptibility to deadlock due to various design parameters. The effects of bidirectionality, routing adaptivity, virtual channels, buffer size and node degree on deadlock formation are studied. In the process, we provide insight into the frequency and characteristics of deadlocks and the relationship between routing flexibility blocked messages, resource dependencies and the degree of correlation needed to form deadlock.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114932383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}