Memory access reordering in vector processors
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386525
D. Lee
Interference among multiple vector streams that access memory concurrently is the major source of performance degradation in the main memory of pipelined vector processors. While totally eliminating interference appears to be impossible, little is known about how to design a memory system that can reduce it. In this paper, we introduce a concept called memory access reordering for reducing interference. This technique reduces interference by making the multiple vector streams access memory in an orderly fashion. Effective algorithms for memory access reordering are presented, and their efficient hardware implementations are described.
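The abstract leaves the reordering algorithms to the paper body; as a rough illustration of the idea only, the C sketch below (every name and parameter is a hypothetical choice of ours, assuming a low-order-interleaved memory) issues requests from several vector streams one memory cycle at a time, deferring any request whose bank has already been claimed in that cycle, so conflicting streams take turns instead of colliding:

```c
/* A minimal sketch of memory-access reordering for concurrent vector
 * streams, assuming a simple low-order-interleaved memory of NBANKS
 * banks.  All structure and function names here are hypothetical.   */
#include <stdio.h>

#define NBANKS   8
#define NSTREAMS 3
#define LEN      8

struct stream { long base, stride, next; };   /* next = element index */

static int bank_of(long addr) { return (int)(addr % NBANKS); }

int main(void) {
    struct stream s[NSTREAMS] = {
        {  0, 1, 0 },   /* unit stride                        */
        { 64, 8, 0 },   /* stride equal to NBANKS: worst case */
        {  5, 2, 0 },
    };
    int done = 0;
    while (done < NSTREAMS) {
        int busy[NBANKS] = { 0 };
        done = 0;
        /* One "memory cycle": scan the streams and issue at most one
         * request per bank, skipping any stream whose next address
         * would collide with a bank already claimed this cycle.     */
        for (int i = 0; i < NSTREAMS; i++) {
            if (s[i].next >= LEN) { done++; continue; }
            long addr = s[i].base + s[i].next * s[i].stride;
            int b = bank_of(addr);
            if (!busy[b]) {        /* reorder: defer conflicting reqs */
                busy[b] = 1;
                printf("stream %d -> addr %3ld (bank %d)\n", i, addr, b);
                s[i].next++;
            }
        }
    }
    return 0;
}
```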
{"title":"Memory access reordering in vector processors","authors":"D. Lee","doi":"10.1109/HPCA.1995.386525","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386525","url":null,"abstract":"Interference among multiple vector streams that access memory concurrently is the major source of performance degradation in main memory of pipelined vector processors. While totally eliminating interference appears to be impossible, little is known on how to design a memory system that can reduce it. In this paper, we introduce a concept called memory access reordering for reducing interference. This technique reduces interference by means of making the multiple vector streams access memory in an orderly fashion. Effective algorithms for memory access reordering are presented and their efficient hardware implementations are described.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114838761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An argument for simple COMA
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386535
Ashley Saulsbury, T. Wilkinson, J. Carter, A. Landin
We present design details and some initial performance results of a novel scalable shared-memory multiprocessor architecture. This architecture features the automatic data migration and replication capabilities of cache-only memory architecture (COMA) machines, without the accompanying hardware complexity. A software layer manages cache space allocation at page granularity, similarly to distributed virtual shared memory (DVSM) systems, leaving simpler hardware to maintain shared-memory coherence at cache-line granularity. Reducing the hardware complexity reduces both the machine cost and the development time. We call the resulting hybrid hardware-and-software multiprocessor architecture Simple COMA. Preliminary results indicate that the performance of Simple COMA is comparable to that of more complex contemporary all-hardware designs.
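A minimal sketch of the division of labour described above, under our own simplifying assumptions (a toy page of four lines, two-state lines, invented names): software allocates page-grain space on first touch without copying any data, and the simulated hardware then fetches and tracks individual lines:

```c
/* A toy model of the Simple COMA split: software allocates cache
 * space at page granularity, while (here, simulated) hardware keeps
 * coherence per cache line.  Names and sizes are illustrative
 * assumptions, not the paper's API.                                 */
#include <stdio.h>
#include <string.h>

#define PAGE_LINES 4                /* lines per page (toy value)    */

enum lstate { INVALID, SHARED };

struct page { int allocated; enum lstate line[PAGE_LINES]; };

/* Software layer: on first touch of a page, allocate local space but
 * leave every line INVALID -- no data is copied at page grain.      */
static void soft_page_fault(struct page *p) {
    p->allocated = 1;
    for (int i = 0; i < PAGE_LINES; i++) p->line[i] = INVALID;
    puts("software: page allocated, all lines invalid");
}

/* Hardware layer: a per-line miss fetches just that line.           */
static void hard_line_access(struct page *p, int l) {
    if (!p->allocated) soft_page_fault(p);      /* software first    */
    if (p->line[l] == INVALID) {
        p->line[l] = SHARED;                    /* fetch from home   */
        printf("hardware: line %d fetched and made SHARED\n", l);
    } else {
        printf("hardware: line %d hit\n", l);
    }
}

int main(void) {
    struct page p; memset(&p, 0, sizeof p);
    hard_line_access(&p, 2);   /* page fault + line miss */
    hard_line_access(&p, 2);   /* line hit               */
    hard_line_access(&p, 0);   /* line miss only         */
    return 0;
}
```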
{"title":"An argument for simple COMA","authors":"Ashley Saulsbury, T. Wilkinson, J. Carter, A. Landin","doi":"10.1109/HPCA.1995.386535","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386535","url":null,"abstract":"We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture. This architecture features the automatic data migration and replication capabilities of cache-only memory architecture (COMA) machines, without the accompanying hardware complexity. A software layer manages cache space allocation at a page-granularity-similarly to distributed virtual shared memory (DVSM) systems, leaving simpler hardware to maintain shared memory coherence at a cache line granularity. By reducing the hardware complexity, the machine cost and development time are reduced. We call the resulting hybrid hardware and software multiprocessor architecture Simple COMA. Preliminary results indicate that the performance of Simple COMA is comparable to that of more complex contemporary all hardware designs.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129712192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling virtual channel flow control in hypercubes
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386545
Younes M. Boura, C. Das
An analytical model for virtual channel flow control in n-dimensional hypercubes using the e-cube routing algorithm is developed. The model is based on determining the values of the different components that make up the average message latency: the message transfer time, the blocking delay at each dimension, the multiplexing delay at each dimension, and the waiting delay at the source node. The first two components are determined using a probabilistic analysis. The average degree of multiplexing is determined using a Markov model, and the waiting delay at the source node is determined using an M/M/m queueing system. The model is fairly accurate in predicting the average message latency for different message sizes and a varying number of virtual channels per physical channel. It is demonstrated that wormhole switching along with virtual channel flow control makes the average message latency insensitive to the network size when the network is relatively lightly loaded (message arrival rate equal to 40% of channel capacity), and that the average message latency increases linearly with the average message size. The simplicity and accuracy of the analytical model make it an attractive and effective tool for predicting the behavior of n-dimensional hypercubes.
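The decomposition the abstract names can be written out explicitly. In the notation below (our symbols, not the paper's), with $T_{\mathrm{xfer}}$ the transfer time, $B_i$ and $M_i$ the blocking and virtual-channel multiplexing delays at dimension $i$, and $W_{\mathrm{src}}$ the source queueing delay:

\[
T \;=\; W_{\mathrm{src}} \;+\; \sum_{i=1}^{n} \left( B_i + M_i \right) \;+\; T_{\mathrm{xfer}},
\]

and modelling the $m$ virtual channels at the source as an M/M/m queue with arrival rate $\lambda$ and per-channel service rate $\mu$ gives the standard waiting time

\[
W_{\mathrm{src}} \;=\; \frac{C(m,\,\lambda/\mu)}{m\mu - \lambda},
\]

where $C(m,\rho)$ is the Erlang-C probability that an arriving message finds all $m$ channels busy.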
{"title":"Modeling virtual channel flow control in hypercubes","authors":"Younes M. Boura, C. Das","doi":"10.1109/HPCA.1995.386545","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386545","url":null,"abstract":"An analytical model for virtual channel flow control in n-dimensional hypercubes using the e-cube routing algorithm is developed. The model is based on determining the values of the different components that make up the average message latency. These components include the message transfer time, the blocking delay at each dimension, the multiplexing delay at each dimension, and the waiting delay at the source node. The first two components are determined using a probabilistic analysis. The average degree of multiplexing is determined using a Markov model, and the waiting delay at the source node is determined using an M/M/m queueing system. The model is fairly accurate in predicting the average message latency for different message sizes and a varying number of virtual channels per physical channel. It is demonstrated that wormhole switching along with virtual channel flow control make the average message latency insensitive to the network size when the network is relatively lightly loaded (message arrival rate is equal to 40% of channel capacity), and that the average message latency increases linearly with the average message size. The simplicity and accuracy of the analytical model make it an attractive and effective tool for predicting the behavior of n-dimensional hypercubes.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128069578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Architectural support for inter-stream communication in a MSIMD system
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386528
V. Garg, D. Schimmel
This paper considers hardware support for the exploitation of control parallelism on data parallel architectures. It is well known that data parallel algorithms may also possess control parallel structure. However, splitting control leads to data dependency and synchronization issues that were implicitly handled in conventional SIMD architectures: synchronization of access to scalar and parallel variables, and synchronization for parallel communication operations. We propose a sharing mechanism for scalar variables and identify a strategy that allows synchronization of scalar variables between multiple streams. The techniques considered are based on a bit-interleaved register file structure which allows fast copies between register sets. Hardware cost estimates and timing analyses are provided, and a comparison with an alternate scheme is presented. The register file structure has been designed and simulated for the HP 0.8 µm CMOS process, and circuit simulation indicates that access times are less than six nanoseconds. In addition, the impact of this structure on system performance is studied.
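To make the bit-interleaved idea concrete, here is a toy C model (the layout and all names are our assumptions, not the paper's design): the cells of all register sets for a given register and bit position are stored adjacently, so an inter-set copy only ever moves data between neighbouring cells, which is what lets the real hardware copy a whole register set quickly:

```c
/* A toy model of a bit-interleaved register file: for each register
 * and bit position, the cells belonging to the different register
 * sets (contexts) sit side by side, so an inter-set copy is always a
 * short, uniform move.  Layout and names are our assumptions.       */
#include <stdio.h>

#define NSETS 4       /* register sets (contexts) */
#define NREGS 8
#define NBITS 8       /* toy register width       */

/* One flat array; the cells of all sets for a given (reg,bit) pair
 * occupy adjacent positions.                                        */
static unsigned char cell[NREGS * NBITS * NSETS];

static unsigned char *at(int r, int b, int s) {
    return &cell[(r * NBITS + b) * NSETS + s];
}

/* Copy every register of set src into set dst: each move is between
 * cells exactly |src-dst| apart, independent of register and bit.   */
static void copy_set(int dst, int src) {
    for (int r = 0; r < NREGS; r++)
        for (int b = 0; b < NBITS; b++)
            *at(r, b, dst) = *at(r, b, src);
}

int main(void) {
    *at(3, 0, 1) = 1;                 /* set 1, reg 3, bit 0 := 1 */
    copy_set(2, 1);                   /* fast inter-set copy      */
    printf("set 2, reg 3, bit 0 = %d\n", *at(3, 0, 2));
    return 0;
}
```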
{"title":"Architectural support for inter-stream communication in a MSIMD system","authors":"V. Garg, D. Schimmel","doi":"10.1109/HPCA.1995.386528","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386528","url":null,"abstract":"This paper considers hardware support for the exploitation of control parallelism on data parallel architectures. It is well known that data parallel algorithms may also possess control parallel structure. However the splitting of control leads to data dependency and synchronization issues that were implicitly handled in conventional SIMD architectures. These include synchronization of access to scalar and parallel variables, and synchronization for parallel communication operations. We propose a sharing mechanism for scalar variables and identify a strategy which allows synchronization of scalar variables between multiple streams. The techniques considered are based on a bit-interleaved register file structure which allows fast copy between register sets. Hardware cost estimates and timing analyses are provided, and comparison with an alternate scheme is presented. The register file structure has been designed and simulated for the HP 0.8 /spl mu/m CMOS process, and circuit simulation indicates that access times are less than six nanoseconds. In addition, the impact of this structure on system performance is also studied.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126898641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward high communication performance through compiled communications on a circuit switched interconnection network
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386556
F. Cappello, C. Germain
This paper discusses a new interconnection network principle for massively parallel architectures in the field of numerical computation. The principle is motivated by an analysis of application features and by the need for a new kind of communication network that combines very high bandwidth, very low latency, performance independent of the communication pattern and network load, and performance that improves in proportion to hardware improvements. Our approach is to couple compiled communications with a circuit-switched interconnection network. This paper presents the motivations for this principle, the hardware and software issues, and the design of a first prototype. The expected performance is a sustained aggregate bandwidth of more than 500 GBytes/s and an overall latency of less than 270 ns for a large implementation (4K inputs) with currently available technology.
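A minimal sketch of the compiled-communication principle, with an invented schedule format: the compiler has already reduced each communication phase to a sequence of conflict-free circuit settings (permutations), and the run-time system merely steps through them:

```c
/* A minimal sketch of compiled communication on a circuit-switched
 * network: every communication pattern has been resolved at compile
 * time into a sequence of conflict-free circuit settings, so the
 * run time just steps through them.  All names are hypothetical.    */
#include <stdio.h>

#define NODES 4
#define STEPS 2

/* schedule[s][src] = dst: at step s, a circuit connects src -> dst.
 * Each step is a permutation, i.e. guaranteed conflict-free.        */
static const int schedule[STEPS][NODES] = {
    { 1, 0, 3, 2 },   /* step 0: exchange with neighbour */
    { 2, 3, 0, 1 },   /* step 1: exchange across         */
};

int main(void) {
    int data[NODES] = { 10, 11, 12, 13 }, next[NODES];
    for (int s = 0; s < STEPS; s++) {
        for (int src = 0; src < NODES; src++)
            next[schedule[s][src]] = data[src]; /* set circuits, send */
        for (int i = 0; i < NODES; i++) data[i] = next[i];
        printf("after step %d:", s);
        for (int i = 0; i < NODES; i++) printf(" %d", data[i]);
        putchar('\n');
    }
    return 0;
}
```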
{"title":"Toward high communication performance through compiled communications on a circuit switched interconnection network","authors":"F. Cappello, C. Germain","doi":"10.1109/HPCA.1995.386556","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386556","url":null,"abstract":"This paper discusses a new principle of interconnection network for massively parallel architectures in the field of numerical computation. The principle is motivated by an analysis of the application features and the need to design new kind of communication networks combining very high bandwidth, very low latency, performance independence to communication pattern or network load and a performance improvement proportional to the hardware performance improvement. Our approach is to associate compiled communications and a circuit switched interconnection network. This paper presents the motivations for this principle, the hardware and software issues and the design of a first prototype. The expected performance are a sustained aggregate bandwidth of more than 500 GBytes/s and an overall latency less than 270 ns, for a large implementation (4K inputs) with the current available technology.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133077849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thread prioritization: a thread scheduling mechanism for multiple-context parallel processors
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386541
S. Fiske, W. Dally
Multiple-context processors provide register resources that allow rapid context switching between several threads as a means of tolerating long communication and synchronization latencies. When scheduling threads on such a processor, we must first decide which threads should have their state loaded into the multiple contexts, and second, which loaded thread is to execute instructions at any given time. In this paper we show that both decisions are important, and that incorrect choices can lead to serious performance degradation. We propose thread prioritization as a means of guiding both levels of scheduling. Each thread has a priority that can change dynamically, and that the scheduler uses to allocate as many computation resources as possible to critical threads. We briefly describe its implementation, and we show simulation performance results for a number of simple benchmarks in which synchronization performance is critical.
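A toy sketch of the two scheduling decisions, under our own assumptions about representation (the paper's mechanism is in hardware, and these names are invented): first the highest-priority threads are loaded into the available contexts, then the highest-priority loaded thread is selected to issue:

```c
/* Two-level thread scheduling guided by priorities, as the abstract
 * describes: (1) which threads occupy the hardware contexts, and
 * (2) which loaded thread issues.  A software toy, not the paper's
 * hardware design.                                                  */
#include <stdio.h>

#define NTHREADS  6
#define NCONTEXTS 2

struct thread { int id, prio, loaded; };

int main(void) {
    struct thread t[NTHREADS] = {
        {0,3,0},{1,9,0},{2,1,0},{3,7,0},{4,5,0},{5,2,0}
    };
    /* Level 1: load the NCONTEXTS highest-priority threads.         */
    for (int c = 0; c < NCONTEXTS; c++) {
        int best = -1;
        for (int i = 0; i < NTHREADS; i++)
            if (!t[i].loaded && (best < 0 || t[i].prio > t[best].prio))
                best = i;
        t[best].loaded = 1;
        printf("context %d <- thread %d (prio %d)\n",
               c, t[best].id, t[best].prio);
    }
    /* Level 2: among loaded threads, the highest priority issues.   */
    int run = -1;
    for (int i = 0; i < NTHREADS; i++)
        if (t[i].loaded && (run < 0 || t[i].prio > t[run].prio))
            run = i;
    printf("issuing from thread %d\n", t[run].id);
    return 0;
}
```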
{"title":"Thread prioritization: a thread scheduling mechanism for multiple-context parallel processors","authors":"S. Fiske, W. Dally","doi":"10.1109/HPCA.1995.386541","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386541","url":null,"abstract":"Multiple-context processors provide register resources that allow rapid context switching between several threads as a means of tolerating long communication and synchronization latencies. When scheduling threads on such a processor, we must first decide which threads should have their state loaded into the multiple contexts, and second, which loaded thread is to execute instructions at any given time. In this paper we show that both decisions are important, and that incorrect choices can lead to serious performance degradation. We propose thread prioritization as a means of guiding both levels of scheduling. Each thread has a priority that can change dynamically, and that the scheduler uses to allocate as many computation resources as possible to critical threads. We briefly describe its implementation, and we show simulation performance results for a number of simple benchmarks in which synchronization performance is critical.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116768056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Access ordering and memory-conscious cache utilization
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386537
S. Mckee, W. Wulf
As processor speeds increase relative to memory speeds, memory bandwidth is rapidly becoming the limiting performance factor for many applications. Several approaches to bridging this performance gap have been suggested. This paper examines one approach, access ordering, and pushes its limits to determine bounds on memory performance. We present several access-ordering schemes and compare their performance, developing analytic models and partially validating them with benchmark timings on the Intel i860XR.
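As a minimal illustration of why access ordering pays off (toy parameters, not one of the paper's schemes): grouping the references of two interleaved streams by DRAM page turns most page misses into page hits:

```c
/* Instead of issuing the references of interleaved streams in program
 * order, group them so that accesses falling in the same DRAM page go
 * out back to back.  PAGE and the addresses are toy values.          */
#include <stdio.h>
#include <stdlib.h>

#define PAGE 16                    /* addresses per DRAM page */

static int cmp_page(const void *a, const void *b) {
    long pa = *(const long *)a / PAGE, pb = *(const long *)b / PAGE;
    return (pa > pb) - (pa < pb);
}

/* Count how often a new DRAM page must be opened.                    */
static int count_page_misses(const long *addr, int n) {
    int misses = 0; long open = -1;
    for (int i = 0; i < n; i++) {
        long p = addr[i] / PAGE;
        if (p != open) { misses++; open = p; }
    }
    return misses;
}

int main(void) {
    /* Two interleaved streams with different base addresses.         */
    long a[8] = { 0, 100, 1, 101, 2, 102, 3, 103 };
    printf("program order : %d page misses\n", count_page_misses(a, 8));
    qsort(a, 8, sizeof a[0], cmp_page);          /* reorder by page   */
    printf("reordered     : %d page misses\n", count_page_misses(a, 8));
    return 0;
}
```

Run as written, the interleaved order opens a page on every access (8 misses), while the reordered sequence opens each page once (2 misses).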
{"title":"Access ordering and memory-conscious cache utilization","authors":"S. Mckee, W. Wulf","doi":"10.1109/HPCA.1995.386537","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386537","url":null,"abstract":"As processor speeds increase relative to memory speeds, memory bandwidth is rapidly becoming the limiting performance, factor for many applications. Several approaches to bridging this performance gap have been suggested. This paper examines one approach, access ordering, and pushes its limits to determine bounds on memory performance. We present several access-ordering schemes, and compare their performance, developing analytic models and partially validating these with benchmark timings on the Intel i860XR.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114696525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing instruction cache performance for operating system intensive workloads
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386527
J. Torrellas, Chun Xia, Russell L. Daigle
High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. Therefore, it is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes in detail the locality patterns of operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. As a result, interference within popular execution paths dominates instruction cache misses. Based on our observations, we propose an algorithm to expose these localities and reduce interference. For a range of cache sizes, associativities, line sizes, and other organizations, we show that we reduce total instruction miss rates by 31-86% (up to 2.9 absolute points). Using a simple model, this corresponds to execution time reductions on the order of 12-26%. In addition, our optimized operating system combines well with optimized or unoptimized applications.
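A sketch in the spirit of such layout optimizations, though not the paper's algorithm: greedily place the pair of routines with the hottest remaining call edge next to each other, so frequently interacting code cannot conflict in a direct-mapped instruction cache (the profile numbers below are invented):

```c
/* Greedy profile-guided code placement: repeatedly pick the unplaced
 * routine pair with the highest call frequency and lay them out
 * adjacently.  Illustrative only; the paper's algorithm differs.    */
#include <stdio.h>

#define NROUT 4

int main(void) {
    /* calls[i][j]: profiled call frequency between routines i and j. */
    int calls[NROUT][NROUT] = {
        {  0, 90,  5,  0 },
        { 90,  0,  0, 10 },
        {  5,  0,  0, 60 },
        {  0, 10, 60,  0 },
    };
    int placed[NROUT] = { 0 }, layout[NROUT], n = 0;
    while (n < NROUT) {
        int bi = -1, bj = -1, best = -1;
        for (int i = 0; i < NROUT; i++)
            for (int j = 0; j < NROUT; j++)
                if (!placed[i] && !placed[j] && calls[i][j] > best)
                    { best = calls[i][j]; bi = i; bj = j; }
        layout[n++] = bi; placed[bi] = 1;       /* place the hot pair */
        if (bj != bi && !placed[bj]) { layout[n++] = bj; placed[bj] = 1; }
    }
    printf("layout order:");
    for (int i = 0; i < n; i++) printf(" R%d", layout[i]);
    putchar('\n');
    return 0;
}
```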
{"title":"Optimizing instruction cache performance for operating system intensive workloads","authors":"J. Torrellas, Chun Xia, Russell L. Daigle","doi":"10.1109/HPCA.1995.386527","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386527","url":null,"abstract":"High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. Therefore, it is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes in detail the locality patterns of the operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely-executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. As a result, interference within popular execution paths dominates instruction cache misses. Based on our observations, we propose an algorithm to expose these localities and reduce interference. For a range of cache sizes, associativities, lines sizes, and other organizations we show that we reduce total instruction miss rates by 31-86% (up to 2.9 absolute points). Using a simple model this corresponds to execution time reductions in the order of 12-26%. In addition, our optimized operating system combines well with optimized or unoptimized applications.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129612994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Implementation of atomic primitives on distributed shared memory multiprocessors
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386540
Maged M. Michael, M. Scott
In this paper we consider several hardware implementations of the general-purpose atomic primitives fetch_and_Φ, compare-and-swap, load-linked, and store-conditional on large-scale shared-memory multiprocessors. These primitives have proven popular on small-scale bus-based machines, but have yet to become widely available on large-scale, distributed shared memory machines. We propose several alternative hardware implementations of these primitives and then analyze their performance for various data sharing patterns. Our results indicate that good overall performance can be obtained by implementing compare-and-swap in the cache controllers, and by providing an additional instruction to load an exclusive copy of a cache line.
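The software-visible semantics of these primitives are standard; in C11 atomics, compare-and-swap is available directly, and any fetch_and_Φ can be synthesized from it (here Φ is addition):

```c
/* Semantics of two of the primitives studied: compare-and-swap
 * (directly available in C11) and a fetch_and_PHI built from it --
 * PHI is "add" here, but any function of the old value works.       */
#include <stdatomic.h>
#include <stdio.h>

static int fetch_and_add(atomic_int *p, int v) {
    int old = atomic_load(p);
    /* Retry until no other thread intervened between load and CAS;
     * on failure, `old` is reloaded with the current value.         */
    while (!atomic_compare_exchange_weak(p, &old, old + v))
        ;
    return old;
}

int main(void) {
    atomic_int x = 5;
    int prev = fetch_and_add(&x, 3);
    printf("prev=%d now=%d\n", prev, atomic_load(&x));

    int expected = 8;
    /* Plain compare-and-swap: succeeds only if x still holds 8.     */
    if (atomic_compare_exchange_strong(&x, &expected, 42))
        printf("CAS succeeded, x=%d\n", atomic_load(&x));
    return 0;
}
```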
{"title":"Implementation of atomic primitives on distributed shared memory multiprocessors","authors":"Maged M. Michael, M. Scott","doi":"10.1109/HPCA.1995.386540","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386540","url":null,"abstract":"In this paper we consider several hardware implementations of the general-purpose atomic primitives fetch and /spl Phi/, compare and swap, load linked, and store conditional on large-scale shared-memory multiprocessors. These primitives have proven popular on small-scale bets-based machines, but have yet to become widely available on large-scale, distributed shared memory machines. We propose several alternative hardware implementations of these primitives, and then analyze the performance of these implementations for various data sharing patterns. Our results indicate that good overall performance can be obtained by implementing compare and swap in the cache controllers, and by providing an additional instruction to load an exclusive copy of a cache line.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129679700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DASC cache
Pub Date: 1995-01-22 | DOI: 10.1109/HPCA.1995.386548
André Seznec
For many microprocessors, cache hit time determines the clock cycle. On the other hand, the cache miss penalty (measured in instruction issue delays) keeps growing. Reconciling a low cache miss ratio with a low cache hit time is therefore an important issue. When caches are virtually indexed, the operating system (or some specific hardware) has to manage the consistency of caches and memory; unfortunately, reconciling physical indexing of the cache with a low cache hit time is very difficult. In this paper, we propose the Direct-mapped Access Set-associative Check (DASC) cache to address both difficulties. In a DASC cache, the data array is direct-mapped, so the cache hit time is low. The tag array, however, is set-associative, and the external miss ratio of a DASC cache is the same as that of a set-associative cache. When the size of one associativity degree of the tag array is tied to the minimum page size, a virtually indexed but physically tagged DASC cache correctly handles all the difficulties associated with cache consistency. Trace-driven simulations show that, for cache sizes of 16 to 64 Kbytes and page sizes of 4 to 8 Kbytes, a DASC cache is a valuable trade-off, allowing a fast cache hit time and a low cache miss ratio while cache consistency management is performed by hardware.
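One way to read this organization, simplified and under our own interpretation: the data array is probed direct-mapped, while the tag array is checked associatively over the set containing that slot, so a line residing in a neighbouring slot of the same set (e.g. a virtual synonym) is found without an external miss:

```c
/* A simplified reading of a DASC lookup (our interpretation, not the
 * paper's exact design): direct-mapped data probe, set-associative
 * tag check with the physical tag.                                  */
#include <stdio.h>

#define NLINES 8
#define ASSOC  2                    /* tag ways per set              */

static long tag[NLINES];            /* physical tag of each slot     */
static int  valid[NLINES];

/* Returns: 0 fast hit, 1 slow hit in another way, 2 external miss.  */
static int lookup(int slot, long ptag) {
    if (valid[slot] && tag[slot] == ptag)
        return 0;                                  /* fast hit       */
    int base = slot - slot % ASSOC;                /* set's 1st slot */
    for (int w = 0; w < ASSOC; w++)
        if (valid[base + w] && tag[base + w] == ptag)
            return 1;  /* line present elsewhere in the set: handled
                          internally, no external miss               */
    valid[slot] = 1; tag[slot] = ptag;             /* fill on miss   */
    return 2;
}

int main(void) {
    printf("%d\n", lookup(4, 0xAB));   /* 2: external miss, fills 4  */
    printf("%d\n", lookup(4, 0xAB));   /* 0: fast direct-mapped hit  */
    printf("%d\n", lookup(5, 0xAB));   /* 1: found in slot 4's way   */
    return 0;
}
```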
{"title":"DASC cache","authors":"André Seznec","doi":"10.1109/HPCA.1995.386548","DOIUrl":"https://doi.org/10.1109/HPCA.1995.386548","url":null,"abstract":"For many microprocessors, cache hit time determines the clock cycle. On the other hand, cache miss penalty(measured in instruction issue delays) becomes higher and higher. Conciliating low cache miss ratio with low cache hit time is an important issue. When caches are virtually indexed, the operating system (or some specific hardware) has to manage data consistency of caches and memory. Unfortunately, conciliating physical indexing of the cache and low cache hit time is very difficult. In this paper, we propose the Direct-mapped Access Set-associative Check cache (DASC) for addressing both difficulties. On a DASC cache, the cache array is direct-mapped, so the cache hit time is low. However the tag array is set-associative and the external miss ratio on a DASC cache is the same as the miss ratio on a set-associative cache. When the size of an associativity degree of the tag array is tied to the minimum page size, a virtually indexed but physically tagged DASC cache correctly handles all difficulties associated with cache consistency. Trace driven simulations show that, for cache sizes in the range of 16 to 64 Kbytes and for page sizes in the range 4 to 8 Kbytes, a DASC cache is a valuable trade-off allowing fast cache hit time and low cache miss ratio while cache consistency management is performed by hardware.<<ETX>>","PeriodicalId":330315,"journal":{"name":"Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125222499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}