Andreas Becher, Jorge Echavarria, Daniel Ziener, S. Wildermann, J. Teich
In this paper, we propose a novel approximate adder structure for LUT-based FPGA technology. Compared with a full-featured accurate carry-ripple adder, the longest path is significantly shortened, which allows the adder to be clocked at a higher frequency. By using the proposed adder structure, the throughput of an FPGA-based implementation can be significantly increased. Moreover, the resulting average error can be lower than that of similar approaches designed for ASIC implementations.
A LUT-Based Approximate Adder. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1109/FCCM.2016.16
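As a minimal sketch of why approximation shortens the longest path, the Python below models a generic segmented approximate adder of the kind proposed for ASICs: the carry chain is cut into `block`-bit segments, and each segment's carry-in is predicted from only `lookahead` lower bits. This is an illustration under assumptions, not the LUT-based structure of the paper; `approx_add`, `mean_abs_error` and their parameters are hypothetical names.

```python
def approx_add(a: int, b: int, width: int = 16, block: int = 4, lookahead: int = 4) -> int:
    """Segmented approximate addition of two unsigned `width`-bit integers.

    The carry into each `block`-bit segment is predicted from only the
    `lookahead` bits just below it (assuming carry-in 0 at the bottom of
    that window), so no carry ever ripples across the full width.
    """
    result = 0
    seg_mask = (1 << block) - 1
    for lo in range(0, width, block):
        start = max(0, lo - lookahead)      # bottom of the carry-prediction window
        win = lo - start                    # window size in bits
        win_mask = (1 << win) - 1
        predicted_cin = (((a >> start) & win_mask) + ((b >> start) & win_mask)) >> win
        seg = ((a >> lo) & seg_mask) + ((b >> lo) & seg_mask) + predicted_cin
        # The segment's own carry-out is discarded; the next segment predicts it instead.
        result |= (seg & seg_mask) << lo
    return result & ((1 << width) - 1)


def mean_abs_error(width: int = 8, block: int = 2, lookahead: int = 2) -> float:
    """Exhaustive mean absolute error against exact modular addition."""
    n = 1 << width
    total = 0
    for a in range(n):
        for b in range(n):
            total += abs(((a + b) & (n - 1)) - approx_add(a, b, width, block, lookahead))
    return total / (n * n)
```

With `lookahead >= width` the adder is exact. A smaller `lookahead` shortens the worst-case carry path to roughly `lookahead + block` bits at the cost of occasional errors. For example, `approx_add(0x00FF, 0x0001)` returns 0 instead of 0x100, because that carry would have to ripple through eight bits.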
FPGA-based embedded soft vector processors can exceed the performance and energy efficiency of embedded GPUs and DSPs for lightweight deep learning applications. For low-complexity deep neural networks targeting resource-constrained platforms, we develop optimized Caffe-compatible deep learning library routines that target a range of embedded accelerator-based systems with 4–8 W power budgets, such as the Xilinx Zedboard (with the MXP soft vector processor), NVIDIA Jetson TK1 (GPU), InForce 6410 (DSP), TI EVM5432 (DSP), and the Adapteva Parallella board (custom multi-core with NoC). For MNIST (28×28 images) and CIFAR10 (32×32 images), the deep layer structure is amenable to MXP-enhanced FPGA mappings, which deliver 1.4–5× higher energy efficiency than all other platforms. Not surprisingly, the embedded GPU works better for complex networks with large image resolutions.
Evaluating Embedded FPGA Accelerators for Deep Learning Applications. Gopalakrishna Hegde, Siddhartha, Nachiappan Ramasamy, Vamsi Buddha, Nachiket Kapre. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1109/FCCM.2016.14
As location-sensing devices become ubiquitous, overwhelming amounts of data are being produced by the Internet-of-Things-That-Move. Although analyzing this data presents significant business opportunities, new techniques are needed to attain adequate levels of processing performance. One example is the recently introduced geohash geographical coordinate system, which is mainly used for indexing. While geohash codes provide useful inherent properties such as hierarchical and variable-precision coding, traditional spatial algorithms operate on data represented in the conventional latitude/longitude geographical coordinate system and therefore do not take advantage of geohash coding. This paper tackles the evaluation of spatial predicates on geometries defined in the geohash domain, as an alternative to the standard Dimensionally Extended Nine-Intersection Model (DE-9IM). We present the first hardware architecture to efficiently evaluate "contain" and "touch" (internal, external, corner) relations between streams of pairs of geohash codes at high throughput (without stalls). With FPGAs exploiting the bit-level granularity of geohash codes, experimental results show end-to-end speedups of more than 20× and 90× over highly optimized single-threaded DE-9IM implementations of the contain and touch predicates, respectively. Furthermore, the PCIe-bound FPGA-based solution outperforms a geohash-based multithreaded CPU implementation by ≈1.8× (touch predicate) while using minimal FPGA resources.
Spatial Predicates Evaluation in the Geohash Domain Using Reconfigurable Hardware. Dajung Lee, R. Moussalli, S. Asaad, M. Srivatsa. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1109/FCCM.2016.51
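Geohash interleaves longitude and latitude bits, and this is what makes bit-level evaluation attractive. The sketch below works under stated simplifications. It shows that "contain" reduces to a prefix test on binary geohash codes, and that a same-precision "touch" can be classified from de-interleaved grid coordinates. It does not reproduce the paper's architecture or its internal/external touch cases, and it ignores longitude wrap-around. All function names are hypothetical.

```python
def encode_bits(lat: float, lon: float, nbits: int) -> str:
    """Binary geohash: bits alternate lon, lat, lon, ... by halving the ranges."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    for i in range(nbits):
        if i % 2 == 0:  # even positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append("1")
                lon_lo = mid
            else:
                bits.append("0")
                lon_hi = mid
        else:  # odd positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append("1")
                lat_lo = mid
            else:
                bits.append("0")
                lat_hi = mid
    return "".join(bits)


def contains(outer: str, inner: str) -> bool:
    """A cell contains another iff its code is a prefix of the other's (a cell contains itself)."""
    return inner.startswith(outer)


def cell_xy(bits: str) -> tuple[int, int]:
    """De-interleave a binary geohash into integer (column, row) grid coordinates."""
    x = y = 0
    for i, b in enumerate(bits):
        if i % 2 == 0:
            x = (x << 1) | int(b)
        else:
            y = (y << 1) | int(b)
    return x, y


def touch(a: str, b: str):
    """Classify two equal-precision cells as 'edge', 'corner', or None (not touching)."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles codes of equal precision")
    (xa, ya), (xb, yb) = cell_xy(a), cell_xy(b)
    dx, dy = abs(xa - xb), abs(ya - yb)
    if (dx, dy) in ((1, 0), (0, 1)):
        return "edge"
    if (dx, dy) == (1, 1):
        return "corner"
    return None
```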
Many algorithms that process parallel data streams require permutation units because the streams are not independent. Such algorithms include transforms, multi-rate signal processing, and Viterbi decoding. The absolute order of the data elements produced by the permutation is not important; what matters is that the data elements are located correctly for the next processing step. This paper describes a method to find permutations that require a minimum amount of memory and latency. The required permutations are generated based on the data dependencies of a computation set. Additional constraints are imposed so that the parallel streaming architecture processes the data without flow control. Results agree with brute-force methods, which become computationally infeasible for large permutation sets.
Finding Space-Time Stream Permutations for Minimum Memory and Latency. Thaddeus Koehn, P. Athanas. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1109/FCCM.2016.54
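To make the memory/latency trade-off concrete, the following sketch evaluates one candidate permutation. Elements arrive `lanes` per cycle in one order and must leave `lanes` per cycle in another. The sketch computes the smallest latency offset that never emits an element before it arrives, and the peak number of elements that must be buffered. It is only a cost function under these assumptions (hypothetical names, no flow control), not the paper's search method.

```python
def schedule_cost(in_order: list, out_order: list, lanes: int) -> tuple[int, int]:
    """Return (latency in cycles, peak buffered elements) for one stream permutation.

    Element i of in_order arrives in cycle i // lanes; element j of out_order
    must leave in cycle j // lanes + latency. An element that arrives and
    leaves in the same cycle passes straight through and needs no storage.
    """
    arrive = {e: i // lanes for i, e in enumerate(in_order)}
    slot = {e: i // lanes for i, e in enumerate(out_order)}
    # Smallest non-negative offset so that no element departs before it arrives.
    latency = max(0, max(arrive[e] - slot[e] for e in in_order))
    depart = {e: slot[e] + latency for e in in_order}
    memory = 0
    for c in range(max(depart.values()) + 1):
        # Elements held across the end of cycle c: arrived by c, leave after c.
        live = sum(1 for e in in_order if arrive[e] <= c < depart[e])
        memory = max(memory, live)
    return latency, memory
```

A swap within a single cycle (`[1, 0, 3, 2]` with two lanes) costs neither memory nor latency, whereas a full reversal of four elements needs one cycle of latency and two buffered elements. This is why the freedom in absolute order noted in the abstract can be exploited to reduce cost.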
Hans Giesen, Benjamin Gojman, Raphael Rubin, Ji Kim, A. DeHon
We show that continuously monitoring on-chip delays at the LUT-to-LUT link level during operation allows an FPGA to detect and self-adapt to aging and environmental effects on timing. Using a lightweight (<4% added area) mechanism for monitoring transition timing, a Difference Detector with First-Fail Latch, we can estimate the timing margin of circuits and identify the individual links that have degraded and whose delay determines the worst-case circuit delay. Combining this with precomputed, fine-grained Choose-Your-Own-Adventure repair alternatives, we introduce a strategy for rapid, in-system incremental repair of links with degraded timing. We show that these techniques allow us to respond to a single aging event in less than 300 ms for the toronto20 benchmarks. The result is a step toward systems where adaptive reconfiguration on the time-scale of seconds is viable and beneficial.
Continuous Online Self-Monitoring Introspection Circuitry for Timing Repair by Incremental Partial-Reconfiguration (COSMIC TRIP). In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1145/3158229
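The abstract relies on knowing which links determine the worst-case delay. As a software analogue only, the sketch below runs a longest-path (static timing) pass over a DAG of link delays and reports the critical links. The paper instead measures delays on-chip with its Difference Detector and First-Fail Latch. The function name, node names, and delay values are hypothetical.

```python
from graphlib import TopologicalSorter


def critical_path(delays: dict[tuple[str, str], float]) -> tuple[float, list[tuple[str, str]]]:
    """Longest (worst-case) path through a DAG of LUT-to-LUT links.

    `delays` maps each link (src, dst) to its delay. Returns the worst-case
    arrival time and the links on the path that determines it.
    """
    preds = {}
    for src, dst in delays:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    arrival, via = {}, {}
    # static_order() yields every node only after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        arrival[node], via[node] = 0.0, None
        for src in preds[node]:
            t = arrival[src] + delays[(src, node)]
            if t > arrival[node]:
                arrival[node], via[node] = t, src
    sink = max(arrival, key=arrival.get)
    worst = arrival[sink]
    path = []
    while via[sink] is not None:  # walk back along the latest-arriving inputs
        path.append((via[sink], sink))
        sink = via[sink]
    return worst, path[::-1]
```

If one link ages (for instance `("c", "d")` rising from 0.2 to 1.0 in a graph where `a -> b -> d` was critical), rerunning the pass moves the critical path onto that link. That is the situation the incremental repair step targets.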
Peipei Zhou, Hyunseok Park, Zhenman Fang, J. Cong, A. DeHon
Customized pipeline designs that minimize the pipeline initiation interval (II) maximize the throughput of FPGA accelerators designed with high-level synthesis (HLS). What is the impact of minimizing II on energy efficiency? Using a matrix-multiply accelerator, we show that designs with II > 1 can sometimes reduce dynamic energy below that of II = 1 thanks to interconnect savings, but II = 1 always achieves energy close to the minimum. We also identify sources of inefficient mapping in the commercial tool flow.
Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1109/FCCM.2016.50
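For reference, a common first-order model of a pipelined HLS loop works as follows. The first iteration completes after the pipeline depth, and a new iteration starts every II cycles. The sketch below applies this model to a matrix multiply with one multiply-accumulate per iteration. The depth of 8 is an assumed value, the function name is hypothetical, and the model says nothing about energy, which is the paper's subject.

```python
def pipeline_cycles(trip_count: int, ii: int, depth: int) -> int:
    """Cycles for a pipelined loop: the first iteration finishes after `depth`
    cycles, and each later iteration finishes `ii` cycles after the previous one."""
    if trip_count == 0:
        return 0
    return depth + ii * (trip_count - 1)


# N x N x N matrix multiply, one multiply-accumulate per loop iteration.
N = 32
print(pipeline_cycles(N ** 3, ii=1, depth=8))  # II = 1
print(pipeline_cycles(N ** 3, ii=2, depth=8))  # II = 2: roughly twice the cycles
```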
FPGA-based hardware emulation platforms run significantly faster than software simulation for verifying complex circuit designs. However, the controllability and observability of internal circuit signals mapped onto FPGAs are restricted by the limited number of chip pins. Scan-chain-based techniques are effective in providing full-chip controllability and observability, but at the cost of a large area overhead, especially for FPGAs. Partial scan has therefore been proposed as an alternative that improves controllability and observability while reducing the area cost. However, the optimal partial scan solution, that is, the one with the minimum number of scan flip-flops, is not always found. This paper formulates the classical balanced-structure partial scan procedure in a single step as an integer linear programming problem, which yields the optimal partial scan solution. In addition, partially used logic resources in FPGAs are exploited to implement the extra logic required by the scan chain, further reducing the area cost. Experimental results show that our partial scan approach can reduce the area overhead by 78.6% and 16.6% compared to full scan and the existing partial scan approach, respectively.
Cost Effective Partial Scan for Hardware Emulation. Tao Li, Qiang Liu. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1109/FCCM.2016.39
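In balanced-structure partial scan, flip-flops are selected for scan so that the remaining flip-flop dependency graph has no cycles. The brute-force sketch below finds a minimum such set on a toy graph, to show what an optimal solution means. The paper instead solves an ILP that also enforces balance, which this sketch omits. Self-loops are tolerated here as our own assumption, and the graph and function names are hypothetical.

```python
from itertools import combinations


def is_acyclic(nodes: list, edges: list) -> bool:
    """Kahn's algorithm on the subgraph induced by `nodes` (self-loops ignored)."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        if u in indeg and v in indeg and u != v:
            succ[u].append(v)
            indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return visited == len(nodes)  # every node removed means no cycle remained


def min_scan_set(flip_flops: list, edges: list) -> set:
    """Smallest set of flip-flops whose removal (scanning) breaks every cycle.

    Exponential brute force for illustration only; an ILP scales far better.
    """
    for k in range(len(flip_flops) + 1):
        for chosen in combinations(flip_flops, k):
            rest = [f for f in flip_flops if f not in chosen]
            if is_acyclic(rest, edges):
                return set(chosen)
    return set(flip_flops)
```

For example, with the cycles `a -> b -> c -> a` and `c <-> d`, scanning `c` alone breaks both.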
Ernst Houtgast, V. Sima, G. Marchiori, K. Bertels, Z. Al-Ars
We propose a novel FPGA-accelerated implementation of BWA-MEM, a popular tool for genomic data mapping. The performance and power efficiency of the FPGA implementation on a single Xilinx Virtex-7 Alpha Data add-in card are compared against a software-only baseline system. By offloading the Seed Extension phase onto the FPGA, we achieve a two-fold speedup in overall application-level performance and a 1.6× gain in power efficiency. To facilitate platform- and tool-agnostic comparisons, base pairs per joule is introduced as a measure of power efficiency. The FPGA design is able to map up to 34 thousand base pairs per joule.
Power-Efficient Accelerated Genomic Short Read Mapping on Heterogeneous Computing Platforms. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1109/FCCM.2016.17
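The base-pairs-per-joule metric is throughput divided by power, or equivalently the number of base pairs mapped divided by the energy used (power × time). The one-liner below computes it. The function name is hypothetical, and the numbers in the usage line are made up for illustration; they are not results from the paper.

```python
def base_pairs_per_joule(base_pairs: int, seconds: float, watts: float) -> float:
    """Power efficiency as base pairs mapped per joule (energy = power x time)."""
    return base_pairs / (watts * seconds)


# Hypothetical run: 1,000,000 base pairs mapped in 10 s at an average of 50 W.
print(base_pairs_per_joule(1_000_000, seconds=10.0, watts=50.0))
```

Because the metric depends only on work done and energy consumed, it can compare a CPU baseline and an FPGA-accelerated system without reference to either platform's clock rate or tool flow.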
Past research and implementation efforts have shown that FPGAs are efficient at processing many graph algorithms. However, they are notoriously hard to program, leading to impractically long development times even for simple applications. We propose a vertex-centric framework for graph processing on FPGAs, providing a base execution model and distributed architecture so that developers need only write very small application kernels.
Vertex-Centric Graph Processing on FPGA. Nina Engelhardt, Hayden Kwok-Hay So. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1109/FCCM.2016.31
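In the vertex-centric model, the framework handles scheduling and message delivery, and the developer supplies only a small per-vertex kernel. The Python below is a generic Pregel-style (bulk-synchronous) sketch of that division of labour, using single-source shortest paths as the kernel. It is not the paper's framework or API, and the function names are hypothetical.

```python
def run_vertex_program(adj, values, kernel, initial_messages, max_supersteps=1000):
    """Synchronous vertex-centric execution.

    adj: vertex -> list of (neighbor, edge_weight)
    kernel(value, messages, out_edges) -> (new_value, [(dst, message), ...])
    Only vertices that received messages run in a given superstep.
    """
    values = dict(values)
    inbox = {v: list(msgs) for v, msgs in initial_messages.items()}
    for _ in range(max_supersteps):
        if not inbox:  # no messages in flight: the computation has converged
            break
        outbox = {}
        for v, msgs in inbox.items():
            values[v], sends = kernel(values[v], msgs, adj[v])
            for dst, msg in sends:
                outbox.setdefault(dst, []).append(msg)
        inbox = outbox  # messages are delivered in the next superstep
    return values


def sssp_kernel(dist, messages, out_edges):
    """Single-source shortest paths: the only code the application developer writes."""
    best = min(messages)
    if best < dist:
        return best, [(n, best + w) for n, w in out_edges]
    return dist, []
```

Usage: start every vertex at `float("inf")` and send the source a single message `0`; the framework iterates until no vertex improves its distance.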
A. Jain, Xiangwei Li, P. Singhai, D. Maskell, Suhaib A. Fahmy
Coarse-grained FPGA overlay architectures paired with general-purpose processors offer a number of advantages for general-purpose hardware acceleration, including software-like programmability, fast compilation, application portability, and improved design productivity. However, the area overheads of these overlays, particularly architectures with island-style interconnect, negate many of these advantages and prevent their use in practical FPGA-based systems. Crucially, the interconnect flexibility provided by these overlay architectures is normally over-provisioned for accelerators based on feed-forward pipelined datapaths, which in many cases have the general shape of inverted cones. We propose DeCO, a cone-shaped cluster of functional units (FUs) connected by a simple linear interconnect. This reduces the area overhead of implementing compute kernels that are extracted from compute-intensive applications and represented as directed acyclic dataflow graphs, while still allowing high data throughput. We perform design space exploration by modeling programmability overhead as a function of overlay design parameters, and compare it to the programmability overhead of island-style overlays. We observe 87% savings in LUT requirements with the proposed approach compared to DSP-block-based island-style overlays. Our experimental evaluation shows that the proposed overlay achieves a frequency of 395 MHz, close to the theoretical limit of the DSP blocks on the Xilinx Zynq.
We also present an automated tool flow that provides rapid, vendor-independent mapping of high-level compute kernel code to the proposed overlay.
DeCO: A DSP Block Based FPGA Accelerator Overlay with Low Overhead Interconnect. In: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: https://doi.org/10.1109/FCCM.2016.10
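The "inverted cone" shape mentioned in the abstract becomes visible when a dataflow graph is levelized. For example, a reduction over eight inputs needs four operations at the first level, then two, then one. The sketch below computes per-level operation counts for a DAG. It only illustrates the shape that DeCO's FU cluster targets, not its mapping tool flow; the function name and graph encoding (op -> list of operands, with primary inputs as non-op names) are assumptions.

```python
def level_widths(dag: dict[str, list[str]]) -> list[int]:
    """Number of operations at each ASAP level of a dataflow DAG.

    `dag` maps each operation to its operands; operands that are not keys
    of `dag` are primary inputs and sit below level 1.
    """
    level = {}

    def depth(op: str) -> int:
        if op not in level:
            level[op] = 1 + max((depth(p) for p in dag[op] if p in dag), default=0)
        return level[op]

    for op in dag:
        depth(op)
    counts = {}
    for lvl in level.values():
        counts[lvl] = counts.get(lvl, 0) + 1
    return [counts[lvl] for lvl in sorted(counts)]


# Sum of eight inputs as a balanced adder tree: widths shrink level by level.
tree = {
    "s0": ["x0", "x1"], "s1": ["x2", "x3"], "s2": ["x4", "x5"], "s3": ["x6", "x7"],
    "t0": ["s0", "s1"], "t1": ["s2", "s3"],
    "r": ["t0", "t1"],
}
print(level_widths(tree))
```

A fixed cluster whose width narrows in the same way can host such kernels with simple nearest-neighbour links, which is the intuition behind replacing island-style routing with a linear interconnect.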