Performance variance becomes increasingly challenging on current large-scale HPC systems. Even using a fixed number of computing nodes, the execution time of several runs can vary significantly. Many parallel programs executing on supercomputers suffer from such variance. Performance variance not only causes unpredictable performance requirement violations, but also makes it unintuitive to understand the program behavior. Despite prior efforts, efficient on-line detection of performance variance remains an open problem. In this paper, we propose vSensor, a novel approach for light-weight and on-line performance variance detection. The key insight is that, instead of solely relying on an external detector, the source code of a program itself could reveal the runtime performance characteristics. Specifically, many parallel programs contain code snippets that are executed repeatedly with an invariant quantity of work. Based on this observation, we use compiler techniques to automatically identify these fixed-workload snippets and use them as performance variance sensors (v-sensors) that enable effective detection. We evaluate vSensor with a variety of parallel programs on the Tianhe-2 system. Results show that vSensor can effectively detect performance variance on HPC systems. The performance overhead is smaller than 4% with up to 16,384 processes. In particular, with vSensor, we found a bad node with slow memory that slowed a program's performance by 21%. As a showcase, we also detected a severe network performance problem that caused a 3.37X slowdown for an HPC kernel program on the Tianhe-2 system.
{"title":"vSensor","authors":"Xiongchao Tang, Jidong Zhai, Xuehai Qian, Bingsheng He, W. Xue, Wenguang Chen","doi":"10.1145/3200691.3178497","DOIUrl":"https://doi.org/10.1145/3200691.3178497","url":null,"abstract":"Performance variance becomes increasingly challenging on current large-scale HPC systems. Even using a fixed number of computing nodes, the execution time of several runs can vary significantly. Many parallel programs executing on supercomputers suffer from such variance. Performance variance not only causes unpredictable performance requirement violations, but also makes it unintuitive to understand the program behavior. Despite prior efforts, efficient on-line detection of performance variance remains an open problem. In this paper, we propose vSensor, a novel approach for light-weight and on-line performance variance detection. The key insight is that, instead of solely relying on an external detector, the source code of a program itself could reveal the runtime performance characteristics. Specifically, many parallel programs contain code snippets that are executed repeatedly with an invariant quantity of work. Based on this observation, we use compiler techniques to automatically identify these fixed-workload snippets and use them as performance variance sensors (v-sensors) that enable effective detection. We evaluate vSensor with a variety of parallel programs on the Tianhe-2 system. Results show that vSensor can effectively detect performance variance on HPC systems. The performance overhead is smaller than 4% with up to 16,384 processes. In particular, with vSensor, we found a bad node with slow memory that slowed a program's performance by 21%. As a showcase, we also detected a severe network performance problem that caused a 3.37X slowdown for an HPC kernel program on the Tianhe-2 system.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"17 1","pages":"124 - 136"},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75306483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ordered (key-value) maps are an important and widely-used data type for large-scale data processing frameworks. Beyond simple search, insertion and deletion, more advanced operations such as range extraction, filtering, and bulk updates form a critical part of these frameworks. We describe an interface for ordered maps that is augmented to support fast range queries and sums, and introduce a parallel and concurrent library called PAM (Parallel Augmented Maps) that implements the interface. The interface includes a wide variety of functions on augmented maps ranging from basic insertion and deletion to more interesting functions such as union, intersection, filtering, extracting ranges, splitting, and range-sums. We describe algorithms for these functions that are efficient both in theory and practice. As examples of the use of the interface and the performance of PAM we apply the library to four applications: simple range sums, interval trees, 2D range trees, and ranked word index searching. The interface greatly simplifies the implementation of these data structures over direct implementations. Sequentially the code achieves performance that matches or exceeds existing libraries designed specially for a single application, and in parallel our implementation gets speedups ranging from 40 to 90 on 72 cores with 2-way hyperthreading.
{"title":"PAM","authors":"Yihan Sun, Daniel Ferizovic, Guy E. Belloch","doi":"10.1145/3200691.3178509","DOIUrl":"https://doi.org/10.1145/3200691.3178509","url":null,"abstract":"Ordered (key-value) maps are an important and widely-used data type for large-scale data processing frameworks. Beyond simple search, insertion and deletion, more advanced operations such as range extraction, filtering, and bulk updates form a critical part of these frameworks. We describe an interface for ordered maps that is augmented to support fast range queries and sums, and introduce a parallel and concurrent library called PAM (Parallel Augmented Maps) that implements the interface. The interface includes a wide variety of functions on augmented maps ranging from basic insertion and deletion to more interesting functions such as union, intersection, filtering, extracting ranges, splitting, and range-sums. We describe algorithms for these functions that are efficient both in theory and practice. As examples of the use of the interface and the performance of PAM we apply the library to four applications: simple range sums, interval trees, 2D range trees, and ranked word index searching. The interface greatly simplifies the implementation of these data structures over direct implementations. Sequentially the code achieves performance that matches or exceeds existing libraries designed specially for a single application, and in parallel our implementation gets speedups ranging from 40 to 90 on 72 cores with 2-way hyperthreading.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"19 1","pages":"290 - 304"},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88383125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shared-memory parallel programs routinely suffer from false sharing---a performance degradation caused by different threads accessing different variables that reside on the same CPU cacheline and a...
{"title":"Featherlight on-the-fly false-sharing detection","authors":"ChabbiMilind, WenShasha, Liuxu","doi":"10.1145/3200691.3178499","DOIUrl":"https://doi.org/10.1145/3200691.3178499","url":null,"abstract":"Shared-memory parallel programs routinely suffer from false sharing---a performance degradation caused by different threads accessing different variables that reside on the same CPU cacheline and a...","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3200691.3178499","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44819549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, Datalog has gained popularity for the implementation of advanced data analysis. Applications benefit from Datalog's high-level, declarative syntax, and availability of efficient al...
{"title":"Two concurrent data structures for efficient datalog query processing","authors":"JordanHerbert, ScholzBernhard, SubotićPavle","doi":"10.1145/3200691.3178525","DOIUrl":"https://doi.org/10.1145/3200691.3178525","url":null,"abstract":"In recent years, Datalog has gained popularity for the implementation of advanced data analysis. Applications benefit from Datalog's high-level, declarative syntax, and availability of efficient al...","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3200691.3178525","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49211524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Wang, Liangji Zhuang, Junhang Chen, Huimin Cui, Fang Lv, Y. Liu, Xiaobing Feng
Replicas 1 of a vertex play an important role in existing distributed graph processing systems which make a single vertex to be parallel processed by multiple machines and access remote neighbors locally without any remote access. However, replicas of vertices introduce data coherency problem. Existing distributed graph systems treat replicas of a vertex v as an atomic and indivisible vertex, and use an eager data coherency approach to guarantee replicas atomicity. In eager data coherency approach, any changes to vertex data must be immediately communicated to all replicas of v, thus leading to frequent global synchronizations and communications. In this paper, we propose a lazy data coherency approach, called LazyAsync, which treats replicas of a vertex as independent vertices and maintains the data coherency by computations, rather than communications in existing eager approach. Our approach automatically selects some data coherency points from the graph algorithm, and maintains all replicas to share the same global view only at such points, which means the replicas are enabled to maintain different local views between any two adjacent data coherency points. Based on PowerGraph, we develop a distributed graph processing system LazyGraph to implement the LazyAsync approach and exploit graph-aware optimizations. On a 48-node EC2-like cluster, LazyGraph outperforms PowerGraph on four widely used graph algorithms across a variety of real-world graphs, with a speedup ranging from 1.25x to 10.69x.
{"title":"Lazygraph","authors":"Lei Wang, Liangji Zhuang, Junhang Chen, Huimin Cui, Fang Lv, Y. Liu, Xiaobing Feng","doi":"10.1145/3200691.3178508","DOIUrl":"https://doi.org/10.1145/3200691.3178508","url":null,"abstract":"Replicas 1 of a vertex play an important role in existing distributed graph processing systems which make a single vertex to be parallel processed by multiple machines and access remote neighbors locally without any remote access. However, replicas of vertices introduce data coherency problem. Existing distributed graph systems treat replicas of a vertex v as an atomic and indivisible vertex, and use an eager data coherency approach to guarantee replicas atomicity. In eager data coherency approach, any changes to vertex data must be immediately communicated to all replicas of v, thus leading to frequent global synchronizations and communications. In this paper, we propose a lazy data coherency approach, called LazyAsync, which treats replicas of a vertex as independent vertices and maintains the data coherency by computations, rather than communications in existing eager approach. Our approach automatically selects some data coherency points from the graph algorithm, and maintains all replicas to share the same global view only at such points, which means the replicas are enabled to maintain different local views between any two adjacent data coherency points. Based on PowerGraph, we develop a distributed graph processing system LazyGraph to implement the LazyAsync approach and exploit graph-aware optimizations. On a 48-node EC2-like cluster, LazyGraph outperforms PowerGraph on four widely used graph algorithms across a variety of real-world graphs, with a speedup ranging from 1.25x to 10.69x.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"20 1","pages":"276 - 289"},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85016912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bo Liu, Wenbin Jiang, Hai Jin, Xuanhua Shi, Yang Ma
Growing accuracy and robustness of Deep Neural Networks (DNN) models are accompanied by growing model capacity (going deeper or wider). However, high memory requirements of those models make it difficult to execute the training process in one GPU. To address it, we first identify the memory usage characteristics for deep and wide convolutional networks, and demonstrate the opportunities of memory reuse on both intra-layer and inter-layer levels. We then present Layrub, a runtime data placement strategy that orchestrates the execution of training process. It achieves layer-centric reuse to reduce memory consumption for extreme-scale deep learning that cannot be run on one single GPU.
{"title":"Layrub","authors":"Bo Liu, Wenbin Jiang, Hai Jin, Xuanhua Shi, Yang Ma","doi":"10.1145/3200691.3178528","DOIUrl":"https://doi.org/10.1145/3200691.3178528","url":null,"abstract":"Growing accuracy and robustness of Deep Neural Networks (DNN) models are accompanied by growing model capacity (going deeper or wider). However, high memory requirements of those models make it difficult to execute the training process in one GPU. To address it, we first identify the memory usage characteristics for deep and wide convolutional networks, and demonstrate the opportunities of memory reuse on both intra-layer and inter-layer levels. We then present Layrub, a runtime data placement strategy that orchestrates the execution of training process. It achieves layer-centric reuse to reduce memory consumption for extreme-scale deep learning that cannot be run on one single GPU.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"8 1","pages":"405 - 406"},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87543655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aggressive technology scaling trends have made the hardware of high performance computing (HPC) systems more susceptible to faults. Some of these faults can lead to silent data corruption (SDC), and represent a serious problem because they alter the HPC simulation results. In this paper, we present a full-coverage, systematic methodology called DisCVar to identify critical variables in HPC applications for protection against SDC. DisCVar uses automatic differentiation (AD) to determine the sensitivity of the simulation output to errors in program variables. We empirically validate our approach in identifying vulnerable variables by comparing the results against a full-coverage code-level fault injection campaign. We find that our DisCVar correctly identifies the variables that are critical to ensure application SDC resilience with a high degree of accuracy compared to the results of the fault injection campaign. Additionally, DisCVar requires only two executions of the target program to generate results, whereas in our experiments we needed to perform millions of executions to get the same information from a fault injection campaign.
{"title":"DisCVar","authors":"Harshitha Menon, K. Mohror","doi":"10.1145/3200691.3178502","DOIUrl":"https://doi.org/10.1145/3200691.3178502","url":null,"abstract":"Aggressive technology scaling trends have made the hardware of high performance computing (HPC) systems more susceptible to faults. Some of these faults can lead to silent data corruption (SDC), and represent a serious problem because they alter the HPC simulation results. In this paper, we present a full-coverage, systematic methodology called DisCVar to identify critical variables in HPC applications for protection against SDC. DisCVar uses automatic differentiation (AD) to determine the sensitivity of the simulation output to errors in program variables. We empirically validate our approach in identifying vulnerable variables by comparing the results against a full-coverage code-level fault injection campaign. We find that our DisCVar correctly identifies the variables that are critical to ensure application SDC resilience with a high degree of accuracy compared to the results of the fault injection campaign. Additionally, DisCVar requires only two executions of the target program to generate results, whereas in our experiments we needed to perform millions of executions to get the same information from a fault injection campaign.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"43 1","pages":"195 - 206"},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82235541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scientific applications with single instruction, multiple data (SIMD) computations show considerable performance improvements when run on today's graphics processing units (GPUs). However, the existence of data dependences across thread blocks may significantly impact the speedup by requiring global synchronization across multiprocessors (SMs) inside the GPU. To efficiently run applications with interblock data dependences, we need fine-granular task-based execution models that will treat SMs inside a GPU as stand-alone parallel processing units. Such a scheme will enable faster execution by utilizing all internal computation elements inside the GPU and eliminating unnecessary waits during device-wide global barriers. In this paper, we propose Juggler, a task-based execution scheme for GPU workloads with data dependences. The Juggler framework takes applications embedding OpenMP 4.5 tasks as input and executes them on the GPU via an efficient in-device runtime, hence eliminating the need for kernel-wide global synchronization. Juggler requires no or little modification to the source code, and once launched, the runtime entirely runs on the GPU without relying on the host through the entire execution. We have evaluated Juggler on an NVIDIA Tesla P100 GPU and obtained up to 31% performance improvement against global barrier based implementation, with minimal runtime overhead.
单指令多数据(SIMD)计算的科学应用程序在今天的图形处理单元(gpu)上运行时显示出相当大的性能改进。然而,跨线程块的数据依赖的存在可能会通过要求GPU内部跨多处理器(SMs)的全局同步来显著影响加速。为了有效地运行具有块间数据依赖的应用程序,我们需要细粒度的基于任务的执行模型,该模型将GPU内的SMs视为独立的并行处理单元。这样的方案将通过利用GPU内部的所有内部计算元素来实现更快的执行,并消除在设备范围的全局屏障期间不必要的等待。在本文中,我们提出了一种基于任务的执行方案,用于具有数据依赖性的GPU工作负载。Juggler框架将嵌入openmp4.5任务的应用程序作为输入,并通过高效的设备内运行时在GPU上执行它们,从而消除了对内核范围内全局同步的需求。Juggler不需要对源代码进行修改,一旦启动,运行时就完全在GPU上运行,而在整个执行过程中不依赖于主机。我们在NVIDIA Tesla P100 GPU上对Juggler进行了评估,与基于全局屏障的实现相比,性能提高了31%,运行时开销最小。
{"title":"Juggler","authors":"M. Belviranli, Seyong Lee, J. Vetter, L. Bhuyan","doi":"10.1145/3200691.3178492","DOIUrl":"https://doi.org/10.1145/3200691.3178492","url":null,"abstract":"Scientific applications with single instruction, multiple data (SIMD) computations show considerable performance improvements when run on today's graphics processing units (GPUs). However, the existence of data dependences across thread blocks may significantly impact the speedup by requiring global synchronization across multiprocessors (SMs) inside the GPU. To efficiently run applications with interblock data dependences, we need fine-granular task-based execution models that will treat SMs inside a GPU as stand-alone parallel processing units. Such a scheme will enable faster execution by utilizing all internal computation elements inside the GPU and eliminating unnecessary waits during device-wide global barriers. In this paper, we propose Juggler, a task-based execution scheme for GPU workloads with data dependences. The Juggler framework takes applications embedding OpenMP 4.5 tasks as input and executes them on the GPU via an efficient in-device runtime, hence eliminating the need for kernel-wide global synchronization. Juggler requires no or little modification to the source code, and once launched, the runtime entirely runs on the GPU without relying on the host through the entire execution. We have evaluated Juggler on an NVIDIA Tesla P100 GPU and obtained up to 31% performance improvement against global barrier based implementation, with minimal runtime overhead.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"55 1","pages":"54 - 67"},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75693868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world applications. Currently, much research on parallel SpTRSV focuses on level-set construction for reducing the number of inter-level synchronizations. However, the out-of-control data reuse and high cost for global memory or shared cache access in inter-level synchronization have been largely neglected in existing work. In this paper, we propose a novel data layout called Sparse Level Tile to make all data reuse under control, and design a Producer-Consumer pairing method to make any inter-level synchronization only happen in very fast register communication. We implement our data layout and algorithms on an SW26010 many-core processor, which is the main building-block of the current world fastest supercomputer Sunway Taihulight. The experimental results of testing all 2057 square matrices from the Florida Matrix Collection show that our method achieves an average speedup of 6.9 and the best speedup of 38.5 over parallel level-set method. Our method also outperforms the latest methods on a KNC many-core processor in 1856 matrices and the latest methods on a K80 GPU in 1672 matrices, respectively.
{"title":"swSpTRSV","authors":"Xinliang Wang, Weifeng Liu, W. Xue, Li Wu","doi":"10.1145/3200691.3178513","DOIUrl":"https://doi.org/10.1145/3200691.3178513","url":null,"abstract":"Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world applications. Currently, much research on parallel SpTRSV focuses on level-set construction for reducing the number of inter-level synchronizations. However, the out-of-control data reuse and high cost for global memory or shared cache access in inter-level synchronization have been largely neglected in existing work. In this paper, we propose a novel data layout called Sparse Level Tile to make all data reuse under control, and design a Producer-Consumer pairing method to make any inter-level synchronization only happen in very fast register communication. We implement our data layout and algorithms on an SW26010 many-core processor, which is the main building-block of the current world fastest supercomputer Sunway Taihulight. The experimental results of testing all 2057 square matrices from the Florida Matrix Collection show that our method achieves an average speedup of 6.9 and the best speedup of 38.5 over parallel level-set method. Our method also outperforms the latest methods on a KNC many-core processor in 1856 matrices and the latest methods on a K80 GPU in 1672 matrices, respectively.","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":"35 1","pages":"338 - 353"},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91358275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Non-volatile memory is expected to coexist with (or even displace) volatile DRAM for main memory in upcoming architectures. This has led to increasing interest in the problem of designing and speci...
{"title":"A persistent lock-free queue for non-volatile memory","authors":"FriedmanMichal, HerlihyMaurice, MaratheVirendra, PetrankErez","doi":"10.1145/3200691.3178490","DOIUrl":"https://doi.org/10.1145/3200691.3178490","url":null,"abstract":"Non-volatile memory is expected to coexist with (or even displace) volatile DRAM for main memory in upcoming architectures. This has led to increasing interest in the problem of designing and speci...","PeriodicalId":50923,"journal":{"name":"ACM Sigplan Notices","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3200691.3178490","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47053303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}