Exposing data locality in HPC-based systems by using the HDFS backend
José Rivadeneira, Félix García Carballeira, J. Carretero, Francisco Javier García Blas
2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC). Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00038
Nowadays there are two main approaches for dealing with data-intensive applications: parallel file systems in classical High-Performance Computing (HPC) centers, and Big Data file systems built around a data-centric vision. Moreover, there is a growing overlap between HPC and Big Data applications, since the Big Data paradigm is an increasing consumer of HPC resources. HDFS is one of the most important file systems for data-intensive applications, while, from the parallel file systems point of view, MPI-IO is the most widely used interface for parallel I/O. In this paper, we propose a novel solution that lets MPI-based parallel applications take advantage of HDFS. To demonstrate its feasibility, we have integrated our approach into MIMIR, a MapReduce framework for MPI-based applications, and optimized MIMIR using the data-locality features our approach provides. The experimental evaluation demonstrates that our solution improves map-phase performance by around 25% compared with the MIMIR baseline.
Fair Allocation of Asymmetric Operations in Storage Systems
Thomas Keller, P. Varman
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00030
Managing the trade-off between efficiency and fairness in a storage system is challenging due to high variability in workload behavior. Most workloads are made up of a mix of asymmetric operations (e.g. read/write, sequential/random, or striped/isolated I/Os) in different proportions, which places different resource demands on the storage device. The problem is to allocate device resources to the heterogeneous workloads fairly while maintaining high device throughput. In this paper, we present a new model for fair allocation among heterogeneous workloads with different ratios of asymmetric operations. We propose an adaptive scheme that chooses between two policies, the traditional Time-Balanced Allocation (TBA) and our proposed Bottleneck-Balanced Allocation (BBA), based on workload characteristics. The fairness and throughput of these allocation policies are established through formal analysis. Our algorithms are tested with an adaptive, dynamic scheduler implemented in a simulation testbed, and the results validate the performance benefits of our approach.
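The abstract does not give the TBA formula, but the time-balanced idea can be illustrated with a toy sketch: each workload receives an equal slice of device time, so its throughput depends on the cost mix of its asymmetric operations. The cost numbers and mixes below are invented for illustration, not the paper's model.

```python
def time_balanced_throughput(workloads, device_time=1.0):
    """Illustrative Time-Balanced Allocation: every workload gets an
    equal share of device time; throughput = share / mean op cost.
    Costs and operation mixes are made-up numbers, not the paper's."""
    share = device_time / len(workloads)
    result = {}
    for name, ops in workloads.items():
        # ops: list of (fraction, cost) pairs for asymmetric op types
        mean_cost = sum(frac * cost for frac, cost in ops)
        result[name] = share / mean_cost
    return result

# Reader-heavy vs writer-heavy mixes (reads cost 1 unit, writes 3).
tput = time_balanced_throughput({
    "readers": [(0.9, 1.0), (0.1, 3.0)],
    "writers": [(0.2, 1.0), (0.8, 3.0)],
})
```

Under equal time slices the writer-heavy workload gets lower throughput, which is exactly the asymmetry the paper's BBA policy reacts to.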
SimGQ: Simultaneously Evaluating Iterative Graph Queries
Chengshuo Xu, Abbas Mazloumi, Xiaolin Jiang, Rajiv Gupta
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00014
Graph processing frameworks are typically designed to optimize the evaluation of a single graph query. In practice, however, we often need to respond to multiple graph queries, either from different users or from a single user performing a complex analytics task. In this paper we therefore develop SimGQ, a system that optimizes the simultaneous evaluation of a group of vertex queries originating at different source vertices (e.g., multiple shortest-path queries with different sources) and delivers substantial speedups over a conventional framework that evaluates and responds to queries one by one. The performance benefits are achieved via batching and sharing. Batching fully utilizes system resources to evaluate a batch of queries and amortizes runtime overheads incurred in fetching vertices and edge lists, synchronizing threads, and maintaining computation frontiers. Sharing dynamically identifies shared queries that represent substantial subcomputations in the evaluation of different queries in a batch, evaluates the shared queries, and then uses their results to accelerate the evaluation of all queries in the batch. With four input power-law graphs and four graph algorithms, SimGQ achieves speedups of up to 45.67× with batch sizes of up to 512 queries over a baseline that evaluates the queries one by one using the state-of-the-art Ligra system. Moreover, both batching and sharing contribute substantially to the speedups.
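The batching idea can be sketched with a toy multi-source reachability traversal (this is an illustration of the general technique, not SimGQ's implementation): each vertex carries a bitmask recording which queries in the batch have reached it, so one frontier sweep serves every query at once.

```python
from collections import defaultdict

def batched_bfs(adj, sources):
    """Evaluate up to 64 BFS reachability queries in one traversal.

    Bit i of a vertex's mask is set once query i has reached it,
    so edge lists are fetched once per sweep for the whole batch."""
    assert len(sources) <= 64
    reached = defaultdict(int)              # vertex -> bitmask of queries
    frontier = {}
    for i, s in enumerate(sources):
        frontier[s] = frontier.get(s, 0) | (1 << i)
    for v, m in frontier.items():
        reached[v] |= m
    while frontier:
        nxt = defaultdict(int)
        for v, mask in frontier.items():
            for w in adj.get(v, ()):
                new = mask & ~reached[w]    # query bits not yet at w
                if new:
                    reached[w] |= new
                    nxt[w] |= new
        frontier = nxt
    return reached

# Two queries (from vertices 0 and 4) share most of their traversal.
adj = {0: [1], 1: [2], 2: [3], 4: [2]}
res = batched_bfs(adj, [0, 4])
```

Vertices 2 and 3 end up with both bits set, showing how the shared tail of the traversal is computed once for both queries.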
[Title page]
Pub Date: 2020-12-01. DOI: 10.1109/hipc50609.2020.00001
Content-defined Merkle Trees for Efficient Container Delivery
Yuta Nakamura, Raza Ahmad, T. Malik
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00026
Containerization simplifies the sharing and deployment of applications when environments change in the software delivery chain. To deploy an application, container delivery methods push and pull container images. These methods operate at file and layer (set of files) granularity and introduce redundant data within a container. Several container operations, such as upgrading, installing, and maintaining, become inefficient because of the copying and provisioning of redundant data. In this paper, we reestablish recent results that block-level deduplication reduces the size of individual containers, verifying the result using content-defined chunking. Block-level deduplication, however, does not improve the efficiency of push/pull operations, which must determine the specific blocks to transfer. We introduce a content-defined Merkle Tree (CDMT) over deduplicated storage in a container. CDMT indexes deduplicated blocks and determines changes to blocks in logarithmic time on the client. CDMT efficiently pushes and pulls container images from a registry, especially as containers are upgraded and (re-)provisioned on a client. We also describe how a registry can efficiently maintain the CDMT index as new image versions are pushed. We show the scalability of CDMT over Merkle Trees in terms of disk and network I/O savings using 15 container images and 233 image versions from Docker Hub.
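The two building blocks named in the abstract, content-defined chunking and a Merkle tree over chunk hashes, can be sketched as follows. This is a deliberately simplified toy (fixed window, SHA-256 as the boundary hash) rather than the paper's CDMT construction: chunk boundaries depend on content, not offsets, so a local edit only reshapes nearby chunks and perturbs only the Merkle path above them.

```python
import hashlib

def cdc_chunks(data, mask=0x3F, window=8):
    """Toy content-defined chunking: cut wherever the hash of the
    trailing `window` bytes has its low bits all zero."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        h = int.from_bytes(hashlib.sha256(data[i - window:i]).digest()[:4], "big")
        if h & mask == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

def merkle_root(chunks):
    """Binary Merkle tree over chunk hashes (odd node promoted up)."""
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        nxt = [hashlib.sha256(level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

data = bytes(range(256)) * 8
root1 = merkle_root(cdc_chunks(data))
root2 = merkle_root(cdc_chunks(data))        # deterministic
root3 = merkle_root(cdc_chunks(b"X" + data[1:]))  # one-byte edit
```

Comparing roots (and, in a full implementation, subtree hashes) is what lets a client decide in logarithmic time which blocks actually need to be transferred.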
HiPC 2020 ORGANIZATION
Pub Date: 2020-12-01. DOI: 10.1109/hipc50609.2020.00007
HyPR: Hybrid Page Ranking on Evolving Graphs
Hemant Kumar Giri, Mridul Haque, D. Banerjee
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00020
PageRank (PR) is the standard metric used by the Google search engine to compute the importance of a web page by modeling the entire web as a first-order Markov chain. The challenge of computing PR efficiently and quickly has already been addressed by several prior works, which have introduced innovations both in algorithms and in the use of parallel computing. The standard method of computing PR models the web as a graph. The fast-growing internet adds several new web pages every day, and hence nodes (representing the web pages) and edges (the hyperlinks) are added to this graph incrementally. Computing PR on this evolving graph is an emerging challenge, since computing from scratch on the massive graph is time consuming and unscalable. In this work, we propose Hybrid Page Rank (HyPR), which computes PR on evolving graphs using collaborative execution on multi-core CPUs and massively parallel GPUs. We exploit data parallelism by efficiently partitioning the graph into regions that are affected and unaffected by the new updates. The partitions are then processed in an overlapped manner for PR updates. The novelty of our technique lies in utilizing the hybrid platform to scale the solution to massive graphs. The technique also provides high performance through parallel processing of every batch of updates using a parallel algorithm. HyPR executes on an NVIDIA V100 GPU hosted by a 6th Gen Intel Xeon CPU and is able to update a graph with 640M edges with a single batch of 100,000 edges in 12 ms. HyPR outperforms other state-of-the-art techniques for computing PR on evolving graphs [1] by 4.8x. Additionally, HyPR provides a 1.2x speedup over GPU-only execution and 95x over CPU-only parallel execution.
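For reference, the quantity HyPR maintains incrementally is the one computed from scratch by plain power iteration. The sketch below shows that baseline computation only; HyPR's contribution (partitioning into affected/unaffected regions and CPU+GPU co-execution) is precisely what this naive version lacks.

```python
def pagerank(edges, n, d=0.85, iters=50):
    """Plain power-iteration PageRank with damping factor d.
    Recomputing this from scratch after every batch of edge updates
    is the unscalable baseline that incremental schemes avoid."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    pr = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n       # teleport mass
        for u in range(n):
            if out[u]:
                share = d * pr[u] / len(out[u])
                for v in out[u]:
                    nxt[v] += share
            else:                        # dangling node: spread uniformly
                for v in range(n):
                    nxt[v] += d * pr[u] / n
        pr = nxt
    return pr

edges = [(0, 1), (1, 2), (2, 0), (3, 2)]
ranks = pagerank(edges, 4)
```

Node 2, with two in-links, ends up ranked above node 3, which receives only teleport mass, and the ranks sum to 1 as a valid distribution must.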
Accelerating Force-directed Graph Layout with Processing-in-Memory Architecture
Ruihao Li, Shuang Song, Qinzhe Wu, L. John
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00041
In the big data domain, the visualization of graph systems gives users a more intuitive experience, especially in the fields of social networks, transportation systems, and even medical and biological domains. Processing-in-Memory (PIM) has become a popular choice for deploying emerging applications as a result of its high parallelism and low energy consumption. Furthermore, memory cells on PIM platforms can serve as both compute units and storage units, enabling PIM solutions to efficiently support visualizing graphs at different scales. In this paper, we focus on using a PIM platform to accelerate the Force-directed Graph Layout (FdGL) algorithm, one of the most fundamental algorithms in the field of visualization. We fully explore the parallelism inside the FdGL algorithm and integrate an algorithm-level optimization strategy into our PIM system. In addition, we use programmable instruction sets to achieve more flexibility in our PIM system. Our PIM architecture achieves an 8.07× speedup compared with a GPU platform of the same peak throughput. Compared with state-of-the-art CPU and GPU platforms, our PIM system achieves an average 13.33× and 2.14× performance speedup with 74.51× and 14.30× energy consumption reduction on six real-world graphs.
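The parallelism the paper exploits comes from the structure of a force-directed iteration, sketched below in the classic Fruchterman-Reingold style (an assumption: the abstract does not say which FdGL variant is used). The all-pairs repulsion loop is the O(n²) hot spot that PIM or GPU backends parallelize.

```python
import math, random

def fdgl_step(pos, edges, k=1.0, step=0.05):
    """One force-directed layout iteration: all-pairs repulsion (k²/d)
    plus spring attraction along edges (d²/k), then a damped move."""
    n = len(pos)
    disp = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):                       # O(n²) repulsion
        for j in range(i + 1, n):
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            d = math.hypot(dx, dy) or 1e-9
            f = (k * k / d) / d              # repulsive force / distance
            disp[i][0] += dx * f; disp[i][1] += dy * f
            disp[j][0] -= dx * f; disp[j][1] -= dy * f
    for u, v in edges:                       # attraction along edges
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        f = (d * d / k) / d                  # attractive force / distance
        disp[u][0] -= dx * f; disp[u][1] -= dy * f
        disp[v][0] += dx * f; disp[v][1] += dy * f
    return [(x + step * ddx, y + step * ddy)
            for (x, y), (ddx, ddy) in zip(pos, disp)]

random.seed(0)
pos = [(random.random(), random.random()) for _ in range(4)]
pos = fdgl_step(pos, [(0, 1), (1, 2), (2, 3)])
```

Because every pairwise force is independent, the inner loops map naturally onto the many small compute units a PIM array provides.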
Batched Small Tensor-Matrix Multiplications on GPUs
Keke Zhai, Tania Banerjee-Mishra, A. Wijayasiri, S. Ranka
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00044
We present a fine-tuned library, ZTMM, for batched small tensor-matrix multiplication on GPU architectures. Libraries performing optimized matrix-matrix multiplications involving large matrices are available for many architectures, including GPUs. However, these libraries do not provide optimal performance for applications requiring efficient multiplication of a matrix with a batch of small matrices or tensors. There has been recent interest in developing fine-tuned libraries for batched small matrix-matrix multiplication, but these efforts are limited to square matrices; ZTMM supports both square and rectangular matrices. We experimentally demonstrate that our library has significantly higher performance than the cuBLAS and Magma libraries. We demonstrate our library's use on a spectral-element-based solver called CMT-nek that performs high-fidelity predictive simulations using the compressible Navier-Stokes equations. CMT-nek involves three-dimensional tensors, but it is possible to apply the same techniques to higher-dimensional tensors.
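The computational pattern being tuned, one fixed operator applied to a batch of small matrices, is easy to state in plain code. This sketch only defines the semantics; ZTMM's point is mapping the batch loop onto GPU thread blocks, which a scalar version like this cannot show.

```python
def batched_matmul(A, batch):
    """Multiply one (m x k) matrix A by a batch of small (k x n)
    matrices; the batch loop is the axis GPUs parallelize."""
    m, k = len(A), len(A[0])
    out = []
    for B in batch:
        assert len(B) == k, "inner dimensions must agree"
        n = len(B[0])
        C = [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
             for i in range(m)]
        out.append(C)
    return out

A = [[1, 2], [3, 4]]                       # 2x2 "operator"
batch = [[[1, 0], [0, 1]],                 # 2x2 identity
         [[2], [1]]]                       # rectangular 2x1 slice
C0, C1 = batched_matmul(A, batch)
```

Note the second batch element is rectangular: supporting such non-square operands, rather than only square ones, is what the abstract highlights about ZTMM.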
Temporal Based Intelligent LRU Cache Construction
Pavan Nittur, Anuradha Kanukotla, Narendra Mutyala
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00045
In the Android platform, cache-slots store applications upon their launch and are later used for prefetching. The Least Recently Used (LRU) caching algorithm that governs these cache-slots can fail to keep essential applications in the slots, especially in scenarios such as memory crunches, temporal bursts, or volatile environments. The construction of these cache-slots can be improved by selectively storing user-critical applications before their launch. This reform requires successfully forecasting the user's app-launch pattern with intelligent machine-learning agents, without hindering the smooth execution of parallel processes. In this paper, we propose a Temporal based Intelligent Process Management (TIPM) system, which learns to predict a Smart Application List (SAL) based on the usage pattern. Using SAL, we construct intelligent LRU cache-slots that retain essential user applications in memory and provide improved launch rates. Our experimental results from testing TIPM with different users demonstrate a significant improvement in cache-hit rate (95%), a gain of 26% over the current LRU baseline, making it a valuable enhancement to the platform.
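The baseline that TIPM improves on is an ordinary LRU cache of app slots, sketched below (the slot count and app names are invented for illustration): on every launch the app moves to the most-recent end, and the least recently launched app is evicted when the slots are full.

```python
from collections import OrderedDict

class LRUCache:
    """Baseline LRU cache-slot policy: least recently launched
    application is evicted when all slots are occupied."""
    def __init__(self, slots):
        self.slots = slots
        self.apps = OrderedDict()

    def launch(self, app):
        """Record a launch; returns True on a cache hit (warm launch)."""
        hit = app in self.apps
        if hit:
            self.apps.move_to_end(app)           # now most recent
        else:
            if len(self.apps) >= self.slots:
                self.apps.popitem(last=False)    # evict LRU app
            self.apps[app] = True
        return hit

cache = LRUCache(2)
hits = [cache.launch(a) for a in ["mail", "maps", "mail", "chat", "maps"]]
# "maps" misses at the end: launching "chat" evicted it, the kind of
# eviction of a still-essential app that TIPM's prediction avoids.
```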