The main conference spans three days, from January 8 through January 10, and is flanked by two days of workshops before and after the main conference days. The first conference day will begin with two keynote speeches from research leaders in academia: Carla-Fabiana Chiasserini from Politecnico di Torino, Italy, and Gerhard P. Fettweis from TU Dresden, Germany. The second day, January 9, will start with a newly introduced fireside chat with the two COMSNETS lifetime achievement awardees! On the second day, we will also have a distinguished banquet speaker in the evening: Rahul Mangharam from the University of Pennsylvania, USA. On the third day of the main conference, we will have two distinguished keynote speakers from industry: Sriram Rajamani from Microsoft Research, India, and Saravanan Radhakrishnan from Cisco, India.
{"title":"Message from the General Co-Chairs","authors":"A. Luque, Yousef Ibrahim, J. J. Rodríguez","doi":"10.1109/micro.2006.35","DOIUrl":"https://doi.org/10.1109/micro.2006.35","url":null,"abstract":"The main conference spans over three days from January 8 through January 10 and it is adjoined by two days of workshops before and after the main conference days. The first conference day will begin with two keynote speeches from two research leaders from the academia: Carla-Fabiana Chiasserini from Politecnico di Torino, Italy, and Gerhard P. Fettweis from TU-Dresden, Germany. The second day, January 9 will start with a newly-introduced fireside chat with the two COMSNETS lifetime achievement awardees! On the second day, we will also have a distinguished banquet speaker in the evening: Rahul Mangharam, from University of Pennsylvania, USA. On the third day of the main conference, we will have two distinguished keynote speakers from the industry: Sriram Rajamani. From Microsoft Research, India, and Saravanan Radhakrishnan, CISCO, India.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115467360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01DOI: 10.1109/HiPC50609.2020.00038
José Rivadeneira, Félix García Carballeira, J. Carretero, Francisco Javier García Blas
Nowadays, there are two main approaches to dealing with data-intensive applications: parallel file systems in classical High-Performance Computing (HPC) centers, and Big Data-oriented file systems that follow a data-centric vision. Furthermore, there is a growing overlap between HPC and Big Data applications, given that the Big Data paradigm is a growing consumer of HPC resources. HDFS is one of the most important file systems for data-intensive applications, while, from the parallel file systems point of view, MPI-IO is the most widely used interface for parallel I/O. In this paper, we propose a novel solution for taking advantage of HDFS from MPI-based parallel applications. To demonstrate its feasibility, we have integrated our approach into MIMIR, a MapReduce framework for MPI-based applications, and optimized MIMIR using the data locality features our approach provides. The experimental evaluation demonstrates that our solution offers around 25% better performance in the map phase compared with the MIMIR baseline.
{"title":"Exposing data locality in HPC-based systems by using the HDFS backend","authors":"José Rivadeneira, Félix García Carballeira, J. Carretero, Francisco Javier García Blas","doi":"10.1109/HiPC50609.2020.00038","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00038","url":null,"abstract":"Nowadays, there are two main approaches for dealing with data-intensive applications: parallel file systems in classical High-Performance Computing (HPC) centers and Big Data like parallel file system for ensuring the data centric vision. Furthermore, there is a growing overlap between HPC and Big Data applications, given that Big Data paradigm is a growing consumer of HPC resources. HDFS is one of the most important file systems for data intensive applications while, from the parallel file systems point of view, MPI-IO is the most used interface for parallel I/O. In this paper, we propose a novel solution for taking advantage of HDFS through MPI-based parallel applications. To demonstrate its feasibility, we have included our approach in MIMIR, a MapReduce framework for MPI-based applications. We have optimized MIMIR framework by providing data locality features provided by our approach. The experimental evaluation demonstrates that our solution offers around 25% performance for map phase compared with the MIMIR baseline solution.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133645128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00041
Ruihao Li, Shuang Song, Qinzhe Wu, L. John
In the big data domain, the visualization of graph systems gives users a more intuitive experience, especially in fields such as social networks, transportation systems, and even medical and biological domains. Processing-in-Memory (PIM) has become a popular choice for deploying emerging applications as a result of its high parallelism and low energy consumption. Furthermore, memory cells on PIM platforms can serve as both compute units and storage units, enabling PIM solutions to efficiently support visualizing graphs at different scales. In this paper, we focus on using a PIM platform to accelerate the Force-directed Graph Layout (FdGL) algorithm, which is one of the most fundamental algorithms in the field of visualization. We fully explore the parallelism inside the FdGL algorithm and integrate an algorithm-level optimization strategy into our PIM system. In addition, we use programmable instruction sets to achieve more flexibility in our PIM system. Our PIM architecture achieves an 8.07× speedup compared with a GPU platform of the same peak throughput. Compared with state-of-the-art CPU and GPU platforms, our PIM system achieves an average of 13.33× and 2.14× performance speedup with 74.51× and 14.30× energy consumption reduction on six real-world graphs.
{"title":"Accelerating Force-directed Graph Layout with Processing-in-Memory Architecture","authors":"Ruihao Li, Shuang Song, Qinzhe Wu, L. John","doi":"10.1109/HiPC50609.2020.00041","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00041","url":null,"abstract":"In the big data domain, the visualization of graph systems provides users more intuitive experiences, especially in the field of social networks, transportation systems, and even medical and biological domains. Processing-in-Memory (PIM) has been a popular choice for deploying emerging applications as a result of its high parallelism and low energy consumption. Furthermore, memory cells of PIM platforms can serve as both compute units and storage units, making PIM solutions able to efficiently support visualizing graphs at different scales. In this paper, we focus on using the PIM platform to accelerate the Force-directed Graph Layout (FdGL) algorithm, which is one of the most fundamental algorithms in the field of visualization. We fully explore the parallelism inside the FdGL algorithm and integrate an algorithm level optimization strategy into our PIM system. In addition, we use programmable instruction sets to achieve more flexibility in our PIM system. Our PIM architecture can achieve 8.07× speedup compared with a GPU platform of the same peak throughput. Compared with state-of-the-art CPU and GPU platforms, our PIM system can achieve an average of 13.33× and 2.14× performance speedup with 74.51× and 14.30× energy consumption reduction on six real world graphs.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131022316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00026
Yuta Nakamura, Raza Ahmad, T. Malik
Containerization simplifies the sharing and deployment of applications when environments change in the software delivery chain. To deploy an application, container delivery methods push and pull container images. These methods operate at file and layer (set of files) granularity and introduce redundant data within a container. Several container operations, such as upgrading, installing, and maintaining, become inefficient because of the copying and provisioning of redundant data. In this paper, we re-establish recent results showing that block-level deduplication reduces the size of individual containers, verifying the result using content-defined chunking. Block-level deduplication, however, does not improve the efficiency of push/pull operations, which must determine the specific blocks to transfer. We introduce a content-defined Merkle Tree (CDMT) over deduplicated storage in a container. CDMT indexes deduplicated blocks and determines changes to blocks in logarithmic time on the client. CDMT efficiently pushes and pulls container images from a registry, especially as containers are upgraded and (re-)provisioned on a client. We also describe how a registry can efficiently maintain the CDMT index as new image versions are pushed. We show the scalability of CDMT over Merkle Trees in terms of disk and network I/O savings using 15 container images and 233 image versions from Docker Hub.
{"title":"Content-defined Merkle Trees for Efficient Container Delivery","authors":"Yuta Nakamura, Raza Ahmad, T. Malik","doi":"10.1109/HiPC50609.2020.00026","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00026","url":null,"abstract":"Containerization simplifies the sharing and deployment of applications when environments change in the software delivery chain. To deploy an application, container delivery methods push and pull container images. These methods operate on file and layer (set of files) granularity, and introduce redundant data within a container. Several container operations such as upgrading, installing, and maintaining become inefficient, because of copying and provisioning of redundant data. In this paper, we reestablish recent results that block-level deduplication reduces the size of individual containers, by verifying the result using content-defined chunking. Block-level deduplication, however, does not improve the efficiency of push/pull operations which must determine the specific blocks to transfer. We introduce a content-defined Merkle Tree (CDMT) over deduplicated storage in a container. CDMT indexes deduplicated blocks and determines changes to blocks in logarithmic time on the client. CDMT efficiently pushes and pulls container images from a registry, especially as containers are upgraded and (re-)provisioned on a client. We also describe how a registry can efficiently maintain the CDMT index as new image versions are pushed. We show the scalability of CDMT over Merkle Trees in terms of disk and network I/O savings using 15 container images and 233 image versions from Docker Hub.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122257709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00014
Chengshuo Xu, Abbas Mazloumi, Xiaolin Jiang, Rajiv Gupta
Graph processing frameworks are typically designed to optimize the evaluation of a single graph query. In practice, however, we often need to respond to multiple graph queries, either from different users or from a single user performing a complex analytics task. Therefore, in this paper, we develop SimGQ, a system that optimizes the simultaneous evaluation of a group of vertex queries that originate at different source vertices (e.g., multiple shortest-path queries originating at different source vertices) and delivers substantial speedups over a conventional framework that evaluates and responds to queries one by one. The performance benefits are achieved via batching and sharing. Batching fully utilizes system resources to evaluate a batch of queries and amortizes runtime overheads incurred when fetching vertices and edge lists, synchronizing threads, and maintaining computation frontiers. Sharing dynamically identifies shared queries that represent substantial subcomputations in the evaluation of different queries in a batch, evaluates the shared queries, and then uses their results to accelerate the evaluation of all queries in the batch. With four input power-law graphs and four graph algorithms, SimGQ achieves speedups of up to 45.67× with batch sizes of up to 512 queries over a baseline implementation that evaluates the queries one by one using the state-of-the-art Ligra system. Moreover, both batching and sharing contribute substantially to the speedups.
{"title":"SimGQ: Simultaneously Evaluating Iterative Graph Queries","authors":"Chengshuo Xu, Abbas Mazloumi, Xiaolin Jiang, Rajiv Gupta","doi":"10.1109/HiPC50609.2020.00014","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00014","url":null,"abstract":"Graph processing frameworks are typically designed to optimize the evaluation of a single graph query. However, in practice, we often need to respond to multiple graph queries, either from different users or from a single user performing a complex analytics task. Therefore in this paper we develop SimGQ, a system that optimizes simultaneous evaluation of a group of vertex queries that originate at different source vertices (e.g., multiple shortest path queries originating at different source vertices) and delivers substantial speedups over a conventional framework that evaluates and responds to queries one by one. The performance benefits are achieved via batching and sharing. Batching fully utilizes system resources to evaluate a batch of queries and amortizes runtime overheads incurred due to fetching vertices and edge lists, synchronizing threads, and maintaining computation frontiers. Sharing dynamically identifies shared queries that substantially represent subcomputations in the evaluation of different queries in a batch, evaluates the shared queries, and then uses their results to accelerate the evaluation of all queries in the batch. With four input power-law graphs and four graph algorithms SimGQ achieves speedups of up to 45.67 × with batch sizes of up to 512 queries over the baseline implementation that evaluates the queries one by one using the state of the art Ligra system. Moreover, both batching and sharing contribute substantially to the speedups.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115342422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00020
Hemant Kumar Giri, Mridul Haque, D. Banerjee
PageRank (PR) is the standard metric used by the Google search engine to compute the importance of a web page by modeling the entire web as a first-order Markov chain. The challenge of computing PR efficiently and quickly has already been addressed by several prior works, which have introduced innovations both in algorithms and in the use of parallel computing. The standard method of computing PR models the web as a graph. The fast-growing internet adds many new web pages every day, and hence more nodes (representing the web pages) and edges (the hyperlinks) are added to this graph in an incremental fashion. Computing PR on this evolving graph is now an emerging challenge, since computation from scratch on the massive graph is time-consuming and unscalable. In this work, we propose Hybrid Page Rank (HyPR), which computes PR on evolving graphs using collaborative execution on multi-core CPUs and massively parallel GPUs. We exploit data parallelism by efficiently partitioning the graph into regions that are affected and unaffected by the new updates. The different partitions are then processed in an overlapped manner for PR updates. The novelty of our technique lies in utilizing the hybrid platform to scale the solution to massive graphs. The technique also provides high performance through parallel processing of every batch of updates using a parallel algorithm. HyPR executes efficiently on an NVIDIA V100 GPU hosted on a 6th Gen Intel Xeon CPU and is able to update a graph with 640M edges with a single batch of 100,000 edges in 12 ms. HyPR outperforms other state-of-the-art techniques for computing PR on evolving graphs [1] by 4.8x. Additionally, HyPR provides 1.2x speedup over GPU-only execution and 95x speedup over CPU-only parallel execution.
{"title":"HyPR: Hybrid Page Ranking on Evolving Graphs","authors":"Hemant Kumar Giri, Mridul Haque, D. Banerjee","doi":"10.1109/HiPC50609.2020.00020","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00020","url":null,"abstract":"PageRank (PR) is the standard metric used by the Google search engine to compute the importance of a web page via modeling the entire web as a first order Markov chain. The challenge of computing PR efficiently and quickly has been already addressed by several works previously who have shown innovations in both algorithms and in the use of parallel computing. The standard method of computing PR is handled by modelling the web as a graph. The fast growing internet adds several new web pages everyday and hence more nodes (representing the web pages) and edges (the hyperlinks) are added to this graph in an incremental fashion. Computing PR on this evolving graph is now an emerging challenge since computations from scratch on the massive graph is time consuming and unscalable. In this work, we propose Hybrid Page Rank (HyPR), which computes PR on evolving graphs using collaborative executions on muti-core CPUs and massively parallel GPUs. We exploit data parallelism via efficiently partitioning the graph into different regions that are affected and unaffected by the new updates. The different partitions are then processed in an overlapped manner for PR updates. The novelty of our technique is in utilizing the hybrid platform to scale the solution to massive graphs. The technique also provides high performance through parallel processing of every batch of updates using a parallel algorithm. HyPR efficiently executes on a NVIDIA V100 GPU hosted on a 6th Gen Intel Xeon CPU and is able to update a graph with 640M edges with a single batch of 100,000 edges in 12 ms. HyPR outperforms other state of the art techniques for computing PR on evolving graphs [1] by 4.8x. Additionally HyPR provides 1.2x speedup over GPU only executions, and 95x speedup over CPU only parallel executions.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132546601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2020-12-01. DOI: 10.1109/HiPC50609.2020.00044
Keke Zhai, Tania Banerjee-Mishra, A. Wijayasiri, S. Ranka
We present a fine-tuned library, ZTMM, for batched small tensor-matrix multiplication on GPU architectures. Libraries performing optimized matrix-matrix multiplications involving large matrices are available for many architectures, including GPUs. However, these libraries do not provide optimal performance for applications requiring efficient multiplication of a matrix with a batch of small matrices or tensors. There has been recent interest in developing fine-tuned libraries for batched small matrix-matrix multiplication, but these efforts are limited to square matrices. ZTMM supports both square and rectangular matrices. We experimentally demonstrate that our library achieves significantly higher performance than the cuBLAS and MAGMA libraries. We demonstrate our library's use in a spectral-element-based solver called CMT-nek that performs high-fidelity predictive simulations using the compressible Navier-Stokes equations. CMT-nek involves three-dimensional tensors, but it is possible to apply the same techniques to higher-dimensional tensors.
{"title":"Batched Small Tensor-Matrix Multiplications on GPUs","authors":"Keke Zhai, Tania Banerjee-Mishra, A. Wijayasiri, S. Ranka","doi":"10.1109/HiPC50609.2020.00044","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00044","url":null,"abstract":"We present a fine-tuned library, ZTMM, for batched small tensor-matrix multiplication on GPU architectures. Libraries performing optimized matrix-matrix multiplications involving large matrices are available for many architectures, including a GPU. However, these libraries do not provide optimal performance for applications requiring efficient multiplication of a matrix with a batch of small matrices or tensors. There has been recent interest in developing fine-tuned libraries for batched small matrix-matrix multiplication - these efforts are limited to square matrices. ZTMM supports both square and rectangular matrices. We experimentally demonstrate that our library has significantly higher performance than cuBLAS and Magma libraries. We demonstrate our library's use on a spectral element-based solver called CMT-nek that performs high-fidelity predictive simulations using compressible Navier-Stokes equations. CMT-nek involves three-dimensional tensors, but it is possible to apply the same techniques to higher dimensional tensors.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123108525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Android platform, cache-slots store applications upon their launch, and the system later uses them for prefetching. The Least Recently Used (LRU) caching algorithm that governs these cache-slots can fail to keep essential applications in the slots, especially in scenarios such as memory crunches, temporal bursts, or volatile environments. The construction of these cache-slots can be improved by selectively storing user-critical applications before they are launched. This reform requires a successful forecast of the user app-launch pattern using intelligent machine learning agents, without hindering the smooth execution of parallel processes. In this paper, we propose a Temporal based Intelligent Process Management (TIPM) system, which learns to predict a Smart Application List (SAL) based on the usage pattern. Using SAL, we construct Intelligent LRU cache-slots that retain essential user applications in memory and provide improved launch rates. Our experimental results from testing TIPM with different users demonstrate a significant improvement in cache-hit rate (95%), a gain of 26% over the current baseline (LRU), thereby making it a valuable enhancement to the platform.
{"title":"Temporal Based Intelligent LRU Cache Construction","authors":"Pavan Nittur, Anuradha Kanukotla, Narendra Mutyala","doi":"10.1109/HiPC50609.2020.00045","DOIUrl":"https://doi.org/10.1109/HiPC50609.2020.00045","url":null,"abstract":"In the Android platform, the cache-slots store applications upon their launch, which it later uses for prefetching. The Least Recently Used (LRU) based caching algorithm which governs these cache-slots can fail to maintain essential applications in the slot, especially in scenarios like memory-crunch, temporal-burst or volatile environment situations. The construction of these cache-slots can be ameliorated by selectively storing user critical applications before their launch. This reform would require a successful forecast of the user-app-launch pattern using intelligent machine learning agents without hindering the smooth execution of parallel processes. In this paper, we propose a sophisticated Temporal based Intelligent Process Management (TIPM) system, which learns to predict a Smart Application List (SAL) based on the usage pattern. Using SAL, we construct Intelligent LRU cache-slots, that retains essential user applications in the memory and provide improved launch rates. Our experimental results from testing TIPM with different users demonstrate significant improvement in cache-hit rate (95%) and yielding a gain of 26% to the current baseline (LRU), thereby making it a valuable enhancement to the platform.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133627930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}