In the Big Data regime, Dimensionality Reduction (DR) plays a fundamental role in facilitating useful analytics on the data. Recently, Johnson-Lindenstrauss (JL) Lemma-based DR has been actively researched from both theoretical and application perspectives. In this paper, we provide preliminary results demonstrating the utility of deterministic partial Fourier matrices, with rows picked according to an appropriate Cyclic Difference Set (CDS), for projecting data vectors into a lower-dimensional space. Besides showing that these matrices preserve the pairwise distances among the vectors as well as their random counterparts do, we also provide results on their applicability to image classification and clustering.
"JL Lemma Based Dimensionality Reduction: On Using CDS Based Partial Fourier Matrices," Snigdha Tariyal, N. Narendra, M. Chandra. 2015 IEEE 22nd International Conference on High Performance Computing Workshops. DOI: 10.1109/HiPCW.2015.9.
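As a hedged illustration of the idea (using the textbook (7, 3, 1) cyclic difference set {1, 2, 4} mod 7; the paper's actual CDS and dimensions are not specified here), a partial Fourier projection and a distance-preservation check might look like:

```python
import numpy as np

# Sketch: project n-dimensional vectors to k = |CDS| dimensions using a
# partial DFT matrix whose rows are indexed by the (7,3,1) cyclic
# difference set {1, 2, 4} mod 7. Scaling by sqrt(n/k) makes the squared
# length of a projected Gaussian vector correct in expectation.
n = 7
cds = [1, 2, 4]                              # (7,3,1) cyclic difference set
F = np.fft.fft(np.eye(n)) / np.sqrt(n)       # unitary n x n DFT matrix
Phi = np.sqrt(n / len(cds)) * F[cds, :]      # k x n partial Fourier matrix

rng = np.random.default_rng(0)
x, y = rng.standard_normal(n), rng.standard_normal(n)
ratio = np.linalg.norm(Phi @ (x - y)) / np.linalg.norm(x - y)
print(ratio)  # concentrates around 1 as the dimensions grow
```

The JL guarantee is about such ratios staying within (1 ± ε) with high probability; the paper's claim is that this deterministic row choice performs comparably to random projections.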
The Volume of Fluid (VOF) method is widely used to simulate free surface flows. Various interface tracking and capturing schemes are available with this model. Explicit interface tracking schemes based on geometrical reconstruction of the interface are the most accurate but computationally expensive. On the other hand, interface capturing schemes based on an algebraic formulation are comparatively more diffusive but computationally cheaper. These interface capturing schemes can be used with implicit or explicit volume fraction formulations. For industrial-strength cases, the use of implicit schemes is increasing, as they offer a good compromise between speed and accuracy. In meshing such complex geometries, there is a tradeoff between cell count and mesh quality. While resolving the key areas with a good-quality mesh and keeping the cell count within acceptable limits, the mesh quality in certain regions sometimes suffers. Such regions may contain highly skewed cells, cells with high aspect ratio, or high cell-to-cell size jumps, which can hamper convergence, worsen interfacial diffusion, and lead to inaccurate results. In the present work, two numerical treatments, called interfacial anti-diffusion and poor mesh numerics, are developed and implemented in ANSYS Fluent R16. The interfacial anti-diffusion treatment helps reduce numerical diffusion and sharpen the interface. The poor mesh numerics treatment identifies cells of bad quality and applies appropriate numerical treatment to aid stability and convergence. Results of test cases and an industrial-strength case are reported with and without these treatments; using them improves stability and accuracy. For the industrial-strength case, the results are in good agreement with the experimental data.
"Development and Application of Interfacial Anti-Diffusion and Poor Mesh Numerics Treatments for Free Surface Flows," V. Gupta, Mohib Khan, H. Punekar. 2015 IEEE 22nd International Conference on High Performance Computing Workshops. DOI: 10.1109/HiPCW.2015.12.
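To illustrate the numerical diffusion that such anti-diffusion treatments target, here is a generic 1D sketch (unrelated to Fluent's actual discretization): advecting a sharp volume-fraction step with a first-order upwind scheme smears the interface over many cells.

```python
import numpy as np

# Advect a sharp volume-fraction step on a periodic 1D grid using
# first-order upwind. The scheme is stable (Courant number 0.5) but
# diffusive: the initially sharp interface smears across many cells,
# which is what interface-sharpening treatments counteract.
nx, c, steps = 200, 0.5, 100
alpha = np.where(np.arange(nx) < 50, 1.0, 0.0)      # liquid in cells 0..49

for _ in range(steps):
    alpha = alpha - c * (alpha - np.roll(alpha, 1))  # upwind update

smeared = np.count_nonzero((alpha > 0.01) & (alpha < 0.99))
print(smeared)  # cells with intermediate volume fraction (ideally ~0)
```

The update is conservative, so the total liquid volume is preserved exactly, but the interface occupies dozens of cells after only 100 steps.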
Summary form only given. The complete presentation was not made available for publication as part of the conference proceedings. The recent proliferation of extremely large datasets, due to high-throughput instrumentation in science and engineering, ubiquitous deployment of sensors, and the birth and rise of social networks, has resulted in numerous data-driven challenges that are now captured by the umbrella term "big data". This talk has two parts. In the first part, I will brief the audience on the ongoing federal Big Data initiatives in the United States. The second part will focus on my group's big data research, which supports applications in the life sciences. In particular, I will describe big data problems arising from advances in high-throughput DNA sequencing and our work on developing parallel methods to support the genomic and metagenomic applications driven by these advances.
"Big data in life sciences and public health," S. Aluru. 2015 IEEE 22nd International Conference on High Performance Computing Workshops. DOI: 10.1109/HiPCW.2015.32.
Compressor manufacturers face the challenge of building efficient, reliable compressed-air machines. Toward this goal, the moisture separator plays a key role in removing the bulk water coming out of the cooler, but the cost of an external moisture separator has made the industry rethink alternative approaches. This paper presents an integrated moisture separator design as a cost-effective solution, using a commercially available CFD code with an Eulerian-Lagrangian multiphase approach. The principal finding of this study is that a hook-shaped fin arrangement upstream and downstream leads to uniform flow distribution with better separation efficiency. The steady-state CFD simulation is in good agreement with the corresponding experimental test data. Numerous parameters were validated, such as droplet size, droplet impingement, Stokes number, Weber number, Reynolds number, separation efficiency, and pressure drop. The results thus indicate that steady-state CFD may be an effective design tool.
"Integrated Moisture Separator Design Using CFD," G. Gowda, Balaji Kasthurirangan. 2015 IEEE 22nd International Conference on High Performance Computing Workshops. DOI: 10.1109/HiPCW.2015.11.
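The dimensionless groups named in the abstract can be computed directly from standard definitions; a small sketch for a hypothetical water droplet in air (all values are illustrative assumptions, not the paper's test conditions):

```python
# Dimensionless groups for droplet separation (illustrative values only).
rho_p = 1000.0   # droplet (water) density, kg/m^3
rho_g = 1.2      # gas (air) density, kg/m^3
mu = 1.8e-5      # air dynamic viscosity, Pa*s
sigma = 0.072    # water surface tension, N/m
d = 20e-6        # droplet diameter, m
U = 10.0         # gas velocity, m/s
L = 0.05         # characteristic length, e.g. fin spacing (assumed), m

stokes = rho_p * d**2 * U / (18.0 * mu * L)   # droplet inertia vs. drag
reynolds = rho_g * U * d / mu                 # droplet Reynolds number
weber = rho_g * U**2 * d / sigma              # inertia vs. surface tension
print(stokes, reynolds, weber)
```

A Stokes number well below 1 means droplets largely follow the gas streamlines, which is why geometry features such as the hook-shaped fins matter for forcing impingement.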
With the explosion of big data analytics, scaling linear algebra packages has become extremely important. In the context of GPUs, the cuBLAS API provides a highly efficient package of linear algebra subroutines for a single GPU. Due to inputs of large dimensions, it often becomes necessary to compute over clusters. However, the package does not provide facilities for computing efficiently over a cluster of GPUs. In this paper, we demonstrate a high-level framework for scaling linear algebra computations across a cluster of GPUs, illustrated through the matrix multiplication problem. In particular, we describe a method of specifying matrices using powerlists that captures both parallelism and recursion succinctly, and automatically schedules partitioned matrices over a GPU cluster so as to gain the advantages of cuBLAS when computing the product of partitioned matrices. Our experimental results show significant performance gains, of at least 132% for large matrices over single-GPU computation. The method reflects the map-reduce paradigm: the matrices are mapped to appropriate partitions and sent to appropriate members of the cluster, and the results are collected to obtain the resultant matrix.
"Scaling Computation on GPUs Using Powerlists," Anshu S. Anand, R. Shyamasundar. 2015 IEEE 22nd International Conference on High Performance Computing Workshops. DOI: 10.1109/HiPCW.2015.14.
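The partition/compute/gather pattern described above can be sketched as follows (a plain NumPy stand-in; the paper's powerlist notation and cuBLAS dispatch are not reproduced here):

```python
import numpy as np

# Hypothetical sketch of the map-reduce pattern: split each matrix into
# 2x2 blocks (the "map"), compute block products independently (each
# could run on a separate GPU via cuBLAS in a real cluster), and
# assemble the result (the "reduce").
def partitioned_matmul(A, B):
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
    # Each of the eight block products below is an independent task.
    C11 = A11 @ B11 + A12 @ B21
    C12 = A11 @ B12 + A12 @ B22
    C21 = A21 @ B11 + A22 @ B21
    C22 = A21 @ B12 + A22 @ B22
    return np.block([[C11, C12], [C21, C22]])

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
print(np.allclose(partitioned_matmul(A, B), A @ B))  # prints True
```

Applied recursively, this 2x2 split is exactly the balanced divide-and-conquer structure that powerlists express succinctly.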
A major portion of the big data produced today comprises videos from surveillance cameras deployed to view streets, buildings, offices, etc. Surveillance videos are mainly used for monitoring day-to-day activities. The video sequences are long, and the events of interest occur only over a short duration. Hence, there is a pressing need to analyze and detect events automatically, to avoid continuous manual monitoring of the entire video sequence. The first step toward this is extracting the foreground information. In this paper, we present an effective online multilinear subspace learning algorithm that incrementally learns and models the background as a low-rank tensor. This background modeling, combined with appropriate post-processing steps, is useful for detecting anomalous events, and in turn the foreground, in the video. The efficacy of the proposed method is demonstrated in the simulation results provided.
"Sequential Multilinear Subspace Based Event Detection in Large Video Data Sequences," Bharat Venkitesh, K. PavanKumarReddy, M. Chandra. 2015 IEEE 22nd International Conference on High Performance Computing Workshops. DOI: 10.1109/HiPCW.2015.13.
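The background-model-then-subtract pipeline can be sketched with a deliberately simplified stand-in (a per-pixel median background instead of the paper's incremental low-rank tensor model; the frame data is synthetic):

```python
import numpy as np

# Simplified sketch of background modeling + subtraction for event
# detection: learn a static background, measure per-frame residual
# energy, and flag frames whose residual deviates strongly from the
# background-only baseline.
rng = np.random.default_rng(1)
frames = rng.normal(100.0, 2.0, size=(50, 32, 32))   # static scene + noise
frames[40:, 10:15, 10:15] += 80.0                    # "event" in late frames

background = np.median(frames[:30], axis=0)          # learn background model
residual = np.abs(frames - background)               # foreground energy
score = residual.reshape(len(frames), -1).mean(axis=1)
threshold = score[:30].mean() + 5.0 * score[:30].std()
detected = score > threshold
print(np.nonzero(detected)[0])                       # frames flagged as events
```

The paper's contribution replaces the batch median with an online multilinear update, so the background model adapts frame by frame without storing the whole sequence.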
The fast growth of application data has led to the migration of existing reporting applications to big data open-source technologies such as Hive and Hadoop. Their wide acceptance also motivates their use for servicing on-line analytic queries. Performance assurance of Hive queries is required to maintain the desired level of application performance. Hive query execution time may increase with growth in data size and with changes in cluster size. In this paper, we propose a regression-based analytical model to predict the execution time of a Hive query as data volume grows. A Hive query is executed as a DAG of MapReduce (MR) jobs on a Hadoop system, which requires a predictive model for MR job execution time. We use multiple linear regression to compute models for the various sub-phases of MR job execution and build a consolidated model for predicting the execution time of an MR job on a large data volume. We introduce the ratio of a phase's output record size to its input record size, and the number of map waves, as additional sensitive parameters for predicting MR job execution time. The model is validated against a MapReduce benchmark and a real-world financial application, with prediction error within 10%.
"Performance Assurance Model for HiveQL on Large Data Volume," Amit Sangroya, Rekha Singhal. 2015 IEEE 22nd International Conference on High Performance Computing Workshops. DOI: 10.1109/HiPCW.2015.8.
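A minimal sketch of the regression step, with synthetic measurements (the feature set follows the sensitive parameters named above; the values, units, and the prediction point are assumptions, not the paper's data):

```python
import numpy as np

# Fit execution time of one MR phase as a linear function of data size,
# output-to-input record-size ratio, and number of map waves, then
# extrapolate to a larger data volume.
# Feature columns: [data_size_GB, output_to_input_ratio, map_waves]
X = np.array([[10,  0.5,  2],
              [20,  0.5,  4],
              [40,  0.6,  7],
              [80,  0.7, 16],
              [160, 0.7, 30]], dtype=float)
y = np.array([120.0, 235.0, 480.0, 1020.0, 2100.0])  # measured times (s)

X1 = np.hstack([np.ones((len(X), 1)), X])    # prepend intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

def predict(features):
    """Predict phase time (s) for a new [size, ratio, waves] point."""
    return np.hstack([1.0, features]) @ coef

print(predict(np.array([320.0, 0.8, 60.0])))  # extrapolated prediction
```

The consolidated model in the paper sums such per-phase predictions over the MR jobs in the query's DAG.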
A mesh-dependent relation for the slip number in the Navier-slip with friction boundary condition is proposed for computations of impinging droplets with sharp interface methods. The relation is obtained as a function of the Reynolds number, the Weber number, and the mesh size. The proposed relation is validated for several test cases by comparing the numerically obtained wetting diameter with experimental results. Further, the maximum wetting diameter computed using the proposed slip relation is verified against theoretical predictions. The relative error between the computed maximum wetting diameter and the theoretical predictions is less than 10% for an impinging droplet on a hydrophilic surface; the error increases in the case of a hydrophobic surface.
"On the Navier-Slip Boundary Condition for Computations of Impinging Droplets," Jagannath Venkatesan, Sashikumaar Ganesan. 2015 IEEE 22nd International Conference on High Performance Computing Workshops. DOI: 10.1109/HiPCW.2015.10.
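For reference, a standard form of the Navier-slip with friction condition on the liquid-solid boundary reads (the notation here is generic and may differ from the paper's):

```latex
% No-penetration and slip-with-friction conditions on the wall,
% with unit normal \nu, unit tangent \tau, and slip number \epsilon:
\mathbf{u}\cdot\boldsymbol{\nu} = 0,
\qquad
\epsilon\,\boldsymbol{\tau}\cdot\bigl(\mathbb{S}(\mathbf{u})\,\boldsymbol{\nu}\bigr)
  = -\,\mathbf{u}\cdot\boldsymbol{\tau},
```

where $\mathbb{S}(\mathbf{u})$ denotes the viscous (rate-of-deformation) stress tensor. The paper's contribution is a relation of the form $\epsilon = f(\mathrm{Re}, \mathrm{We}, h)$, tying the slip number to the Reynolds number, Weber number, and mesh size $h$; the exact functional form is not reproduced here.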