Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237095
D. Merten, N. Ettrich
GRT angle migration is a ray-based seismic imaging method that provides direct access to the angle dependency of the reflection coefficient in the subsurface. Implementing this algorithm has become feasible through massive parallelism combined with fast access to large datasets. The parallelization strategy is described with respect to the underlying data mapping problem.
Title: GRT angle migration a 5D data mapping problem
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237033
S. Zertal
SSDs have been widely deployed in different areas and have become competitive storage devices, even for data-intensive applications. They face demanding performance and endurance requirements, and their internal features offer real potential to fulfil them. The multiple independent internal components of an SSD allow parallel access to data at each of the four levels (package, chip, die, plane), but this parallelism relies completely on the data layout scheme. We proposed a data layout algorithm based only on the basic SSD operations. It distributes data down to the lowest level to exploit fine-grain internal parallelism and improve SSD performance. In this paper, we also use the advanced commands available on newer SSDs, together with request scheduling and the data layout scheme, to provide parallelism down to the plane level, taking both performance and endurance into account. The result is a new data layout algorithm that exploits the fine-grain internal parallelism of the SSD. It respects the rules imposed by the careful use of advanced commands and the recommendation to maintain a wide data distribution. The results show improved performance and a Write Amplification (WA) factor very close to the one obtained with basic operations, which indicates that endurance is preserved.
Title: Advanced commands and distributed data layout to enhance the SSD internal parallelism
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
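As a rough illustration of the layout idea in the SSD abstract above, the sketch below stripes consecutive logical pages across the four levels (package, chip, die, plane) so that neighbouring pages land on different units. It is a plain round-robin mapping written for this listing, not the authors' algorithm; the `FlashAddress` type, the `stripe` function, the 2x2x2x2 geometry and the choice of letting the package index vary fastest are all assumptions.

```python
from typing import NamedTuple


class FlashAddress(NamedTuple):
    """Hypothetical physical location of one page inside an SSD."""
    package: int
    chip: int
    die: int
    plane: int
    page: int  # page offset within the selected plane


def stripe(lpn: int, geometry=(2, 2, 2, 2)) -> FlashAddress:
    """Map a logical page number to a physical location by round-robin.

    geometry = (packages, chips per package, dies per chip, planes per die).
    The package index varies fastest, then chip, die and plane, so that
    consecutive logical pages are spread over all units before any unit
    receives a second page. This ordering is an assumption for illustration.
    """
    packages, chips, dies, planes = geometry
    rest, package = divmod(lpn, packages)
    rest, chip = divmod(rest, chips)
    rest, die = divmod(rest, dies)
    page, plane = divmod(rest, planes)
    return FlashAddress(package, chip, die, plane, page)


if __name__ == "__main__":
    for lpn in range(4):
        print(lpn, stripe(lpn))
```

With this mapping, the first 16 logical pages occupy 16 distinct (package, chip, die, plane) units, so a sequential request of that size could in principle be served by all planes in parallel.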
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237073
W. Vanderbauwhede, T. Takemi
In this paper we present a new scheme for parallelization of the Successive Over-Relaxation method for solving the Poisson equation over a 3-D volume. Our new scheme is both simple and effective, outperforming the conventional Red-Black scheme by a factor of 16 on an NVIDIA GeForce GTX 590 GPU, a factor of 11 on an NVIDIA GeForce TITAN Black GPU and a factor of 5 on an Intel Xeon Phi. The speed-up compared to the fully optimised reference implementation running on an Intel Xeon CPU is 16 times on the GTX 590, 22 times on the TITAN and 5 times on the Xeon Phi. We explain the rationale and the implementation in OpenCL and present the performance evaluation results.
Title: Twinned buffering: A simple and highly effective scheme for parallelization of Successive Over-Relaxation on GPUs and other accelerators
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
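For readers unfamiliar with the baseline named in the Successive Over-Relaxation abstract above, the following is a minimal sketch of the conventional Red-Black scheme for the 3-D Poisson equation, written in plain Python for clarity rather than speed. It does not show the twinned buffering scheme, whose details are not given in the abstract. The function name `red_black_sor`, the cubic grid, the zero Dirichlet boundary and the default relaxation factor are assumptions.

```python
def red_black_sor(f, h, omega=1.2, sweeps=100):
    """Solve laplacian(u) = f on a cubic grid with u = 0 on the boundary.

    f is an n x n x n nested list of right-hand-side values and h is the
    grid spacing. Grid points are coloured by the parity of i + j + k.
    All six neighbours of a point have the opposite colour, so every point
    of one colour can be updated independently of the others, which is
    what makes the scheme parallelizable.
    """
    n = len(f)
    u = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for _ in range(sweeps):
        for colour in (0, 1):  # first one colour, then the other
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    for k in range(1, n - 1):
                        if (i + j + k) % 2 != colour:
                            continue
                        neighbours = (u[i - 1][j][k] + u[i + 1][j][k]
                                      + u[i][j - 1][k] + u[i][j + 1][k]
                                      + u[i][j][k - 1] + u[i][j][k + 1])
                        # Gauss-Seidel value from the 7-point stencil
                        gauss_seidel = (neighbours - h * h * f[i][j][k]) / 6.0
                        # Over-relaxation: move past the Gauss-Seidel value
                        u[i][j][k] += omega * (gauss_seidel - u[i][j][k])
    return u


if __name__ == "__main__":
    n, h = 5, 0.25
    rhs = [[[-1.0] * n for _ in range(n)] for _ in range(n)]
    solution = red_black_sor(rhs, h)
    print(solution[2][2][2])
```

On a GPU, each half-sweep would typically map to one kernel launch over the points of a single colour.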
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237061
M. Sajjad, Karandeep Singh, Chang-Won Ahn
Multi-agent based modeling and simulation (MAS) has proven to be a useful approach for the study of complex social phenomena. Because of the diversity and sheer number of factors involved, many population dynamics problems are difficult to address properly with traditional analytical and statistical techniques. This research focuses on the match-making process in population dynamics. We designed a model in which agents interact with other agents and with the environment to find a life partner. We consider an agent's age and socio-economic conditions (education and income level) to be the key factors in decisions about family formation. Using the belief, desire and intention (BDI) architecture, we explicitly take into account the agents' heterogeneity with respect to age and income level. Using a multi-agent model, this study explores how changes in agents' desires and intentions might be transmitted through a population to affect the overall perception. Our model gives more substantial evidence about how and why these attributes can influence the evolution of family formation.
Title: Multi-Agent modeling for match-making using BDI architecture
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237017
S. Gesing, R. Dooley, M. Pierce, Jens Krüger, Richard Grunzke, S. Herres‐Pawlis, A. Hoffmann
Modeling and simulations, which necessitate HPC infrastructures, are often based on complex scientific theories and involve interdisciplinary research teams. IT specialists support efficient access to HPC infrastructures. They design, implement and configure the simulations and models that reflect the sophisticated theoretical models and approaches developed and applied by domain researchers. Roles in such interdisciplinary teams may overlap, depending on team members' knowledge of and experience with computational resources and/or the research domain. Bioinformaticians, for example, are generally trained to act as IT specialists while also having good knowledge of biology and chemistry, so that they can support the user community competently. Domain researchers are mostly not IT specialists, and the requirement to use HPC infrastructures via the command line is often a huge hurdle for them. Thus, there is a need to increase the usability of simulations and models on HPC infrastructures to foster uptake by the user community. Science gateways are one solution: they offer a graphical user interface tailored to a specific research domain, with a single point of entry for job and data management that hides the underlying infrastructure. In the last 10 years, quite a few web development frameworks, science gateway frameworks and APIs with different foci and strengths have evolved to support developers of science gateways in implementing an intuitive solution for a target research domain. Selecting a suitable technology for a specific use case is essential and helps reduce the effort of implementing the science gateway by re-using existing software or frameworks. Thus, a solution for a user community can be provided more efficiently. This paper goes into detail on science gateway concepts as well as information resources, gives examples of successful technologies and proposes criteria for choosing a technology for a use case.
Title: Science gateways - leveraging modeling and simulations in HPC infrastructures via increased usability
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237105
Mikolaj Baranowski, A. Belloum, M. Bubak
Cloud service providers support many different standards for authentication and resource management. The libraries and APIs provided for operating in a cloud environment are not consistent and require a deep understanding of internal technical aspects. Thus, we propose a framework that provides a high-level language for developing applications and, thanks to its layered structure, an API for modifying low-level operations.
Title: Cookery: A framework for developing cloud applications
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237057
C. Diamantini, Laura Genga, D. Potena
Process Mining (PM) encompasses a number of methodologies designed for extracting knowledge from event logs, typically recorded by operational information systems such as ERPs, Workflow Management Systems or other process-aware enterprise systems. The structured nature of the processes implemented in these systems has led to the development of effective techniques for conformance checking (checking whether a real execution trace conforms to a predefined process schema) and process discovery (synthesizing a process schema from a set of real execution traces recorded in the trace log) [1]. However, in many knowledge-intensive domains, such as health care, emergency management, and research and innovation development, processes are typically characterized by little or no structure, since the flow of activities strongly depends on context-dependent decisions that should rely on human knowledge. Consequently, classical process discovery techniques usually provide limited support in analyzing these processes. As a further issue, in these domains an integrated information system may not even exist, which requires integrating a number of independent event logs.
Title: ESub: Mining and exploring substructures in knowledge-intensive processes
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237050
J. Zubairi, E. Erdogan, Shaun Reich
LTE and WiMAX are broadband wireless technologies recognized as 4G. Both technologies support various multimedia applications such as voice over Internet protocol (VoIP), video conferencing and online gaming. In general, network applications have diverse requirements to be satisfied, including Quality of Service (QoS). Therefore, effective scheduling is critical for the overall performance of 4G systems. Many traffic scheduling algorithms are available for wireless networks, e.g. Weighted Fair Queuing (WFQ), Earliest Deadline First (EDF) and Weighted Round Robin (WRR). In this work, various scheduling algorithms in 4G networks are analyzed on the basis of QoS criteria. The WiMAX hybrid scheduling algorithm (EDF + WFQ + FIFO) gives QoS traffic strict priority over best-effort (BE) subscriber requests. This may cause starvation when the concentration of nrtPS and BE traffic is high. We have experimented with a modified hybrid algorithm (MOHSA) to provide more equitable scheduling. Similarly, LTE has time-domain (TD) QoS scheduling in GBR (1, 2) and non-GBR (3, 4 and 5) classes. We have implemented a variant of MOHSA in the TD scheduler of LTE to ensure fairness. Results indicate a fair allocation of bandwidth and starvation avoidance for BE traffic.
Title: Experiments in fair scheduling in 4G WiMAX and LTE
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
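The starvation problem mentioned in the scheduling abstract above can be illustrated with a toy comparison of strict priority and Weighted Round Robin over two backlogged queues. This is a generic sketch, not MOHSA or any scheduler from the paper; the queue names, the 3:1 weights, the slot counts and the helper names are all assumptions.

```python
from collections import deque


def backlog(n):
    """Hypothetical helper: a queue holding n waiting packets."""
    return deque(range(n))


def strict_priority(queues, slots):
    """Serve one packet per slot from the first non-empty queue.

    queues is a dict in priority order (highest first). Lower-priority
    queues are only served when every higher one is empty.
    """
    served = {name: 0 for name in queues}
    for _ in range(slots):
        for name, queue in queues.items():
            if queue:
                queue.popleft()
                served[name] += 1
                break
    return served


def weighted_round_robin(queues, weights, slots):
    """Visit the queues cyclically, serving up to weights[name] packets each."""
    served = {name: 0 for name in queues}
    remaining = slots
    while remaining > 0 and any(queues.values()):
        for name, queue in queues.items():
            for _ in range(weights[name]):
                if remaining == 0 or not queue:
                    break
                queue.popleft()
                served[name] += 1
                remaining -= 1
    return served


if __name__ == "__main__":
    print(strict_priority({"rtPS": backlog(100), "BE": backlog(100)}, 50))
    print(weighted_round_robin({"rtPS": backlog(100), "BE": backlog(100)},
                               {"rtPS": 3, "BE": 1}, 50))
```

Under strict priority the BE queue receives no slots at all while the higher class stays backlogged, whereas the weighted scheme guarantees it a share of every cycle.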
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237089
Nunzio Cassavia, Pietro Dicosta, E. Masciari, D. Saccá
With the emergence of Big Data applications, traditional data management techniques prove inadequate in many real-life scenarios. In particular, OLAP techniques require substantial changes in order to offer useful analysis, because of the huge amount of data to be analyzed and its velocity and variety. In this paper, we describe an approach for dynamic Big Data searching that, based on data collected by a suitable storage system, enriches the data in order to guide users through data exploration in an efficient and effective way.
Title: Improving tourist experience by Big Data tools
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
Pub Date : 2015-07-20 DOI: 10.1109/HPCSim.2015.7237058
Matt Talistu, Teng-Sheng Moh, M. Moh
With the growth of the Internet, social networks, and other distributed systems, an abundance of data about user transactions, network traffic, social interactions, and other areas is available for analysis. Extracting knowledge from this data has recently become a growing field of research, especially as the size of the data makes traditional data mining methods ineffective. Some approaches assume that the data is at a central location or that a complete set of data is available for analysis. However, many modern-day applications consume distributed data streams: the dataset is spread across multiple locations, and each location only has access to a portion of the data stream. We propose a distributed data stream analysis method that uses hierarchical clustering for local online summaries, a gossip protocol for distributing these summaries, and spectral clustering for offline analysis. The resulting solution avoids the heavy computation and communication requirements of a centralized approach. Through experiments, we have demonstrated that the proposed solution is able to accurately cluster the data streams and is highly scalable. Its quality increases significantly as the number of microclusters increases, yet it is fault-tolerant when this number is small. Finally, it achieves a level of accuracy similar to that of a centralized approach.
Title: Gossip-based spectral clustering of distributed data streams
Venue: 2015 International Conference on High Performance Computing & Simulation (HPCS)
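The gossip step in the distributed clustering abstract above can be pictured with a small push-gossip simulation in which every node repeatedly forwards the summaries it knows to one random peer. This is a generic sketch under stated assumptions, not the protocol from the paper: the summary format (a centroid and a point count), the synchronous rounds, and the names `gossip_round` and `disseminate` are hypothetical, and the hierarchical and spectral clustering stages are not shown.

```python
import random


def gossip_round(known, rng):
    """One synchronous push round.

    known maps node -> {origin node: summary}. Every node sends a snapshot
    of what it knew at the start of the round to one randomly chosen peer.
    """
    nodes = list(known)
    snapshot = {node: dict(summaries) for node, summaries in known.items()}
    for sender in nodes:
        peer = rng.choice([node for node in nodes if node != sender])
        known[peer].update(snapshot[sender])


def disseminate(local_summaries, seed=0, max_rounds=1000):
    """Gossip until every node holds every summary or max_rounds is reached."""
    rng = random.Random(seed)
    known = {node: {node: summary}
             for node, summary in local_summaries.items()}
    rounds = 0
    while (rounds < max_rounds
           and any(len(s) < len(known) for s in known.values())):
        gossip_round(known, rng)
        rounds += 1
    return known, rounds


if __name__ == "__main__":
    # Hypothetical local summaries: (centroid, number of points) per node.
    local = {node: (float(node), 10) for node in range(8)}
    known, rounds = disseminate(local)
    print("rounds needed:", rounds)
```

Once each node holds the full set of summaries, an offline clustering step could run on any single node without contacting a central server.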