Pub Date: 2015-07-20, DOI: 10.1109/HPCSim.2015.7237117
Revisiting co-scheduling for upcoming ExaScale systems
Stefan Lankes
Future-generation supercomputers will be a hundred times faster than today's leaders of the Top 500, reaching the exascale mark. It is predicted that this gain in CPU power will be achieved by a shift in the ratio of compute nodes to cores per node: the number of nodes will not grow significantly compared to today's systems; instead, nodes will be built from many-core CPUs with hundreds of cores or more, resulting in a widening gap between compute power and I/O performance [1]. Previous studies have identified four key challenges that must be addressed when designing future exascale systems: energy and power, memory and storage, concurrency and locality, and resiliency [2].
Pub Date: 2015-07-20, DOI: 10.1109/HPCSim.2015.7237072
In search of the best MPI-OpenMP distribution for optimum Intel-MIC cluster performance
G. Utrera, Marisa Gil, X. Martorell
Applications for HPC platforms are mainly based on hybrid programming models: MPI for communication, and OpenMP for task and fork-join parallelism to exploit shared-memory communication inside a node. On the basis of this scheme, much research has been carried out to improve performance, for example by overlapping communication and computation, or by increasing speedup and bandwidth on new network fabrics (e.g., InfiniBand and 10 or 40 Gigabit Ethernet). As far as computation and communication are concerned, future HPC platforms will be heterogeneous and connected by high-speed networks. In this context, an important issue is deciding how to distribute the workload among all the nodes in order to balance the application's execution, as well as choosing the most appropriate programming model to exploit parallelism inside each node. In this paper we propose a mechanism to dynamically balance the work distribution among the components of a heterogeneous cluster based on their performance characteristics. For our evaluations we run the miniFE mini-application of the Mantevo benchmark suite on a heterogeneous Intel MIC cluster. Experimental results show that choosing the appropriate number of threads can improve performance significantly over simply using the maximum number of cores available on the Intel MIC.
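The performance-aware work distribution described in the abstract above can be illustrated with a small sketch. This is not the authors' mechanism, just a minimal proportional-split heuristic: each node receives a share of the work proportional to its measured throughput, so a slower host and a faster accelerator finish at roughly the same time.

```python
# Toy sketch of performance-proportional work distribution across
# heterogeneous nodes (illustrative only; not the paper's mechanism).

def distribute_work(total_items, throughputs):
    """Split total_items among nodes proportionally to their measured
    throughput (items/s), handing any remainder to the fastest nodes."""
    total_rate = sum(throughputs)
    shares = [int(total_items * r / total_rate) for r in throughputs]
    # Integer truncation may leave a few items unassigned; give them
    # to the fastest nodes first.
    leftover = total_items - sum(shares)
    order = sorted(range(len(throughputs)), key=lambda i: -throughputs[i])
    for i in order[:leftover]:
        shares[i] += 1
    return shares

# A host node (rate 100) next to an accelerator node (rate 300):
print(distribute_work(1000, [100, 300]))  # -> [250, 750]
```

In practice the throughput estimates would be refreshed at run time, which is what makes the balancing dynamic rather than static.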
Pub Date: 2015-07-20, DOI: 10.1109/HPCSim.2015.7237066
WCET nested-loop minimization in terms of instruction-level-parallelism
Y. Elloumi, M. Akil, M. Hedi
Many high-performance applications are dominated by nested loop bodies, which represent their most critical sections. This raises two challenges. First, the Worst-Case Execution Time (WCET) must be determined in order to characterize the timing behaviour of the nested loops. Second, the parallelism level must be raised to enhance performance. In particular, Multidimensional Retiming (MR) is an important optimization approach that offers several instruction-level-parallelism solutions. Although full parallelism achieves the optimal WCET, it leads to a large growth in the number of processing cores, which is unsuitable for embedded real-time implementations. The main idea of this paper is to steer the increase in parallelism level by its effect on the WCET. First, the MR parameters corresponding to the nested loops are extracted. Then, the WCET is formulated as a function of the parallelism level. Finally, an optimization heuristic is proposed that identifies the parallelism level that satisfies the WCET constraint. Our experiments indicate that the WCET prediction is accurate within an error rate of 8.54%, and that implementations produced by the optimization heuristic use on average 27.18% fewer cores than fully parallel ones.
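The core of the heuristic described above (stop raising the parallelism level as soon as the WCET constraint is met, instead of going fully parallel) can be sketched as follows. The cost model `wcet` below is a made-up placeholder, not the paper's MR-derived formulation.

```python
# Illustrative sketch: raise the parallelism level only as far as
# needed to satisfy a WCET constraint, rather than maximizing it.

def min_parallelism(wcet_model, constraint, max_level):
    """Return the smallest parallelism level whose predicted WCET
    meets the constraint, or None if even max_level is too slow."""
    for level in range(1, max_level + 1):
        if wcet_model(level) <= constraint:
            return level
    return None

# Hypothetical model: serial WCET of 1200 cycles, near-linear speedup,
# plus a fixed 50-cycle overhead per additional parallelism level.
wcet = lambda p: 1200 / p + 50 * (p - 1)
print(min_parallelism(wcet, 500, 16))  # -> 3
```

Under this toy model, level 3 already meets a 500-cycle deadline, so the remaining cores can be saved, which is the paper's motivation in miniature.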
Pub Date: 2015-07-20, DOI: 10.1109/HPCSim.2015.7237049
A collaboration middleware for service scalability in peer-to-peer systems
Sung-Soo Kim, Chunglae Cho, Jongho Won
We introduce a novel mobile middleware which provides a collaboration service among associated apps in a symmetric fashion. This paper focuses on the challenge of how users can receive seamless collaboration services regardless of changes in the physical device configuration of a multiscreen environment. To solve this problem, we propose a novel system architecture that supports primitive operations for collaboration among distributed applications, such as remote invocation, session join, session invitation, push migration, pull migration and synchronization. Our system provides communication transparency, seamless collaboration services and scalability among heterogeneous distributed applications. The experimental results demonstrate that our system can be successfully applied to collaboration services among multiple apps in a home network environment.
Pub Date: 2015-07-20, DOI: 10.1109/HPCSim.2015.7237039
A honeypot system with honeyword-driven fake interactive sessions
Luigi Catuogno, Aniello Castiglione, F. Palmieri
Honeypots are an indispensable tool for network and system security as well as for computer forensic investigations. They can be helpful for detecting possible intrusions and for gathering information about their source, attack patterns, final target and purpose. Highly interactive honeypots are probably the most useful and enlightening ones, since they reveal much information about intruders' behavior and skills, even though the implementation and setup of such tools may require considerable effort and computational resources. Accordingly, we present an architecture for highly interactive honeypots that aims at detecting password-cracking attacks by means of honeywords, leveraging container-based virtualization to provide the persistent sessions needed to capture attacker activities.
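The honeyword idea that the abstract above builds on can be sketched in a few lines. This is the generic scheme, not the paper's architecture: each account stores the real password shuffled among decoys ("sweetwords"), and a separate checker holds the secret index, so a login with a decoy signals that the password file was cracked. All names and passwords here are made up.

```python
# Minimal honeyword sketch (illustrative; details differ from the paper).
import secrets

def make_sweetwords(real_password, decoys):
    """Shuffle the real password among decoys; return the list plus the
    secret index that only the separate honeychecker would store."""
    words = decoys + [real_password]
    secrets.SystemRandom().shuffle(words)
    return words, words.index(real_password)

def check_login(attempt, sweetwords, real_index):
    """Return 'ok', 'fail', or 'alarm' (a honeyword was submitted,
    which suggests the password file has been cracked)."""
    if attempt not in sweetwords:
        return "fail"
    return "ok" if sweetwords.index(attempt) == real_index else "alarm"

words, idx = make_sweetwords("correct-horse", ["blue42", "qwerty9"])
print(check_login("correct-horse", words, idx))  # -> ok
print(check_login("qwerty9", words, idx))        # -> alarm
```

In the paper's setting, the "alarm" branch is what triggers the fake interactive session instead of an outright rejection.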
Pub Date: 2015-07-20, DOI: 10.1109/HPCSim.2015.7237112
Predictive analytics on evolving data streams anticipating and adapting to changes in known and unknown contexts
Mykola Pechenizkiy
Ever-increasing volumes of sensor readings, transactional records, web data and event logs call for a next generation of big-data mining technology that provides effective and efficient tools for making use of streaming data. Predictive analytics on data streams is actively studied in research communities and used in real-world applications, which in turn puts the spotlight on several important challenges. In this talk I will focus on the challenges of dealing with evolving data streams. In dynamically changing and non-stationary environments, the data distribution can change over time. When such changes can be anticipated and modeled explicitly, we can design context-aware predictive models. When changes in the underlying data distribution are unexpected, we face the so-called problem of concept drift. I will highlight some recent developments in the proactive handling of concept drift and link them to research in context-aware predictive modeling. I will also share some of the insights we gained through case studies in the domains of web analytics, stress analytics and food-sales analytics.
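The concept-drift problem mentioned in the talk abstract above can be made concrete with a minimal detector. This is a generic two-window sketch, not one of the specific techniques covered in the talk: drift is flagged once the mean of the most recent window departs from the mean of an initial reference window by more than a threshold.

```python
# Minimal sliding-window drift-detection sketch (illustrative only).
from collections import deque

def detect_drift(stream, window=50, threshold=0.5):
    """Yield indices at which the recent-window mean drifts away from
    the reference-window mean by more than `threshold`."""
    ref = deque(maxlen=window)       # frozen snapshot of early data
    recent = deque(maxlen=window)    # always the last `window` items
    for i, x in enumerate(stream):
        recent.append(x)
        if len(ref) < window:
            ref.append(x)
            continue
        ref_mean = sum(ref) / window
        rec_mean = sum(recent) / window
        if abs(rec_mean - ref_mean) > threshold:
            yield i

# Stationary data around 0.0, then an abrupt shift to 2.0:
stream = [0.0] * 100 + [2.0] * 100
print(next(detect_drift(stream)))  # -> 112
```

The detection lag (index 112 rather than 100) illustrates the usual trade-off: smaller windows react faster but produce more false alarms.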
Pub Date: 2015-07-20, DOI: 10.1109/HPCSim.2015.7237030
Analysis of asymmetric 3D DRAM architecture in combination with L2 cache size reduction
A. Schönberger, K. Hofmann
Memory in modern systems is a heterogeneous complex. Access-time and bandwidth improvements of DRAM built with die-stacking technology can only be evaluated in interaction with hardware components such as the underlying cache and CPU, and software components such as the executed application and the processed input. In this work we analyze the encoding and decoding processes of the JPEG 2000 algorithm executed on a MIPS I core for different picture sizes. We observe that for picture sizes below a particular critical value, the DRAM share of execution time reaches at most 4%; any DRAM improvement in this regime would not lead to a significant performance gain for the whole system. Starting at a particular picture size that depends on the last-level cache size, the accelerating effect of the cache falls off and the DRAM influence rises to 25%, where it remains for larger pictures. A system-level estimation shows that our suggested 3D DRAM architecture can reduce that rise to a third and is partially able to take over cache functionality.
Pub Date: 2015-07-20, DOI: 10.1109/HPCSim.2015.7237101
Real-time signal identification in big data streams: Bragg-Spot localization in photon science
Daniel Becker, A. Streit
The next generation of photon-science experiments will be able to produce thousands of images per second; however, many of them will not be useful for further analysis. Due to this large amount of data, it is not feasible to store all of it offline for later analysis. Instead, the analysis has to be shifted as close to the data sources as possible. In addition, due to the large volume and velocity of the data, the analysis has to be done in a highly parallel fashion. In this article we recapitulate our previous work on algorithms for data analysis in photon science, as well as the potentially relevant BM3D algorithm. These algorithms are discussed with a focus on their parallel-processing capabilities.
Pub Date: 2015-07-20, DOI: 10.1109/HPCSim.2015.7237027
Using network data to improve digital investigation in cloud computing environments
Daniel Spiekermann, Tobias Eggendorfer, J. Keller
With the rise of cloud computing environments and the increasingly ubiquitous use of their opportunities, the amount of data analysed in a traditional digital forensic examination is growing significantly, and with it the risk of missing evidence. Without adopting new methodologies or different approaches, investigators are unable to guarantee a valid digital forensic investigation. Given the large number of cloud platforms, it is hardly feasible to identify all of them when investigating a computer; knowing every service of every cloud computing platform is impossible for a human. This paper therefore proposes to investigate raw network data in order to improve the overall digital investigation process by correlating its network and computer forensic parts. We present a new method that analyses network traffic to find information about the use of cloud-specific services. By automating this extraction and comparing the results with a cloud-service knowledge base, the error rate of a forensic investigation is reduced, as is the risk of human error.
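The knowledge-base comparison step proposed above can be sketched as a simple hostname lookup. This is a hypothetical illustration, not the authors' method; the knowledge base here contains only three well-known services, whereas a real one would hold many entries and richer indicators (IP ranges, ports, protocol fingerprints).

```python
# Hypothetical sketch: match hostnames observed in captured network
# traffic against a knowledge base of cloud services.

CLOUD_SERVICES = {
    "dropbox.com": "Dropbox",
    "drive.google.com": "Google Drive",
    "s3.amazonaws.com": "Amazon S3",
}

def identify_cloud_usage(observed_hosts):
    """Map observed hostnames to known cloud services, matching
    subdomains as well (e.g. dl.dropbox.com -> Dropbox)."""
    hits = set()
    for host in observed_hosts:
        for domain, service in CLOUD_SERVICES.items():
            if host == domain or host.endswith("." + domain):
                hits.add(service)
    return sorted(hits)

hosts = ["dl.dropbox.com", "example.org", "s3.amazonaws.com"]
print(identify_cloud_usage(hosts))  # -> ['Amazon S3', 'Dropbox']
```

Automating exactly this kind of lookup is what lets the investigation flag cloud usage that a manual inspection of the seized computer would miss.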
Pub Date: 2015-07-20, DOI: 10.1109/HPCSIM.2015.7237063
Electromagnetic-based nanonetworks communication in SoC design
O. Yalgashev, M. Bakhouya, A. Chariete, J. Gaber, M. Manier
Nanocommunication has emerged as a new paradigm that allows nano-processing elements to communicate using molecular-, acoustic-, mechanical- or electromagnetic-based techniques. In this paper, the performance of the electromagnetic-based nanocommunication technique as an on-chip communication fabric for SoCs is evaluated. Simulations have been conducted on a 2D mesh-like nano-NoC using two routing techniques, flooding and XY routing, and several traffic patterns, such as Bit-Reversal, Shuffle, Transpose and Uniform. Performance metrics, mainly latency, throughput and energy consumption, are evaluated and reported to show the behavior of the nano-NoC when the number of nano-processing elements and the transmission range are varied.
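XY routing, one of the two techniques simulated above, is simple enough to sketch in full: a packet in a 2D mesh first travels along the X dimension until the destination column is reached, then along the Y dimension. This is a generic re-implementation of the textbook algorithm, not the authors' simulator code.

```python
# Deterministic XY routing in a 2D mesh (illustrative sketch).

def xy_route(src, dst):
    """Return the sequence of (x, y) nodes visited from src to dst."""
    x, y = src
    path = [(x, y)]
    # Travel along the X dimension first...
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # ...then along the Y dimension.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))
# -> [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Unlike flooding, which duplicates the packet at every hop, XY routing delivers exactly one copy along a minimal path, which is why the two make instructive extremes for the latency and energy comparison.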