Scheduling time-sensitive multi-tier services with probabilistic performance guarantee
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097876
Venue: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)
Shuo Liu, Soamar Homsi, Ming Fan, Shaolei Ren, Gang Quan, Shangping Ren
As web applications grow tremendously in both scale and scope, their application patterns are becoming increasingly sophisticated. It is important but challenging for service providers to lower operational costs without degrading the user experience, especially when a service provider's profit is closely tied to the user experience (e.g., response time). In this paper, we study the problem of efficiently scheduling multi-tier, time-sensitive applications on distributed computing platforms with respect to users' Quality of Service (QoS) requirements. Here, efficiency means satisfying the QoS requirements while keeping average response times low. The service provider must ensure that service requests are served successfully before their end-to-end deadlines with given probabilities. To solve this problem, we propose an approach that judiciously assigns a deadline to each service tier. An application request is dropped if any one of its services misses its deadline. Our simulation results demonstrate that our approach can statistically guarantee the required QoS more efficiently than other widely applied methods (e.g., acceptance control, first-come-first-serve, and deterministic sub-deadline assignment), regardless of whether the resources are shared by multiple applications.
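A minimal sketch of per-tier deadline assignment with drop-on-miss. The abstract does not say how the authors choose each tier's deadline, so the split proportional to mean service time used here is a hypothetical stand-in, the exponential service times in the Monte Carlo estimate are an assumption, and all function names are illustrative.

```python
import random


def split_deadline(end_to_end: float, mean_service: list[float]) -> list[float]:
    """Split an end-to-end deadline across tiers.

    Hypothetical heuristic: each tier's share is proportional to its mean
    service time. Returns cumulative (absolute) deadlines, one per tier.
    """
    total = sum(mean_service)
    deadlines, acc = [], 0.0
    for s in mean_service:
        acc += end_to_end * s / total
        deadlines.append(acc)
    return deadlines


def serve(service_times: list[float], deadlines: list[float]) -> bool:
    """Pass a request through the tiers in order; drop it (return False)
    as soon as it finishes a tier after that tier's deadline."""
    elapsed = 0.0
    for t, d in zip(service_times, deadlines):
        elapsed += t
        if elapsed > d:
            return False  # dropped: later tiers spend no work on it
    return True


def estimate_success(mean_service: list[float], deadlines: list[float],
                     trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(request meets every tier deadline),
    assuming exponentially distributed service times at each tier."""
    rng = random.Random(seed)
    met = 0
    for _ in range(trials):
        times = [rng.expovariate(1.0 / m) for m in mean_service]
        met += serve(times, deadlines)  # True counts as 1
    return met / trials
```

For example, `split_deadline(100.0, [1.0, 3.0, 1.0])` returns `[20.0, 80.0, 100.0]`, giving the slow middle tier the most slack; `estimate_success` then approximates the per-request success probability that a probabilistic QoS target constrains.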
On message reachability of gossip algorithms in degree-biased peer-to-peer networks
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097888
Daisuke Yamamasu, Naohiro Hayashibara
In peer-to-peer networks, each node connects directly to other nodes without access points. This type of network is useful for information sharing among mobile devices (e.g., smartphones). When nodes move, static routing can hardly be assumed for message delivery over the network. In this paper, we adopt gossip-style epidemic message dissemination and evaluate the performance of several gossip algorithms with respect to network topology. Specifically, we focus on the distribution of links in the network. Our results clarify the characteristics of these algorithms on topologies whose degree distribution is locally biased.
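The sketch below estimates message reachability for one simple gossip variant on an adjacency-list graph: a node that receives the message for the first time forwards it to each neighbour independently with probability p. This forwarding rule and the function names are assumptions for illustration; the abstract does not list which gossip algorithms were evaluated.

```python
import random
from collections import deque


def gossip_reachability(adj: dict[int, list[int]], source: int, p: float,
                        seed: int = 0) -> float:
    """Fraction of nodes reached when every node, on first receipt of the
    message, forwards it to each neighbour independently with probability p."""
    rng = random.Random(seed)
    received = {source}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for nb in adj[node]:
            # Each forwarding attempt succeeds with probability p; neighbours
            # that already hold the message are skipped (a copy adds nothing).
            if nb not in received and rng.random() < p:
                received.add(nb)
                frontier.append(nb)
    return len(received) / len(adj)


def mean_reachability(adj: dict[int, list[int]], source: int, p: float,
                      runs: int = 200) -> float:
    """Average reachability over independent runs with different seeds."""
    return sum(gossip_reachability(adj, source, p, seed=s)
               for s in range(runs)) / runs
```

Running this over graphs whose degree distribution is skewed in one region (for example, a dense cluster attached to a sparse chain) and comparing the averages is one way to reproduce the kind of topology comparison the abstract describes.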
DASH: A duplication-aware flash cache architecture in virtualization environment
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097893
Xian Chen, Wenzhi Chen, Shuiqiao Yang, Zhongyong Lu, Zonghui Wang
With the rapid development of multi-core and multi-threading technologies, the performance gap between the CPU and the storage system widens year by year, making storage the bottleneck of overall system performance. To alleviate this situation, flash memory has been used as a cache for HDDs. Meanwhile, cloud computing is becoming increasingly popular and mature in industry. Virtualization, its key building block, allows several virtual machines (VMs) to run simultaneously on a single physical machine, and most of these VMs run the same or similar operating systems and applications. In this scenario, the flash cache becomes occupied by many duplicate data blocks. However, existing flash cache architectures and replacement policies do not take this into account, which greatly limits the efficient use of the flash cache. In this paper, we propose a new duplication-aware flash cache architecture (DASH). In this architecture, the flash cache stores only one copy of duplicate data blocks, which notably expands the effective cache capacity so that more I/O requests hit in the cache. Moreover, this architecture reduces the amount of data written to the flash cache, significantly prolonging the life span of the flash device. Experiments based on realistic applications show that, in some situations, our cache architecture can improve the cache hit ratio by a factor of 5, reduce the average I/O latency by 63%, and eliminate 81% of flash cache writes.
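One way to picture the core idea is a content-addressed cache in which block addresses from different VMs map to a single physical copy keyed by a content fingerprint. The class below is a toy model under assumed choices (SHA-1 fingerprints, LRU eviction over unique blocks, an in-memory dict standing in for flash); it is not the DASH design itself.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class DedupFlashCache:
    """Toy duplication-aware cache: (vm, lba) keys from many VMs may share one
    physical copy, identified by a content fingerprint. LRU over copies."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.lba_to_fp: dict[tuple[str, int], bytes] = {}      # logical -> fingerprint
        self.store: OrderedDict[bytes, bytes] = OrderedDict()  # fingerprint -> data
        self.flash_writes = 0

    def insert(self, vm: str, lba: int, data: bytes) -> None:
        fp = hashlib.sha1(data).digest()
        self.lba_to_fp[(vm, lba)] = fp
        if fp in self.store:                  # duplicate content: no flash write
            self.store.move_to_end(fp)
            return
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)    # evict least recently used copy
        self.store[fp] = data
        self.flash_writes += 1

    def lookup(self, vm: str, lba: int) -> Optional[bytes]:
        fp = self.lba_to_fp.get((vm, lba))
        # A mapping whose copy was evicted is treated as a miss (resolved lazily).
        if fp is None or fp not in self.store:
            return None
        self.store.move_to_end(fp)
        return self.store[fp]
```

Because mappings are content-addressed, a stale mapping that becomes valid again after the same content is re-inserted still returns correct data, which is why the sketch cleans up lazily instead of scanning all mappings on eviction.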
EasiMG: A method of maximizing lifetime for group request in web-based sensor network
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097800
Chenying Hou, Dong Li, Li Cui
Lightweight RESTful protocols, such as CoAP and SeaHttp, have been proposed for web-based sensor networks (WBSNs), which provide web services on resource-constrained devices. Because sensing data in a sensor network are generally spatially correlated, it is efficient to request a group of devices located in a nearby area. Group requesting is thus a typical way to provide web services for resource-constrained devices in a WBSN. However, how to optimally assign nodes to a group of requests so as to maximize network lifetime is a critical problem. In this paper, we address this problem in the scenario where nodes have different initial energy levels and can process in-network group requests using the branch and combine methods supported by SeaHttp. We prove that this problem is NP-complete and transform it into an edge-weighted semi-matching problem on a bipartite graph using the fat tree construction algorithm. Finally, we propose an approximation algorithm to solve the problem. Simulation results show that our approach prolongs network lifetime by 29.11% on average, and that its advantage over traditional methods is greater in high-concurrency scenarios.
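Because the assignment problem is NP-complete, practical solutions are approximate. The following greedy sketch treats it as a semi-matching: each request is matched to exactly one candidate node, a node may serve several requests, and the edge weight is the energy the node would spend. The greedy rule (most-constrained requests first, each sent to the candidate with the most residual energy afterwards) is a hypothetical illustration, not the paper's approximation algorithm or its fat tree construction.

```python
def assign_requests(energy: dict[str, float],
                    candidates: dict[str, dict[str, float]]) -> dict[str, str]:
    """Greedy semi-matching: each request -> exactly one candidate node.

    energy:     initial energy of each node (nodes may differ)
    candidates: request -> {node: energy that node would spend serving it}
    """
    residual = dict(energy)
    assignment: dict[str, str] = {}
    # Most constrained requests (fewest candidate nodes) are placed first.
    for req in sorted(candidates, key=lambda r: len(candidates[r])):
        costs = candidates[req]
        # Prefer the node left with the most energy after serving the request.
        node = max(costs, key=lambda n: residual[n] - costs[n])
        residual[node] -= costs[node]
        assignment[req] = node
    return assignment


def min_residual(energy: dict[str, float],
                 candidates: dict[str, dict[str, float]],
                 assignment: dict[str, str]) -> float:
    """Lifetime proxy: the first node to run out of energy bounds lifetime."""
    residual = dict(energy)
    for req, node in assignment.items():
        residual[node] -= candidates[req][node]
    return min(residual.values())
```

With two nodes holding 10 and 9 energy units and two requests that each cost 2 on either node, the greedy rule spreads the requests across both nodes instead of draining one, which keeps the minimum residual energy higher.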
WorkQ: A many-core producer/consumer execution model applied to PGAS computations
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097863
David Ozog, A. Malony, J. Hammond, P. Balaji
Partitioned global address space (PGAS) applications, such as the Tensor Contraction Engine (TCE) in NWChem, often apply a one-process-per-core mapping in which each process iterates through the following work-processing cycle: (1) determine a work item dynamically, (2) get data via one-sided operations on remote blocks, (3) perform computation on the data locally, (4) put (or accumulate) the resulting data into an appropriate remote location, and (5) repeat the cycle. However, this simple flow of execution does not effectively hide communication latency, despite the opportunities for making asynchronous progress. Using nonblocking communication calls is not sufficient unless care is taken to efficiently manage a responsive queue of outstanding communication requests. This paper presents a new runtime model and its library implementation for managing tunable “work queues” in PGAS applications. Our runtime execution model, called WorkQ, assigns some of the on-node processes the role of “producers” that primarily do communication (steps 1, 2, 4, and 5) and the remaining processes the role of “consumers” that do computation (step 3); processes can, however, switch roles dynamically to improve performance. Load balance, synchronization, and overlap of communication and computation are facilitated by a tunable node-wise FIFO message queue protocol. Our WorkQ library implementation enables an MPI+X hybrid programming model in which X comprises SysV message queues and the user's choice of SysV, POSIX, or MPI shared memory. We develop a simplified software mini-application that mimics the performance behavior of the TCE at arbitrary scale, and we show that the WorkQ engine outperforms the original model by about a factor of 2. We also show performance improvement in the TCE coupled cluster module of NWChem.
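The producer/consumer split can be sketched with Python threads and a bounded FIFO queue. This is only an analogy under stated assumptions: threads stand in for on-node processes, `queue.Queue` stands in for the SysV message queues, a list copy stands in for a one-sided get, and roles are fixed rather than switched dynamically as in WorkQ.

```python
import queue
import threading


def run_workq(blocks: list[list[float]], n_producers: int = 1,
              n_consumers: int = 3) -> float:
    """Producers emulate steps 1-2 (pick a work item, 'get' its data) and push
    it onto a FIFO; consumers do step 3 (local compute); the main thread does
    step 4 (accumulate). Returns the accumulated total."""
    work: queue.Queue = queue.Queue(maxsize=8)   # bounded FIFO between roles
    results: queue.Queue = queue.Queue()
    items = iter(range(len(blocks)))             # shared work-item counter
    lock = threading.Lock()

    def producer() -> None:
        while True:
            with lock:
                idx = next(items, None)          # step 1: claim a work item
            if idx is None:
                return
            work.put(list(blocks[idx]))          # step 2: stand-in for a remote get

    def consumer() -> None:
        while True:
            data = work.get()
            if data is None:                     # poison pill: no more work
                return
            results.put(sum(x * x for x in data))  # step 3: local compute

    producers = [threading.Thread(target=producer) for _ in range(n_producers)]
    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        work.put(None)                           # one pill per consumer, after all work
    for t in consumers:
        t.join()
    total = 0.0
    while not results.empty():                   # step 4: accumulate results
        total += results.get()
    return total
```

The bounded queue is the tunable knob in this sketch: a larger `maxsize` lets producers run further ahead of consumers, trading memory for more communication/computation overlap.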
Fault-Tolerant bi-directional communications in web-based applications
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097891
N. Ivaki, Filipe Araújo
The Hypertext Transfer Protocol (HTTP) and the Transmission Control Protocol (TCP) are the most popular protocols used in the development of web-based applications. Despite their popularity, these protocols impose two limitations on applications and systems that require reliable, interactive, real-time communication: 1) HTTP forces applications into a request-response paradigm, even when no reply is needed, and does not allow the server to send anything to a client unless the client explicitly requests it; 2) TCP provides no recovery options for network outages, forcing developers to write their own complex, error-prone, ad hoc solutions. In this paper, we introduce a solution that offers both bi-directional and reliable communication to web-based applications, even in the presence of connection failures. To make this possible, we combine the idea behind WebSockets with a Session-Based Fault-Tolerant design pattern.
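A session layer of this kind is commonly built from sequence numbers, a retransmission buffer, and a resume handshake on reconnect. The sketch below shows those three pieces; the class names, the cumulative-acknowledgement rule, and the resume exchange are assumptions, since the abstract does not describe the authors' protocol in detail.

```python
from collections import deque


class SessionSender:
    """Sender half of a reconnectable session: each message carries a
    sequence number and stays buffered until the peer acknowledges it."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.unacked: deque = deque()   # (seq, payload), oldest first

    def send(self, payload: str) -> tuple[int, str]:
        msg = (self.next_seq, payload)
        self.next_seq += 1
        self.unacked.append(msg)
        return msg                      # a real system writes this to the socket

    def ack(self, seq: int) -> None:
        """Cumulative ack: the peer holds every message up to and including seq."""
        while self.unacked and self.unacked[0][0] <= seq:
            self.unacked.popleft()

    def resume(self, last_received: int) -> list[tuple[int, str]]:
        """On reconnect the peer reports the last seq it got; replay the rest."""
        self.ack(last_received)
        return list(self.unacked)


class SessionReceiver:
    """Receiver half: delivers messages once and in order."""

    def __init__(self) -> None:
        self.last_received = -1
        self.delivered: list[str] = []

    def on_message(self, seq: int, payload: str) -> None:
        # Deliver only the next expected message; duplicates from a replay are dropped.
        if seq == self.last_received + 1:
            self.delivered.append(payload)
            self.last_received = seq
```

Because both sides keep state above the transport, a dropped TCP or WebSocket connection costs only a reconnect plus a replay of unacknowledged messages, and either side can initiate sends once the session is up.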
Hand-to-Hand instant message communication: Revisiting Morse code
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097823
N. Thepvilojanapong, H. Saito, K. Murase, Tsubasa Ito, Ryo Kanaoka, T. Leppänen, J. Riekki, Y. Tobe
In this paper, we propose a novel vibration-based communication system for Bluetooth-equipped smartphones. Smartphones are commonly used through their displays; however, users sometimes want to exchange pieces of information with nearby peers without shifting their focus from the current task. A vibration-based channel also enables communication when there is no visual or sound contact. For this purpose, we developed a novel application called Hand-to-Hand on Bluetooth Communication (H2BCom). It uses the common gestures of tapping and touching the smartphone display to send a Morse-coded message over a Bluetooth channel. On the receiving side, the Morse-coded message is presented as vibrations of the smartphone. We present the design, implementation, and evaluation of H2BCom on Android phones. In our user study, we found that using Morse code was difficult for beginners, but users' skill improved after they took a Morse code tutorial.
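The encoding side can be sketched as a conversion from text to a vibration timing pattern. The sketch assumes standard International Morse timing (dot 1 unit, dash 3 units, a 1-unit gap within a letter, 3 between letters, 7 between words), a hypothetical 100 ms unit, and the alternating off/on millisecond layout accepted by Android's `Vibrator.vibrate(long[], int)`. The text form used by `decode` (spaces between letters, " / " between words) is also an assumption; how H2BCom itself maps taps and vibrations is not specified in the abstract.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}


def to_vibration_pattern(text: str, unit_ms: int = 100) -> list[int]:
    """Alternating [off, on, off, on, ...] durations in milliseconds.
    Raises KeyError for characters outside MORSE (letters and digits only)."""
    pattern: list[int] = []
    gap = 0                                  # silence before the next pulse
    for wi, word in enumerate(text.upper().split()):
        if wi > 0:
            gap = 7 * unit_ms                # between words
        for li, ch in enumerate(word):
            if li > 0:
                gap = 3 * unit_ms            # between letters
            for si, sym in enumerate(MORSE[ch]):
                if si > 0:
                    gap = unit_ms            # between symbols of one letter
                pattern += [gap, (1 if sym == "." else 3) * unit_ms]
    return pattern


def decode(morse: str) -> str:
    """Inverse mapping for text with letters split by spaces, words by ' / '."""
    reverse = {code: ch for ch, code in MORSE.items()}
    return " ".join("".join(reverse[c] for c in word.split())
                    for word in morse.split(" / "))
```

The first entry of the pattern is the initial delay, so `to_vibration_pattern("e t")` yields `[0, 100, 700, 300]`: a dot, a word gap, then a dash.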
An efficient implementation of PBKDF2 with RIPEMD-160 on multiple FPGAs
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097841
Ayman Abbas, Rian Voss, Lars Wienbrandt, M. Schimmler
A weakness of many security systems lies in the strength of the chosen password or key derivation function. We show how FPGA technology can be used to effectively attack cryptographic applications with a password dictionary. On a Xilinx Spartan6-LX150 FPGA, we have implemented two independent PBKDF2 cores, each using four HMAC cores with pipelines that compute RIPEMD-160 hashes to derive encryption keys, together with one resource-optimized AES-256 XTS core for direct decryption. Our design targets TRUECRYPT containers, but can be applied to similar encryption tools with little adaptation. To save resources and maximize speed, we further optimized the RIPEMD-160 hash function for this purpose. Executed on the RIVYERA S6-LX150 multi-FPGA system, which contains 128 S6-LX150 FPGAs, our design reaches a peak performance of about 245,000 passwords per second.
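PBKDF2 itself is compact enough to write out. The sketch follows the RFC 8018 definition and keys the HMAC once, then copies the keyed state for every iteration, a common optimization that mirrors why precomputing the inner and outer pad states pays off in hardware pipelines. It defaults to SHA-256 so it can be checked against `hashlib.pbkdf2_hmac`; passing `hash_name="ripemd160"` works only if the local OpenSSL build provides RIPEMD-160. The `is_correct_key` callback is a hypothetical placeholder for trial decryption of a container header. For scale, the reported peak of about 245,000 passwords per second over 128 FPGAs is roughly 1,900 passwords per second per FPGA.

```python
import hashlib
import hmac
from typing import Callable, Iterable, Optional


def pbkdf2(password: bytes, salt: bytes, iterations: int, dklen: int,
           hash_name: str = "sha256") -> bytes:
    """PBKDF2 per RFC 8018: T_i = U_1 xor ... xor U_c, where
    U_1 = PRF(P, S || INT(i)) and U_j = PRF(P, U_{j-1}); PRF is HMAC."""
    prf = hmac.new(password, digestmod=hash_name)   # key the HMAC once
    hlen = prf.digest_size
    blocks = []
    for i in range(1, -(-dklen // hlen) + 1):       # ceil(dklen / hlen) blocks
        h = prf.copy()
        h.update(salt + i.to_bytes(4, "big"))
        u = h.digest()
        acc = int.from_bytes(u, "big")
        for _ in range(iterations - 1):
            h = prf.copy()                          # reuse the keyed state
            h.update(u)
            u = h.digest()
            acc ^= int.from_bytes(u, "big")
        blocks.append(acc.to_bytes(hlen, "big"))
    return b"".join(blocks)[:dklen]


def dictionary_attack(candidates: Iterable[bytes], salt: bytes, iterations: int,
                      dklen: int, is_correct_key: Callable[[bytes], bool],
                      hash_name: str = "sha256") -> Optional[bytes]:
    """Derive a key for each candidate password and test it; is_correct_key
    is a hypothetical stand-in for 'decrypt the header and check it'."""
    for pw in candidates:
        if is_correct_key(pbkdf2(pw, salt, iterations, dklen, hash_name)):
            return pw
    return None
```

Every candidate costs `iterations` HMAC evaluations per output block, which is why the attack's throughput is dominated by the hash core and why the paper's design replicates pipelined HMAC cores.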
Simplifying index file structure to improve I/O performance of parallel indexing
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097856
Hsuan-Te Chiu, J. Chou, V. Vishwanath, S. Byna, Kesheng Wu
Complex indexing techniques are needed to reduce the time needed to analyze massive scientific datasets, but generating these indexing data structures can be very time-consuming. In this work, we propose a set of strategies to simplify the index file structure and to improve I/O performance during index construction in FastQuery, a parallel indexing and querying system for scientific data. FastQuery has been used to analyze data from various scientific applications, including a trillion-particle plasma simulation. To accelerate query processing, FastQuery uses FastBit to build indexes and then stores them in the file system through parallel scientific data format libraries, such as HDF5. Although these data format libraries are designed to support more complex multi-dimensional arrays, we observed that it still takes considerable work to map the indexing data structures onto arrays, especially on parallel machines. In this paper, we address this problem by minimizing I/O time through storing indexes in our self-defined binary data format. By fully controlling the data structure, we can minimize the I/O synchronization overhead and explore more efficient I/O strategies for storing indexes. Our experiments indexing a trillion-particle dataset on 20,000 cores of a supercomputer show that the proposed binary I/O driver reaches 85% of the system's peak I/O bandwidth and achieves a speedup of up to 4X in total execution time compared with the previous FastQuery implementation using the HDF5 I/O driver.
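A self-defined layout lets every writer compute where its data goes with nothing more than one prefix sum. The sketch below assumes a hypothetical layout (a block count, a table of 64-bit offsets, then the concatenated index blocks); it is not FastQuery's actual on-disk format. In a parallel run the offsets would come from an exclusive scan across ranks (for example `MPI_Exscan`), and each rank would write its block at its offset with `MPI_File_write_at` or `os.pwrite`.

```python
import struct
from itertools import accumulate


def exclusive_offsets(sizes: list[int], base: int = 0) -> list[int]:
    """Exclusive prefix sum: where each writer's block starts, after `base`."""
    if not sizes:
        return []
    return [base + off for off in accumulate([0] + sizes[:-1])]


def pack_index_file(blocks: list[bytes]) -> bytes:
    """Hypothetical layout: [n: u64][n offsets: u64][block_0][block_1]..."""
    n = len(blocks)
    header_size = 8 * (1 + n)
    offsets = exclusive_offsets([len(b) for b in blocks], header_size)
    header = struct.pack(f"<Q{n}Q", n, *offsets)
    # In a parallel run, rank i would write blocks[i] at offsets[i] itself,
    # so no rank waits on another once the offsets are known.
    return header + b"".join(blocks)


def read_block(buf: bytes, i: int) -> bytes:
    """Random access to block i using only the header."""
    (n,) = struct.unpack_from("<Q", buf, 0)
    offsets = struct.unpack_from(f"<{n}Q", buf, 8)
    end = offsets[i + 1] if i + 1 < n else len(buf)
    return buf[offsets[i]:end]
```

For block sizes `[3, 5, 2]`, `exclusive_offsets` gives `[0, 3, 8]`; adding the 32-byte header for three blocks places them at bytes 32, 35, and 40.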
Distributed sensor device resource-object connection based on service delivery platform
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097900
Changwoo Yoon, K. Song
A system for providing a distributed device resource-object-connection service based on a service delivery platform (SDP) is described. The system includes an SDP and a proxy. The SDP is configured to define distributed service functions as enablers, generate a convergence service by combining the enablers, and provide the generated convergence service. The proxy is configured to connect a distributed device to the SDP, allowing the SDP to use the distributed device as a resource and to define and use it as an enabler. The system is capable of defining both distributed service functions and distributed sensors as enablers, thereby allowing the distributed sensors to be used in the same way as service-function enablers.
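The enabler abstraction can be sketched as one interface shared by service functions and proxied sensors, so the platform composes both the same way. All class and method names below are hypothetical; the abstract does not specify the SDP's interfaces.

```python
from typing import Callable, Protocol


class Enabler(Protocol):
    """Anything the SDP can compose: a service function or a proxied sensor."""
    name: str

    def invoke(self, request: dict) -> dict: ...


class ServiceEnabler:
    """A distributed service function exposed as an enabler."""

    def __init__(self, name: str, fn: Callable[[dict], dict]):
        self.name, self.fn = name, fn

    def invoke(self, request: dict) -> dict:
        return self.fn(request)


class SensorProxy:
    """Proxy that wraps a distributed sensor device so the SDP can treat it
    as a resource and use it exactly like any other enabler."""

    def __init__(self, name: str, read_device: Callable[[], float]):
        self.name, self.read_device = name, read_device

    def invoke(self, request: dict) -> dict:
        return {**request, self.name: self.read_device()}


class ServiceDeliveryPlatform:
    def __init__(self) -> None:
        self.enablers: dict[str, Enabler] = {}

    def register(self, enabler: Enabler) -> None:
        self.enablers[enabler.name] = enabler

    def convergence_service(self, *names: str) -> Callable[[dict], dict]:
        """Combine registered enablers, in order, into one service."""
        chain = [self.enablers[n] for n in names]

        def run(request: dict) -> dict:
            for e in chain:
                request = e.invoke(request)
            return request

        return run
```

Registering `SensorProxy("temp", ...)` next to a service enabler that raises an alert above a threshold yields a convergence service that reads the sensor and evaluates the alert in one call, with the platform unaware which enabler is a device.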