Traffic matrices are used in many network engineering tasks, for instance optimal network design. Unfortunately, measurements of these matrices are error-prone, a problem that is exacerbated when they are extrapolated to provide the predictions used in planning. Practical network design and management should consider sensitivity to such errors. Although robust optimisation techniques exist, they seem to be rarely used, at least in part because it is difficult to generate an ensemble of admissible traffic matrices with a controllable error level. We address this problem by presenting a fast and flexible technique for generating synthetic traffic matrices. We demonstrate the utility of the method by presenting a methodology for robust network design based on an adaptation of the mean-risk analysis concept from finance.
{"title":"Network-design sensitivity analysis","authors":"Paul Tune, M. Roughan","doi":"10.1145/2591971.2591979","DOIUrl":"https://doi.org/10.1145/2591971.2591979","url":null,"abstract":"Traffic matrices are used in many network engineering tasks, for instance optimal network design. Unfortunately, measurements of these matrices are error-prone, a problem that is exacerbated when they are extrapolated to provide the predictions used in planning. Practical network design and management should consider sensitivity to such errors, but although robust optimisation techniques exist, it seems they are rarely used, at least in part because of the difficulty in generating an ensemble of admissible traffic matrices with a controllable error level. We address this problem in our paper by presenting a fast and flexible technique of generating synthetic traffic matrices. We demonstrate the utility of the method by presenting a methodology for robust network design based on adaptation of the mean-risk analysis concept from finance.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127462766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
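The mean-risk idea mentioned in the abstract above can be illustrated with a small sketch. It assumes each candidate design has already been costed against an ensemble of synthetic traffic matrices, and it uses the population standard deviation as the risk term. Both the risk measure and the names `mean_risk_score`, `pick_design`, and `risk_aversion` are assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import mean, pstdev


def mean_risk_score(costs, risk_aversion):
    """Mean cost plus a penalty proportional to the spread of the costs.

    `costs` holds one cost per traffic matrix in the ensemble.
    The standard deviation is an assumed risk measure, not the paper's.
    """
    return mean(costs) + risk_aversion * pstdev(costs)


def pick_design(cost_by_design, risk_aversion):
    """Return the design name with the lowest mean-risk score."""
    return min(
        cost_by_design,
        key=lambda name: mean_risk_score(cost_by_design[name], risk_aversion),
    )


# Hypothetical costs of two designs over a three-matrix ensemble.
ensemble_costs = {
    "lean": [10.0, 10.0, 30.0],    # cheap on average, but sensitive to errors
    "padded": [18.0, 18.0, 18.0],  # dearer on average, but insensitive
}
```

With `risk_aversion` set to 0 the choice reduces to the lowest average cost, and raising it shifts the choice toward designs whose cost varies less across the ensemble.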
In Solid-State Drives (SSDs) with tens of flash chips and a highly parallel architecture, I/O operations can be sped up by making good use of resources during page allocation. Existing proposals use static page allocation, which does not balance the I/O load and whose efficiency depends on access address patterns. To the best of our knowledge, no research thus far has shown what happens if one or more internal resources can be freely allocated regardless of the request address. This paper explores the possibility of using different degrees of dynamism in page allocation and identifies the key design opportunities they present for improving SSD characteristics.
{"title":"Unleashing the potentials of dynamism for page allocation strategies in SSDs","authors":"Arash Tavakkol, M. Arjomand, H. Sarbazi-Azad","doi":"10.1145/2591971.2592013","DOIUrl":"https://doi.org/10.1145/2591971.2592013","url":null,"abstract":"In Solid-State Drives (SSDs) with tens of flash chips and highly parallel architecture, we can speed up I/O operations by well-utilizing resources during page allocation. Proposals already exist for using static page allocation which does not balance the IO load and its efficiency depends on access address patterns. To our best knowledge, there have been no research thus far to show what happens if one or more internal resources can be freely allocated regardless of the request address. This paper explores the possibility of using different degrees of dynamism in page allocation and identifies key design opportunities that they present to improve SSD's characteristics.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"18 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131093939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A variety of models have been proposed and analyzed to understand how a new innovation (e.g., a technology, a product, or even a behavior) diffuses over a social network. They are broadly classified as either epidemic-based or game-based. In this paper, we consider a game-based model, where each individual makes a selfish, rational choice in terms of its payoff in adopting the new innovation, but with some noise. We study how the diffusion effect can be maximized by seeding a subset of individuals (within a given budget), i.e., convincing them to pre-adopt a new innovation. In particular, we aim at finding 'good' seeds that minimize the time to infect all others, i.e., diffusion speed maximization. To this end, we design polynomial-time approximation algorithms for three representative classes, Erdős-Rényi, planted partition and geometrically structured graph models, which correspond to globally well-connected, locally well-connected with large clusters and locally well-connected with small clusters, respectively, and provide performance guarantees in terms of approximation and complexity. First, for the dense Erdős-Rényi and planted partition graphs, we show that an arbitrary seeding and a simple seeding proportional to the size of clusters are almost optimal with high probability. Second, for geometrically structured sparse graphs, including planar and d-dimensional graphs, our algorithm, which (a) constructs clusters, (b) seeds the border individuals among clusters, and (c) greedily seeds inside each cluster, always outputs an almost optimal solution. We validate our theoretical findings with extensive simulations on a real social graph. We believe that our results provide new practical insights on how to seed over a social network depending on its connection structure, where individuals rationally adopt a new innovation.
To the best of our knowledge, we are the first to study diffusion speed maximization in game-based diffusion, whereas extensive research effort has been devoted to epidemic-based models, where the problem is often referred to as influence maximization.
{"title":"On maximizing diffusion speed in social networks: impact of random seeding and clustering","authors":"Jungseul Ok, Youngmi Jin, Jinwoo Shin, Yung Yi","doi":"10.1145/2591971.2591991","DOIUrl":"https://doi.org/10.1145/2591971.2591991","url":null,"abstract":"A variety of models have been proposed and analyzed to understand how a new innovation (e.g., a technology, a product, or even a behavior) diffuses over a social network, broadly classified into either of epidemic-based or game-based ones. In this paper, we consider a game-based model, where each individual makes a selfish, rational choice in terms of its payoff in adopting the new innovation, but with some noise. We study how diffusion effect can be maximized by seeding a subset of individuals (within a given budget), i.e., convincing them to pre-adopt a new innovation. In particular, we aim at finding `good' seeds for minimizing the time to infect all others, i.e., diffusion speed maximization. To this end, we design polynomial-time approximation algorithms for three representative classes, Erdőos-Réenyi, planted partition and geometrically structured graph models, which correspond to globally well-connected, locally well-connected with large clusters and locally well-connected with small clusters, respectively, provide their performance guarantee in terms of approximation and complexity. First, for the dense Erdős-Rényi and planted partition graphs, we show that an arbitrary seeding and a simple seeding proportional to the size of clusters are almost optimal with high probability. Second, for geometrically structured sparse graphs, including planar and d-dimensional graphs, our algorithm that (a) constructs clusters, (b) seeds the border individuals among clusters, and (c) greedily seeds inside each cluster always outputs an almost optimal solution. We validate our theoretical findings with extensive simulations under a real social graph. 
We believe that our results provide new practical insights on how to seed over a social network depending on its connection structure, where individuals rationally adopt a new innovation. To our best knowledge, we are the first to study such diffusion speed maximization on the game-based diffusion, while the extensive research efforts have been made in epidemic-based models, often referred to as influence maximization.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132040211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop a novel trajectory-based localization scheme which (i) identifies a user's current trajectory from the measurements collected while the user is moving, by finding the best match among the training traces (trajectory matching), and then (ii) localizes the user on the trajectory (localization). The core requirement of both steps is an accurate and robust algorithm for matching two time series that may contain significant noise and perturbation due to differences in mobility, devices, and environments. To achieve this, we develop an enhanced Dynamic Time Warping (DTW) alignment, and apply it to RSS, channel state information, or magnetic field measurements collected along a trajectory. We use indoor and outdoor experiments to demonstrate its effectiveness.
{"title":"Unified localization framework using trajectory signatures","authors":"S. Rallapalli, Wei Dong, L. Qiu, Yin Zhang","doi":"10.1145/2591971.2592027","DOIUrl":"https://doi.org/10.1145/2591971.2592027","url":null,"abstract":"We develop a novel trajectory-based localization scheme which (i) identifies a user's current trajectory based on the measurements collected while the user is moving, by finding the best match among the training traces (trajectory matching) and then (ii) localizes the user on the trajectory (localization). The core requirement of both the steps is an accurate and robust algorithm to match two time-series that may contain significant noise and perturbation due to differences in mobility, devices, and environments. To achieve this, we develop an enhanced Dynamic Time Warping (DTW) alignment, and apply it to RSS, channel state information, or magnetic field measurements collected from a trajectory. We use indoor and outdoor experiments to demonstrate its effectiveness.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123097393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
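The trajectory-matching step described in the abstract above can be sketched with textbook Dynamic Time Warping. This is the classic DTW recurrence with an absolute-difference local cost, not the enhanced alignment the authors develop, whose details the abstract does not give. The function names and the sample traces are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences, local cost |x - y|."""
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_matching_trace(query, training_traces):
    """Return (name, distance) of the training trace closest to `query`."""
    scored = ((name, dtw_distance(query, trace))
              for name, trace in training_traces.items())
    return min(scored, key=lambda pair: pair[1])


# Hypothetical signal-strength traces recorded along two trajectories.
traces = {"hall": [1, 2, 3, 4], "stairs": [4, 3, 2, 1]}
match = best_matching_trace([1, 2, 2, 3, 4], traces)
```

Because DTW lets one sample align with several samples of the other series, a user who lingers at one spot (the repeated `2` in the query) still matches the `hall` trace at zero cost.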
C. Kim, J. Rhee, Hui Zhang, Nipun Arora, Guofei Jiang, X. Zhang, Dongyan Xu
Performance bugs are frequently observed in commodity software. While profilers or source code-based tools can be used at the development stage, where a program is diagnosed in a well-defined environment, many performance bugs survive that stage and affect production runs. OS kernel-level tracers are commonly used in post-development diagnosis due to their independence from programs and libraries; however, they lack the detailed program-specific metrics, such as function latencies and program contexts, needed to reason about performance problems. In this paper, we propose a novel performance inference system, called IntroPerf, that generates fine-grained performance information -- like that from application profiling tools -- transparently by leveraging OS tracers that are widely available in most commodity operating systems. With system stack traces as input, IntroPerf enables transparent context-sensitive performance inference, and diagnoses application performance in a multi-layered scope ranging from user functions to the kernel. Evaluated with various performance bugs in multiple open source software projects, IntroPerf automatically ranks potential internal and external root causes of performance bugs with high accuracy, without any prior knowledge about or instrumentation of the subject software. Our results show IntroPerf's effectiveness as a lightweight performance introspection tool for post-development diagnosis.
{"title":"IntroPerf: transparent context-sensitive multi-layer performance inference using system stack traces","authors":"C. Kim, J. Rhee, Hui Zhang, Nipun Arora, Guofei Jiang, X. Zhang, Dongyan Xu","doi":"10.1145/2591971.2592008","DOIUrl":"https://doi.org/10.1145/2591971.2592008","url":null,"abstract":"Performance bugs are frequently observed in commodity software. While profilers or source code-based tools can be used at development stage where a program is diagnosed in a well-defined environment, many performance bugs survive such a stage and affect production runs. OS kernel-level tracers are commonly used in post-development diagnosis due to their independence from programs and libraries; however, they lack detailed program-specific metrics to reason about performance problems such as function latencies and program contexts. In this paper, we propose a novel performance inference system, called IntroPerf, that generates fine-grained performance information -- like that from application profiling tools -- transparently by leveraging OS tracers that are widely available in most commodity operating systems. With system stack traces as input, IntroPerf enables transparent context-sensitive performance inference, and diagnoses application performance in a multi-layered scope ranging from user functions to the kernel. Evaluated with various performance bugs in multiple open source software projects, IntroPerf automatically ranks potential internal and external root causes of performance bugs with high accuracy without any prior knowledge about or instrumentation on the subject software. 
Our results show IntroPerf's effectiveness as a lightweight performance introspection tool for post-development diagnosis.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126300751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We investigate the impact of DHCP churn on network characterization by analyzing 18 months of DHCP, DNS, Firewall Alert, and Netflow data collected from an enterprise network of 30,000 clients. We find that DHCP churn has a clear impact on network metrics.
{"title":"Impact of DHCP churn on network characterization","authors":"Long H. Vu, D. Turaga, S. Parthasarathy","doi":"10.1145/2591971.2592034","DOIUrl":"https://doi.org/10.1145/2591971.2592034","url":null,"abstract":"We investigate the DHCP churn impact on network characterization by analyzing 18 months of DHCP, DNS, Firewall Alert, and Netflow data collected from an enterprise network of 30,000 clients. We find that DHCP churn has clear impact on network metrics.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126091306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The advent of multi-core architectures has brought concurrent programming to the forefront of software development. In this context, Transactional Memory (TM) has gained increasing popularity as a simpler, attractive alternative to traditional lock-based synchronization. The recent integration of Hardware TM (HTM) in the latest generation of Intel commodity processors turned TM into a mainstream technology, raising a number of questions about its future and that of concurrent programming. To evaluate the potential impact of Intel's HTM, we conducted the largest study on TM to date, comparing different locking techniques, hardware and software TMs, as well as different combinations of these mechanisms, from the dual perspective of performance and power consumption. As a result, we perform a workload characterization to help programmers better exploit the currently available TM facilities, and we identify important research directions.
{"title":"On the energy and performance of commodity hardware transactional memory","authors":"Nuno Diegues, P. Romano, L. Rodrigues","doi":"10.1145/2591971.2592030","DOIUrl":"https://doi.org/10.1145/2591971.2592030","url":null,"abstract":"The advent of multi-core architectures has brought concurrent programming to the forefront of software development. In this context, Transactional Memory (TM) has gained increasing popularity as a simpler, attractive alternative to traditional lock-based synchronization. The recent integration of Hardware TM (HTM) in the last generation of Intel commodity processors turned TM into a mainstream technology, raising a number of questions on its future and that of concurrent programming. To evaluate the potential impact of Intel's HTM, we conducted the largest study on TM to date, comparing different locking techniques, hardware and software TMs, as well as different combinations of these mechanisms, from the dual perspective of performance and power consumption. As a result we perform a workload characterization, to help programmers better exploit the currently available TM facilities, and identify important research directions.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132834850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper addresses the problem of detecting a single rumor source from multiple observations, taking a statistical view of spreading over a network under the susceptible-infectious model. For tree networks, multiple sequential observations of a single instance of rumor spreading cannot improve on the initial snapshot observation. The situation improves dramatically with multiple independent observations. We propose a unified inference framework based on the union rumor centrality, and provide explicit detection performance for degree-regular tree networks. Surprisingly, even with merely two observations, the detection probability is at least double that of a single observation, and it further approaches one (i.e., reliable detection) as the degree increases. This indicates that richer diversity enhances detectability. For general graphs, a detection algorithm using a breadth-first search strategy is also proposed and evaluated. Besides rumor source detection, our results can be used in network forensics to combat recurring epidemic-like information spreading such as online anomalies and fraudulent email spam.
{"title":"Rumor source detection with multiple observations: fundamental limits and algorithms","authors":"Zhaoxu Wang, Wenxiang Dong, Wenyi Zhang, C. Tan","doi":"10.1145/2591971.2591993","DOIUrl":"https://doi.org/10.1145/2591971.2591993","url":null,"abstract":"This paper addresses the problem of a single rumor source detection with multiple observations, from a statistical point of view of a spreading over a network, based on the susceptible-infectious model. For tree networks, multiple sequential observations for one single instance of rumor spreading cannot improve over the initial snapshot observation. The situation dramatically improves for multiple independent observations. We propose a unified inference framework based on the union rumor centrality, and provide explicit detection performance for degree-regular tree networks. Surprisingly, even with merely two observations, the detection probability at least doubles that of a single observation, and further approaches one, i.e., reliable detection, with increasing degree. This indicates that a richer diversity enhances detectability. For general graphs, a detection algorithm using a breadth-first search strategy is also proposed and evaluated. 
Besides rumor source detection, our results can be used in network forensics to combat recurring epidemic-like information spreading such as online anomaly and fraudulent email spams.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131911231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
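As background for the abstract above, the sketch below computes the standard single-observation rumor centrality on a tree, which counts the infection orderings that could start from a given node: R(v) = n! divided by the product, over all nodes u, of the size of the subtree rooted at u when the tree is rooted at v. It does not implement the union rumor centrality the abstract names, since that is not defined there. The adjacency-dict representation and the function names are assumptions.

```python
from math import factorial


def rumor_centrality(adj, v):
    """Rumor centrality of node v in a tree given as {node: [neighbours]}."""
    # Iterative DFS from v: record each node's parent and a preorder.
    parent = {v: None}
    order = []
    stack = [v]
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)
    # Subtree sizes: in reversed preorder every node precedes its parent.
    size = {u: 1 for u in order}
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]
    product = 1
    for u in order:
        product *= size[u]
    # The quotient is an exact integer for trees.
    return factorial(len(adj)) // product


def most_likely_source(adj):
    """Node with the highest rumor centrality (ties broken arbitrarily)."""
    return max(adj, key=lambda node: rumor_centrality(adj, node))


# Hypothetical infected subtree: a path a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
```

On the path, the middle node `b` admits two infection orderings (`b, a, c` and `b, c, a`) while each endpoint admits one, so `b` is the estimate.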
Jiwei Huang, Sen Yang, Ashwin Lall, J. Romberg, Jun Xu, Chuang Lin
Error estimating codes (EEC) have recently been proposed for measuring the bit error rate (BER) in packets transmitted over wireless links. However, they can provide such measurements only when there are no insertion and deletion errors, which can occur in various wireless network environments. In this work, we propose "idEEC", the first technique that can do so even in the presence of insertion and deletion errors. We show that idEEC is provably robust under most bit insertion and deletion scenarios, provided insertion/deletion errors occur with much lower probability than bit flipping errors. Our idEEC design can build upon any existing EEC scheme. The basic idea of the idEEC encoding is to divide the packet into a number of segments, each of which is encoded using the underlying EEC scheme. The basic idea of the idEEC decoding is to divide the packet into a few slices in a randomized manner -- each of which may contain several segments -- and then try to identify a slice that has no insertion and deletion errors in it (called a "clean slice"). Once such a clean slice is found, it is removed from the packet for later processing, and this "randomized divide and search" procedure is performed iteratively on the rest of the packet until no more clean slices can be found. The BER is then estimated from all the clean slices discovered across the iterations. A careful analysis of the accuracy guarantees of the idEEC decoding is provided, and the efficacy of idEEC is further validated by simulation experiments.
{"title":"Error estimating codes for insertion and deletion channels","authors":"Jiwei Huang, Sen Yang, Ashwin Lall, J. Romberg, Jun Xu, Chuang Lin","doi":"10.1145/2591971.2591976","DOIUrl":"https://doi.org/10.1145/2591971.2591976","url":null,"abstract":"Error estimating codes (EEC) have recently been proposed for measuring the bit error rate (BER) in packets transmitted over wireless links. They however can provide such measurements only when there are no insertion and deletion errors, which could occur in various wireless network environments. In this work, we propose ``idEEC'', the first technique that can do so even in the presence of insertion and deletion errors. We show that idEEC is provable robust under most bit insertion and deletion scenarios, provided insertion/deletion errors occur with much lower probability than bit flipping errors. Our idEEC design can build upon any existing EEC scheme. The basic idea of the idEEC encoding is to divide the packet into a number of segments, each of which is encoded using the underlying EEC scheme. The basic idea of the idEEC decoding is to divide the packet into a few slices in a randomized manner -- each of which may contain several segments -- and then try to identify a slice that has no insertion and deletion errors in it (called a ``clean slice''). Once such a clean slice is found, it is removed from the packet for later processing, and this ``randomized divide and search'' procedure will be iteratively performed on the rest of the packet until no more clean slices can be found. The BER will then be estimated from all the clean slices discovered through all the iterations. 
A careful analysis of the accuracy guarantees of the idEEC decoding is provided, and the efficacy of idEEC is further validated by simulation experiments.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129945101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kaibo Wang, Xiaoning Ding, Rubao Lee, S. Kato, Xiaodong Zhang
GPGPUs are evolving from dedicated accelerators towards mainstream commodity computing resources. During the transition, the lack of system management of device memory space on GPGPUs has become a major hurdle. In existing GPGPU systems, device memory space is still managed explicitly by individual applications, which not only increases the burden on programmers but can also cause application crashes, hangs, or low performance. In this paper, we present the design and implementation of GDM, a fully functional GPGPU device memory manager that addresses the above problems and unleashes the computing power of GPGPUs in general-purpose environments. To effectively coordinate the device memory usage of different applications, GDM takes control of device memory allocations and data transfers to and from device memory, leveraging a buffer allocated in each application's virtual memory. GDM utilizes the unique features of GPGPU systems and relies on several effective optimization techniques to guarantee the efficient usage of device memory space and to achieve high performance. We have evaluated GDM and compared it against state-of-the-art GPGPU system software on a range of workloads. The results show that GDM can prevent application crashes, including those induced by device memory leaks, and improve system performance by up to 43%.
{"title":"GDM: device memory management for gpgpu computing","authors":"Kaibo Wang, Xiaoning Ding, Rubao Lee, S. Kato, Xiaodong Zhang","doi":"10.1145/2591971.2592002","DOIUrl":"https://doi.org/10.1145/2591971.2592002","url":null,"abstract":"GPGPUs are evolving from dedicated accelerators towards mainstream commodity computing resources. During the transition, the lack of system management of device memory space on GPGPUs has become a major hurdle. In existing GPGPU systems, device memory space is still managed explicitly by individual applications, which not only increases the burden of programmers but can also cause application crashes, hangs, or low performance. In this paper, we present the design and implementation of GDM, a fully functional GPGPU device memory manager to address the above problems and unleash the computing power of GPGPUs in general-purpose environments. To effectively coordinate the device memory usage of different applications, GDM takes control over device memory allocations and data transfers to and from device memory, leveraging a buffer allocated in each application's virtual memory. GDM utilizes the unique features of GPGPU systems and relies on several effective optimization techniques to guarantee the efficient usage of device memory space and to achieve high performance. We have evaluated GDM and compared it against state-of-the-art GPGPU system software on a range of workloads. The results show that GDM can prevent applications from crashes, including those induced by device memory leaks, and improve system performance by up to 43%.","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129252868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}