Understanding SMS spam in a large cellular network
Nan Jiang, Yu Jin, Ann Skudlark, Zhi-Li Zhang
DOI: 10.1145/2465529.2465530
In this paper, we conduct a comprehensive study of SMS spam in a large cellular network in the US. Using one year of spam messages reported by users to the network carrier, we devise text clustering techniques to group related spam messages in order to identify SMS spam campaigns and spam activities. Our analysis shows that spam campaigns can last for months and have a wide impact on the cellular network. Combining the reports with SMS network records collected during the same period, we find that spam numbers within the same activity often exhibit strong similarity in their sending patterns, tenure, and geolocations. Our analysis sheds light on the intentions and strategies of SMS spammers and provides unique insights into developing better methods for detecting SMS spam.

Data center demand response: avoiding the coincident peak via workload shifting and local generation
Zhenhua Liu, A. Wierman, Yuan Chen, Benjamin Razon, Niangjun Chen
DOI: 10.1145/2465529.2465740
Demand response is a crucial aspect of the future smart grid. It has the potential to provide significant peak demand reduction and to ease the incorporation of renewable energy into the grid. Data centers' participation in demand response is becoming increasingly important given their high and growing energy consumption and their flexibility in demand management compared to conventional industrial facilities. In this extended abstract we briefly describe recent work in our full paper on two demand response schemes for reducing a data center's peak loads and energy expenditure: workload shifting and the use of local power generation. In the full paper, we conduct a detailed characterization study of coincident peak data over two decades from Fort Collins Utilities, Colorado, and then develop two algorithms for data centers that combine workload scheduling and local power generation to avoid the coincident peak and reduce energy expenditure. The first algorithm optimizes the expected cost and the second provides a good worst-case guarantee for any coincident peak pattern. We evaluate these algorithms via numerical simulations based on real-world traces from production systems. The results show that using workload shifting in combination with local generation can provide significant cost savings (up to 40% in the Fort Collins Utilities case) compared to either alone.

Greedy name lookup for named data networking
Yi Wang, Dongzhe Tai, Ting Zhang, Jianyuan Lu, Boyang Xu, Huichen Dai, B. Liu
DOI: 10.1145/2465529.2465741
Unlike IP routers, Named Data Networking routers forward packets by content names, which consist of characters and have variable, unbounded length. This complex name structure, combined with the huge name routing table, makes wire-speed name lookup an extremely challenging task. We propose a greedy name lookup mechanism that speeds up lookup by dynamically adjusting the search path as the prefix table changes. In addition, we design a string-oriented perfect hash table that stores the signature of each key in its entry instead of the key itself, reducing memory consumption. Extensive experiments on a commodity PC server with 3 million name prefix entries demonstrate that the greedy name lookup mechanism achieves 57.14 million searches per second using only 72.95 MB of memory.

Discriminant malware distance learning on structural information for automated malware classification
Deguang Kong, Guanhua Yan
DOI: 10.1145/2465529.2465531
In this work, we explore techniques that can automatically classify malware variants into their corresponding families. Our framework extracts structural information from malware programs as attributed function call graphs, learns discriminant malware distance metrics, and finally adopts an ensemble of classifiers for automated malware classification. Experimental results show that our method achieves high classification accuracy.

Tutorial on geo-replication in data center applications
M. Aguilera
DOI: 10.1145/2494232.2465768
Data center applications increasingly require a *geo-replicated* storage system, that is, a storage system replicated across many geographic locations. Geo-replication can reduce access latency, improve availability, and provide disaster tolerance. It turns out there are many techniques for geo-replication with different trade-offs. In this tutorial, we give an overview of these techniques, organized according to two orthogonal dimensions: level of synchrony (synchronous and asynchronous) and type of storage service (read-write, state machine, transaction). We explain the basic idea of these techniques, together with their applicability and trade-offs.

Challenges in cloud scale data centers
D. Maltz
DOI: 10.1145/2465529.2465767
Data centers are fascinating places, where the massive scale required to deliver on-line services like web search and cloud hosting turns minor issues into major challenges that must be addressed in the design of the physical infrastructure and the software platform. In this talk, I'll briefly overview the kinds of applications that run in mega-data centers and the workloads they place on the infrastructure. I'll then describe a number of challenges seen in Microsoft's data centers, with the goal of posing questions more than describing solutions, and of explaining how economic factors, technology issues, and software design interact when creating low-latency, low-cost, high-availability services.

High-throughput low-latency fine-grained disk logging
D. Simha, T. Chiueh, G. Rajagopalan, P. Bose
DOI: 10.1145/2465529.2465552
Synchronously logging updates to persistent storage first, and then asynchronously committing these updates to their rightful storage locations, is a well-known and heavily used technique for improving the sustained throughput of write-intensive, disk-based data processing systems; the latency and throughput of such systems are therefore largely determined by the latency and throughput of the underlying logging mechanism. The conventional wisdom is that logging operations are relatively straightforward to optimize because the associated disk access pattern is largely sequential. However, it turns out that achieving both high throughput and low latency for fine-grained logging operations, whose payload size is smaller than a disk sector, is extremely challenging. This paper describes the experiences and lessons we have gained from building a disk logging system that delivers over 1.2 million 256-byte logging operations per second, with an average logging latency below 1 msec.

{"title":"How does energy accounting matter for energy management?","authors":"Mian Dong, Tian Lan, Lin Zhong","doi":"10.1145/2465529.2465742","DOIUrl":"https://doi.org/10.1145/2465529.2465742","url":null,"abstract":"","PeriodicalId":306456,"journal":{"name":"Measurement and Modeling of Computer Systems","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134273938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Designing large-scale nudge engines
B. Prabhakar
DOI: 10.1145/2465529.2465766
In many of the challenges faced by the modern world, from overcrowded transportation systems to overstretched healthcare systems, large benefits for society come about from small changes by very many individuals. We survey the problems and the cost they impose on society, and describe a framework for designing "nudge engines": algorithms, incentives and technology for influencing human behavior. We present a model for analyzing their effectiveness and results from transportation pilots conducted in Bangalore, at Stanford and in Singapore, and a wellness program for the employees of Accenture-USA.

Computational analysis of cascading failures in power networks
Dorian Mazauric, Saleh Soltan, G. Zussman
DOI: 10.1145/2465529.2465752
This paper focuses on cascading line failures in the transmission system of the power grid. Such a cascade may have a devastating effect not only on the power grid but also on the interconnected communication networks. Recent large-scale power outages demonstrated the limitations of epidemic- and percolation-based tools in modeling the cascade evolution. Hence, based on a linearized power flow model (which differs substantially from classical packet flow models), we obtain results regarding various properties of a cascade. Specifically, we consider performance metrics such as the distance between failures, the length of the cascade, and the fraction of demand (load) satisfied after the cascade. We show, for example, that due to the unique properties of the model: (i) the distance between subsequent failures can be arbitrarily large and the cascade may be arbitrarily long, (ii) a large set of initial line failures may have a smaller effect than a failure of one of the lines in the set, and (iii) minor changes to the network parameters may have a significant impact. We also show that finding the set of lines whose removal has the most significant impact (under various metrics) is NP-hard. In addition, we develop a fast algorithm to recompute the flows at each step of the cascade. These results can provide insight into the design of smart grid measurement and control algorithms that can mitigate a cascade.
