Enno Ruijters, Dennis Guck, M. Noort, M. Stoelinga
Maintenance is an important way to increase system dependability: timely inspections, repairs and renewals can significantly increase a system's reliability, availability and lifetime. At the same time, maintenance incurs costs and planned downtime. Thus, good maintenance planning has to balance these factors. In this paper, we study the effect of different maintenance strategies on the electrically insulated railway joint (EI-joint), a critical asset in railroad tracks for train detection, and a relatively frequent cause of train disruptions. Together with experts in maintenance engineering, we have modeled the EI-joint as a fault maintenance tree (FMT), i.e., a fault tree augmented with maintenance aspects. We show how complex maintenance concepts, such as condition-based maintenance with periodic inspections, are naturally modeled by FMTs, and how several key performance indicators, such as the system reliability, number of failures, and costs, can be analysed. The faithfulness of quantitative analyses depends heavily on the accuracy of the parameter values in the models. Here, we have been in the unique situation that extensive data could be collected, both from incident registration databases and from interviews with domain experts from several companies. This allowed us to construct a model that faithfully predicts the expected number of failures at system level. Our analysis shows that the current maintenance policy is close to cost-optimal. It is possible to increase joint reliability, e.g. by performing more inspections, but the additional maintenance costs outweigh the reduced cost of failures.
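The trade-off the analysis explores (more frequent inspections catch degradation earlier but cost more) can be illustrated with a small Monte Carlo sketch. All rates and costs below are hypothetical placeholders, not the paper's calibrated FMT parameters:

```python
import random

def simulate(interval, horizon=25.0, runs=2000, seed=1):
    """Toy cost model: a component silently degrades (exponential onset),
    then fails some exponential time later unless a periodic inspection
    catches it first and triggers a preventive repair."""
    INSPECT, REPAIR, FAILURE = 0.2, 5.0, 50.0   # illustrative unit costs
    rng = random.Random(seed)
    cost = fails = 0.0
    for _ in range(runs):
        t = 0.0
        cost += (horizon / interval) * INSPECT  # inspections over the horizon
        while t < horizon:
            onset = t + rng.expovariate(1 / 4.0)     # degradation begins
            fail = onset + rng.expovariate(1 / 4.0)  # unrepaired -> failure
            insp = (int(onset // interval) + 1) * interval  # next inspection
            if min(insp, fail) >= horizon:
                break
            if insp < fail:           # caught in time: preventive repair
                cost += REPAIR
                t = insp
            else:                     # failure before the inspection
                cost += FAILURE
                fails += 1
                t = fail
    return cost / runs, fails / runs

# Sweeping the interval exposes a cost-optimal inspection frequency:
results = {dt: simulate(dt) for dt in (0.5, 1.0, 2.0, 4.0, 8.0)}
```

Inspecting very often drives failures toward zero but inflates inspection cost; inspecting rarely does the opposite, which is exactly the balance the FMT analysis quantifies on real data.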
"Reliability-Centered Maintenance of the Electrically Insulated Railway Joint via Fault Tree Analysis: A Practical Experience Report." Enno Ruijters, Dennis Guck, M. Noort, M. Stoelinga. In: 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 2016. doi:10.1109/DSN.2016.67
Parity-based RAID poses a design trade-off for large-scale SSD storage systems: it improves reliability against SSD failures through redundancy, yet its parity updates incur extra I/Os and garbage collection operations, thereby degrading the endurance and performance of SSDs. We propose EPLOG, a storage layer that reduces parity traffic to SSDs, so as to provide endurance, reliability, and performance guarantees for SSD RAID arrays. EPLOG mitigates parity update overhead via elastic parity logging, which redirects parity traffic to separate log devices (to improve endurance and reliability) and eliminates the need to pre-read data in parity computations (to improve performance). We design EPLOG as a user-level implementation that is fully compatible with commodity hardware and general erasure coding schemes. We evaluate EPLOG through reliability analysis and trace-driven testbed experiments. Compared to the Linux software RAID implementation, our experimental results show that our EPLOG prototype reduces the total write traffic to SSDs, reduces the number of garbage collection operations, and increases the I/O throughput. In addition, EPLOG significantly improves the I/O performance over the original parity logging design, and incurs low metadata overhead.
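The core I/O saving can be sketched by contrasting the conventional read-modify-write parity update with a logged parity delta. This is a simplified illustration, not EPLOG's design: real EPLOG computes parity over new data without pre-reads, whereas the in-memory copy below merely stands in for the log contents.

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

class RMWArray:
    """Conventional RAID-5-style update: each data write pre-reads the old
    data and rewrites the parity block in place on the SSDs."""
    def __init__(self, n, blk=4):
        self.data = [bytes(blk) for _ in range(n)]
        self.parity = bytes(blk)
        self.ssd_writes = self.ssd_reads = 0

    def write(self, i, new):
        old = self.data[i]; self.ssd_reads += 1         # pre-read old data
        self.parity = xor(self.parity, xor(old, new))   # in-place parity update
        self.data[i] = new
        self.ssd_writes += 2                            # data + parity to SSD

class ElasticLogArray(RMWArray):
    """Sketch of the parity-logging idea: parity deltas go to a separate
    log device, so the SSDs see only the data write; parity is merged
    later in one batch."""
    def __init__(self, n, blk=4):
        super().__init__(n, blk)
        self.log = []                     # separate log device (e.g. HDD)

    def write(self, i, new):
        self.log.append(xor(self.data[i], new))  # delta, no SSD parity I/O
        self.data[i] = new
        self.ssd_writes += 1                     # data only

    def merge(self):
        for delta in self.log:                   # fold deltas into parity
            self.parity = xor(self.parity, delta)
        self.log.clear()
        self.ssd_writes += 1                     # one batched parity write
```

After a merge both layouts hold identical parity, but the logged variant issued far fewer SSD writes, which is the endurance win.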
"Elastic Parity Logging for SSD RAID Arrays." Yongkun Li, H. Chan, P. Lee, Yinlong Xu. In: DSN 2016, June 2016. doi:10.1109/DSN.2016.14
Georgios Mappouras, Alireza Vahid, A. Calderbank, Daniel J. Sorin
Motivated by embedded systems and datacenters that require long-life components, we extend the lifetime of Flash memory using rewriting codes that allow for multiple writes to a page before it needs to be erased. Although researchers have previously explored rewriting codes for this purpose, we make two significant contributions beyond prior work. First, we remove the assumption of idealized -- and unrealistically optimistic -- Flash cells used in prior work on endurance codes. Unfortunately, current Flash technology has a non-ideal interface, due to its underlying physical design, and does not, for example, allow all seemingly possible increases in a cell's level. We show how to provide the ideal multi-level cell interface, by developing a virtual Flash cell, and we evaluate its impact on existing endurance codes. Our second contribution is our development of novel endurance codes, called Methuselah Flash Codes (MFC), that provide better cost/lifetime trade-offs than previously studied codes.
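The flavor of rewriting code involved can be illustrated with the classic two-write WOM (write-once memory) code of Rivest and Shamir, which stores two data bits twice in three one-way cells (bits may only be raised 0 to 1 between erases), doubling the writes per erase cycle. MFC itself is more elaborate; this is just the textbook building block:

```python
# Codebooks of the Rivest-Shamir two-write WOM code.
FIRST  = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1)}
SECOND = {0: (1, 1, 1), 1: (0, 1, 1), 2: (1, 0, 1), 3: (1, 1, 0)}

def decode(cells):
    # The generation is recoverable from the cell weight (<=1: first write).
    table = FIRST if sum(cells) <= 1 else SECOND
    return next(v for v, c in table.items() if c == cells)

def write(cells, value):
    """Raise cells to encode `value`; returns the new cell state."""
    if sum(cells) <= 1:                      # first generation
        if decode(cells) == value:
            return cells                     # already encodes it
        target = SECOND[value] if any(cells) else FIRST[value]
    else:                                    # already second generation
        target = SECOND[value]
    if any(o > n for o, n in zip(cells, target)):
        raise ValueError("cells must be erased first")
    return target
```

Each second-generation codeword covers every first-generation codeword of a different value, so the update never needs to lower a cell; a third distinct value forces an erase, which is the point where a rewriting code's budget is spent.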
"Methuselah Flash: Rewriting Codes for Extra Long Storage Lifetime." Georgios Mappouras, Alireza Vahid, A. Calderbank, Daniel J. Sorin. In: DSN 2016, June 2016. doi:10.1109/DSN.2016.25
H. Alemzadeh, Daniel Chen, Xiao Li, T. Kesavadas, Z. Kalbarczyk, R. Iyer
This paper demonstrates targeted cyber-physical attacks on teleoperated surgical robots. These attacks exploit vulnerabilities in the robot's control system to infer critical times during surgery and inject malicious control commands into the robot at those moments. We show that these attacks can evade the safety checks of the robot, lead to catastrophic consequences in the physical system (e.g., sudden jumps of robotic arms or the system's transition to an unwanted halt state), and cause patient injury, robot damage, or system unavailability in the middle of a surgery. We present a model-based analysis framework that can estimate the consequences of control commands through real-time computation of the robot's dynamics. Our experiments on the RAVEN II robot demonstrate that this framework can detect and mitigate the malicious commands before they manifest in the physical system with an average accuracy of 90%.
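A toy plausibility check in the spirit of dynamic-model-based vetting might look as follows. The actual framework computes the RAVEN II's full dynamics; the limits, names, and single-joint model here are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class JointState:
    q: float    # joint position (rad)
    dq: float   # joint velocity (rad/s)

def is_safe_command(state, q_cmd, dt=0.01, v_max=1.0, a_max=20.0):
    """Flag commands whose implied motion exceeds what the arm's dynamics
    could legitimately produce within one control period (toy limits).
    An injected 'sudden jump' demands an impossible velocity."""
    v_need = (q_cmd - state.q) / dt        # velocity the command demands
    a_need = (v_need - state.dq) / dt      # acceleration it demands
    return abs(v_need) <= v_max and abs(a_need) <= a_max

state = JointState(q=0.10, dq=0.10)
ok_step = is_safe_command(state, 0.101)    # small incremental motion
jump = is_safe_command(state, 0.30)        # injected jump command
```

A command failing the check would be dropped or clamped before reaching the actuators, mirroring the paper's mitigate-before-manifestation goal.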
"Targeted Attacks on Teleoperated Surgical Robots: Dynamic Model-Based Detection and Mitigation." H. Alemzadeh, Daniel Chen, Xiao Li, T. Kesavadas, Z. Kalbarczyk, R. Iyer. In: DSN 2016, June 2016. doi:10.1109/DSN.2016.43
System-level detection and mitigation of DRAM failures offer a variety of system enhancements, such as better reliability, scalability, energy, and performance. Unfortunately, system-level detection is challenging for DRAM failures that depend on the data content of neighboring cells (data-dependent failures). DRAM vendors internally scramble/remap the system-level address space. Therefore, testing data-dependent failures using neighboring system-level addresses does not actually test the cells that are physically adjacent. In this work, we argue that one promising way to uncover data-dependent failures in the system is to determine the location of physically neighboring cells in the system address space. Unfortunately, if done naively, such a test takes 49 days to detect neighboring addresses even in a single memory row, making it infeasible in real systems. We develop PARBOR, an efficient system-level technique that determines the locations of the physically neighboring DRAM cells in the system address space and uses this information to detect data-dependent failures. To our knowledge, this is the first work that solves the challenge of detecting data-dependent failures in DRAM in the presence of DRAM-internal scrambling of system-level addresses. We experimentally demonstrate the effectiveness of PARBOR using 144 real DRAM chips from three major vendors. Our experimental evaluation shows that PARBOR 1) detects neighboring cell locations with only 66-90 tests, a 745,654X reduction compared to the naive test, and 2) uncovers 21.9% more failures compared to a random-pattern test that is unaware of the neighbor cell locations. We introduce a new mechanism that utilizes PARBOR to reduce refresh rate based on the data content of memory locations, thereby improving system performance and efficiency. 
We hope that our fast and efficient system-level detection technique enables other new ideas and mechanisms that improve the reliability, performance, and energy efficiency of DRAM-based memory systems.
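Why logical-address adjacency fails under vendor scrambling can be shown with a toy simulation (this is not PARBOR's actual detection algorithm; the scrambling map, weak-cell model, and row size are all invented):

```python
ROW = 16
phys_of = [(l * 5) % ROW for l in range(ROW)]   # stand-in scrambling map
logical_of = {p: l for l, p in enumerate(phys_of)}
WEAK = 5                                        # physical index of a weak cell

def read_row(logical_bits):
    """Simulated read-back: the weak cell flips to 1 when both of its
    *physical* neighbors store 1 (a toy data-dependent failure model)."""
    phys = [0] * ROW
    for l, b in enumerate(logical_bits):
        phys[phys_of[l]] = b
    if phys[WEAK - 1] == 1 and phys[WEAK + 1] == 1:
        phys[WEAK] = 1
    return [phys[phys_of[l]] for l in range(ROW)]

def run_test(neighbor_map):
    """Write 0 to each victim, 1 to its believed neighbors, read back."""
    found = []
    for victim in range(ROW):
        bits = [0] * ROW
        for n in neighbor_map[victim]:
            bits[n] = 1
        if read_row(bits)[victim] != 0:
            found.append(victim)
    return found

# Naive test: assume logically adjacent addresses are physical neighbors.
naive = {v: [x for x in (v - 1, v + 1) if 0 <= x < ROW] for v in range(ROW)}
# Neighbor-aware test: here we simply invert the scrambling map, standing
# in for the neighbor locations that PARBOR detects experimentally.
aware = {logical_of[p]: [logical_of[x] for x in (p - 1, p + 1) if 0 <= x < ROW]
         for p in range(ROW)}
```

The naive test never places 1s next to the weak cell in the physical layout, so the failure stays hidden; the neighbor-aware test exposes it with a single pattern per victim.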
"PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM." S. Khan, Donghyuk Lee, O. Mutlu. In: DSN 2016, June 2016. doi:10.1109/DSN.2016.30
Rekeying refers to the operation of replacing an existing encryption key with a new one. It renews security protection, so as to protect against key compromise and enable dynamic access control in cryptographic storage. However, it is non-trivial to realize efficient rekeying in encrypted deduplication storage systems, which use deterministic content-derived encryption keys to allow deduplication on ciphertexts. We design and implement REED, a rekeying-aware encrypted deduplication storage system. REED builds on a deterministic version of all-or-nothing transform (AONT), such that it enables secure and lightweight rekeying, while preserving the deduplication capability. We propose two REED encryption schemes that trade off performance against security, and extend REED for dynamic access control. We implement a REED prototype with various performance optimization techniques. Our trace-driven testbed evaluation shows that our REED prototype maintains high performance and storage efficiency.
"Rekeying for Encrypted Deduplication Storage." Jingwei Li, Chuan Qin, P. Lee, Jin Li. In: DSN 2016, June 2016. doi:10.1109/DSN.2016.62
Xin Hu, Jiyong Jang, M. Stoecklin, Ting Wang, D. Schales, Dhilung Kirat, J. Rao
Sophisticated cyber security threats, such as advanced persistent threats, rely on infecting endpoints within a targeted security domain and embedding malware. Typically, such malware periodically reaches out to the command-and-control infrastructures controlled by adversaries. Such callback behavior, called beaconing, is challenging to detect because (a) detection requires long-term temporal analysis of communication patterns at several levels of granularity, (b) malware authors employ various strategies to hide beaconing behavior, and (c) it is also employed by legitimate applications (such as update checks). In this paper, we develop a comprehensive methodology to identify stealthy beaconing behavior from network traffic observations. We use an 8-step filtering approach to iteratively refine and eliminate legitimate beaconing traffic and pinpoint malicious beaconing cases for in-depth investigation and takedown. We provide a systematic evaluation of our core beaconing detection algorithm and conduct a large-scale evaluation of web proxy data (more than 30 billion events) collected over a 5-month period at a corporate network comprising over 130,000 end-user devices. Our findings indicate that our approach reliably exposes malicious beaconing behavior, which may be overlooked by traditional security mechanisms.
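One ingredient of such a filtering pipeline, a periodicity test on contact timestamps, can be sketched as follows. This is a toy check, not BAYWATCH's actual algorithm, and the threshold is arbitrary:

```python
import statistics

def looks_like_beacon(timestamps, max_cv=0.2, min_gaps=3):
    """Toy periodicity test: near-constant inter-contact gaps give a low
    coefficient of variation, even in the presence of jitter, whereas
    human-driven traffic produces bursty, high-variance gaps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < min_gaps:
        return False
    cv = statistics.stdev(gaps) / statistics.mean(gaps)
    return cv <= max_cv

# A host calling back every ~300 s with jitter vs. ordinary bursty browsing:
beacon = [i * 300 + j for i, j in enumerate((0, 4, -3, 6, -1, 2, 5))]
browse = [0, 12, 15, 290, 305, 1400, 1460]
```

A real pipeline layers many such filters (whitelisting, granularity sweeps, jitter-tolerant period estimation) precisely because legitimate software also beacons; a single statistic like this only narrows the candidate set.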
"BAYWATCH: Robust Beaconing Detection to Identify Infected Hosts in Large-Scale Enterprise Networks." Xin Hu, Jiyong Jang, M. Stoecklin, Ting Wang, D. Schales, Dhilung Kirat, J. Rao. In: DSN 2016, June 2016. doi:10.1109/DSN.2016.50
As processor feature sizes shrink, mitigating faults in low-level system services has become a critical aspect of dependable system design. In this paper we introduce SuperGlue, an interface description language (IDL) and compiler for recovery from transient faults in a component-based operating system. SuperGlue generates code for interface-driven recovery that uses commodity hardware isolation, micro-rebooting, and interface-directed fault recovery to provide predictable and efficient recovery from faults that impact low-level system services. SuperGlue decreases the amount of recovery code system designers need to implement by an order of magnitude, and replaces it with declarative specifications. We evaluate SuperGlue with a fault injection campaign in low-level system components (e.g., memory mapping manager and scheduler). Additionally, we evaluate the performance of SuperGlue in a web-server application. Results show that SuperGlue improves system reliability with only a small performance degradation of 11.84%.
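The shape of interface-driven recovery can be sketched with a hypothetical stub: the stub records state-establishing interface calls and, on a fault, micro-reboots the component and replays them. SuperGlue generates such logic from IDL specifications for C components; everything below (class names, the `call` API, the fault model) is an invented stand-in:

```python
class Scheduler:
    """Stand-in low-level component that can suffer a transient fault."""
    def __init__(self):
        self.threads = set()

    def add_thread(self, tid):
        self.threads.add(tid)

class RecoveryStub:
    """Hypothetical generated proxy: tracks interface-level state so the
    component can be micro-rebooted and its state re-established."""
    def __init__(self, factory):
        self.factory = factory
        self.component = factory()
        self.replay_log = []          # interface calls that establish state

    def call(self, method, *args):
        try:
            result = getattr(self.component, method)(*args)
        except Exception:                        # transient fault detected
            self.component = self.factory()      # micro-reboot
            for m, a in self.replay_log:         # interface-directed recovery
                getattr(self.component, m)(*a)
            result = getattr(self.component, method)(*args)
        self.replay_log.append((method, args))
        return result
```

The point of generating this from an IDL is that the designer declares which calls are state-establishing instead of hand-writing per-component recovery code.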
"SuperGlue: IDL-Based, System-Level Fault Tolerance for Embedded Systems." Jiguo Song, Gedare Bloom, Gabriel Parmer. In: DSN 2016, June 2016. doi:10.1109/DSN.2016.29
Once pervasive, the File Transfer Protocol (FTP) has been largely supplanted by HTTP, SCP, and BitTorrent for transferring data between hosts. Yet, in a comprehensive analysis of the FTP ecosystem as of 2015, we find that there are still more than 13 million FTP servers in the IPv4 address space, 1.1 million of which allow "anonymous" (public) access. These anonymous FTP servers leak sensitive information, such as tax documents and cryptographic secrets. More than 20,000 FTP servers allow public write access, which has facilitated malicious actors' use of free storage as well as malware deployment and click-fraud attacks. We further investigate real-world attacks by deploying eight FTP honeypots, shedding light on how attackers are abusing and exploiting vulnerable servers. We conclude with lessons and recommendations for securing FTP.
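A single-host version of the anonymous-access probe behind such a survey can be written with Python's standard `ftplib`, whose `login()` defaults to the user `anonymous`. This is a simplified sketch (the study's scanning pipeline is far more involved), and the hostnames below are placeholders:

```python
from ftplib import FTP, error_perm

def allows_anonymous(host, timeout=5, ftp_factory=FTP):
    """Return True if `host` accepts an anonymous FTP login.
    `ftp_factory` is injectable so the check can be exercised without a
    live network (and so a scanner can swap in a rate-limited client)."""
    try:
        ftp = ftp_factory(host, timeout=timeout)
        ftp.login()            # ftplib defaults to user 'anonymous'
        ftp.quit()
        return True
    except (error_perm, OSError):
        return False
```

An internet-wide measurement would drive this check from a port-scan hit list and record banners and listings rather than just a boolean; always scan only hosts you are authorized to probe.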
"FTP: The Forgotten Cloud." Drew Springall, Z. Durumeric, J. A. Halderman. In: DSN 2016, June 2016. doi:10.1109/DSN.2016.52
Performance ticket handling is an expensive operation in highly virtualized cloud data centers where physical boxes host multiple virtual machines (VMs). A large body of tickets arises from resource usage warnings, e.g., CPU and RAM usage that exceeds predefined thresholds. The transient nature of CPU and RAM usage, as well as their strong correlation across time among co-located VMs, drastically increases the complexity of ticket management. Based on a large resource usage dataset collected from production data centers, amounting to 6K physical machines and more than 80K VMs, we first discover patterns of spatial dependency among co-located virtual resources. Leveraging our key findings, we develop an Active Ticket Managing (ATM) system that consists of (i) a novel time series prediction methodology and (ii) a proactive VM resizing policy for CPU and RAM resources for co-located VMs on a physical box, which aims to drastically reduce usage tickets. ATM exploits the spatial dependency across multiple resources of co-located VMs for usage prediction and proactive VM resizing. Evaluation results on traces of 6K physical boxes and a prototype of a MediaWiki system show that ATM is able to achieve excellent prediction accuracy for a large number of VM time series and significant usage ticket reduction, i.e., up to 60%, at low computational overhead.
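The predict-then-resize loop can be sketched with a deliberately crude forecaster. ATM's time-series model and resizing policy are more sophisticated and exploit cross-VM correlation; all numbers and thresholds below are illustrative:

```python
def predict_next(history, window=3):
    """Toy moving-average forecaster standing in for ATM's predictor."""
    return sum(history[-window:]) / min(window, len(history))

def plan_tickets(usage_series, capacity=100.0, threshold=0.9, headroom=1.2):
    """Compare reactive ticketing against proactive resizing: a ticket
    fires when usage exceeds threshold*cap; the proactive policy grows
    the VM's cap ahead of a predicted breach."""
    reactive = proactive = 0
    cap = capacity
    for t in range(3, len(usage_series)):
        use = usage_series[t]
        if use > threshold * capacity:        # static cap: ticket fires
            reactive += 1
        forecast = predict_next(usage_series[:t])
        if forecast > threshold * cap:        # resize before the breach
            cap = forecast * headroom
        if use > threshold * cap:             # ticket despite resizing
            proactive += 1
    return reactive, proactive

# A rising-then-falling CPU trace (percent of the original allocation):
load = [40, 45, 50, 70, 85, 95, 98, 97, 96, 60, 50]
reactive, proactive = plan_tickets(load)
```

Even this lagging moving average suppresses part of the ticket burst once the ramp-up becomes predictable, which is the effect ATM's more accurate predictor amplifies.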
"Managing Data Center Tickets: Prediction and Active Sizing." Ji Xue, R. Birke, L. Chen, E. Smirni. In: DSN 2016, June 2016. doi:10.1109/DSN.2016.38