S. Di, Hanqi Guo, Eric Pershey, M. Snir, F. Cappello
An in-depth understanding of the failure features of HPC jobs in a supercomputer is critical to large-scale system maintenance and to improving service quality for users. In this paper, we investigate the features of hundreds of thousands of jobs in one of the most powerful supercomputers, the IBM Blue Gene/Q Mira, based on 2001 days of observations with a total of over 32.44 billion core-hours. We study the impact of the system's events on the jobs' execution in order to understand the system's reliability from the perspective of jobs and users. The characterization involves a joint analysis based on multiple data sources, including the reliability, availability, and serviceability (RAS) log; the job scheduling log; the log regarding each job's physical execution tasks; and the I/O behavior log. We present 22 takeaways based on our in-depth analysis. For instance, 99,245 job failures are reported in the job-scheduling log, a large majority (99.4%) of which are due to user behavior (such as bugs in code, wrong configuration, or misoperations). The job failures are correlated with multiple metrics and attributes, such as users/projects and job execution structure (number of tasks, scale, and core-hours). The best-fitting distributions of a failed job's execution length (or interruption interval) include Weibull, Pareto, inverse Gaussian, and Erlang/exponential, depending on the types of errors (i.e., exit codes). The RAS events affecting job executions exhibit a high correlation with users and core-hours and have a strong locality feature. For the failed jobs, our similarity-based event-filtering analysis indicates that the mean time to interruption is about 3.5 days.
{"title":"Characterizing and Understanding HPC Job Failures Over The 2K-Day Life of IBM BlueGene/Q System","authors":"S. Di, Hanqi Guo, Eric Pershey, M. Snir, F. Cappello","doi":"10.1109/DSN.2019.00055","DOIUrl":"https://doi.org/10.1109/DSN.2019.00055","url":null,"abstract":"An in-depth understanding of the failure features of HPC jobs in a supercomputer is critical to the large-scale system maintenance and improvement of the service quality for users. In this paper, we investigate the features of hundreds of thousands of jobs in one of the most powerful supercomputers, the IBM Blue Gene/Q Mira, based on 2001 days of observations with a total of over 32.44 billion core-hours. We study the impact of the system's events on the jobs' execution in order to understand the system's reliability from the perspective of jobs and users. The characterization involves a joint analysis based on multiple data sources, including the reliability, availability, and serviceability (RAS) log; job scheduling log; the log regarding each job's physical execution tasks; and the I/O behavior log. We present 22 valuable takeaways based on our in-depth analysis. For instance, 99,245 job failures are reported in the job-scheduling log, a large majority (99.4%) of which are due to user behavior (such as bugs in code, wrong configuration, or misoperations). The job failures are correlated with multiple metrics and attributes, such as users/projects and job execution structure (number of tasks, scale, and core-hours). The best-fitting distributions of a failed job's execution length (or interruption interval) include Weibull, Pareto, inverse Gaussian, and Erlang/exponential, depending on the types of errors (i.e., exit codes). The RAS events affecting job executions exhibit a high correlation with users and core-hours and have a strong locality feature. 
In terms of the failed jobs, our similarity-based event-filtering analysis indicates that the mean time to interruption is about 3.5 days.","PeriodicalId":271955,"journal":{"name":"2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114452048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Memory (DRAM) energy consumption is one of the major scalability bottlenecks for almost all computing systems, including throughput machines such as Graphics Processing Units (GPUs). A large fraction of DRAM dynamic energy is spent on fetching the data bits from a DRAM page (row) into a small hardware structure called the row buffer. Accessing data from the row buffer is much less expensive in terms of energy and latency. Hence, it is preferable to reuse the buffered data as much as possible before activating another row and bringing its data into the row buffer. Our thorough characterization of several GPGPU applications shows that row buffers are poorly utilized, leading to sub-optimal energy consumption. To address this, we propose a novel memory scheduling technique for GPUs that exploits the latency and error tolerance properties of GPGPU applications to reduce row energy by 44% on average.
{"title":"Exploiting Latency and Error Tolerance of GPGPU Applications for an Energy-Efficient DRAM","authors":"Haonan Wang, Adwait Jog","doi":"10.1109/DSN.2019.00046","DOIUrl":"https://doi.org/10.1109/DSN.2019.00046","url":null,"abstract":"Memory (DRAM) energy consumption is one of the major scalability bottlenecks for almost all computing systems, including throughput machines such as Graphics Processing Units (GPUs). A large fraction of DRAM dynamic energy is spent on fetching the data bits from a DRAM page (row) to a small-sized hardware structure called as the row buffer. The data access from this row buffer is much less expensive in terms of energy and latency. Hence, it is preferred to reuse the buffered data as much as possible before activating another row and bringing its data to these row buffers. Our thorough characterization of several GPGPU applications shows that these row buffers are poorly utilized leading to sub-optimal energy consumption. To address this, we propose a novel memory scheduling for GPUs that exploits latency and error tolerance properties of GPGPU applications to reduce row energy by 44% on average.","PeriodicalId":271955,"journal":{"name":"2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"266 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132860901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
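The row-buffer reuse argument in the abstract above can be illustrated with a toy model. The sketch below is not the scheduler proposed in the paper; it only assumes an open-row policy (each bank keeps its most recently activated row in the row buffer) and counts row activations for a request stream. The function name `count_row_activations` and the `(bank, row)` request format are hypothetical choices made for this illustration.

```python
def count_row_activations(accesses):
    """Count DRAM row activations for a stream of (bank, row) requests.

    Toy open-row model (an assumption of this sketch): a request that hits
    the row currently held in its bank's row buffer costs no activation;
    any other request activates the new row and replaces the buffered one.
    """
    open_row = {}        # bank -> row currently held in that bank's row buffer
    activations = 0
    for bank, row in accesses:
        if open_row.get(bank) != row:   # row-buffer miss
            activations += 1
            open_row[bank] = row
    return activations


# Requests that alternate between two rows of the same bank thrash the buffer.
interleaved = [(0, 1), (0, 2), (0, 1), (0, 2)]

# If the application tolerates extra latency, requests can be delayed and
# grouped by row, so each row is activated only once.
grouped = sorted(interleaved)

if __name__ == "__main__":
    print(count_row_activations(interleaved))  # 4 activations
    print(count_row_activations(grouped))      # 2 activations
```

Grouping requests by row halves the activations in this small stream, which is the kind of saving a latency-tolerant scheduler can pursue; the actual energy benefit depends on the device and workload.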
Malware has been one of the biggest cyber threats in the digital world for a long time. Existing machine-learning-based malware classification methods rely on handcrafted features extracted from raw binary files or disassembled code. The diversity of such features has made it hard to build generic malware classification systems that work effectively across different operational environments. To strike a balance between generality and performance, we explore new machine learning techniques to classify malware programs represented as their control flow graphs (CFGs). To overcome the drawbacks of existing malware analysis methods that use inefficient and nonadaptive graph matching techniques, we build a new system that uses a deep graph convolutional neural network to embed the structural information inherent in CFGs for effective yet efficient malware classification. We use two large independent datasets that contain more than 20K malware samples to evaluate our proposed system, and the experimental results show that it can classify CFG-represented malware programs with performance comparable to that of state-of-the-art methods applied to handcrafted malware features.
{"title":"Classifying Malware Represented as Control Flow Graphs using Deep Graph Convolutional Neural Network","authors":"Jiaqi Yan, Guanhua Yan, Dong Jin","doi":"10.1109/DSN.2019.00020","DOIUrl":"https://doi.org/10.1109/DSN.2019.00020","url":null,"abstract":"Malware have been one of the biggest cyber threats in the digital world for a long time. Existing machine learning based malware classification methods rely on handcrafted features extracted from raw binary files or disassembled code. The diversity of such features created has made it hard to build generic malware classification systems that work effectively across different operational environments. To strike a balance between generality and performance, we explore new machine learning techniques to classify malware programs represented as their control flow graphs (CFGs). To overcome the drawbacks of existing malware analysis methods using inefficient and nonadaptive graph matching techniques, in this work, we build a new system that uses deep graph convolutional neural network to embed structural information inherent in CFGs for effective yet efficient malware classification. 
We use two large independent datasets that contain more than 20K malware samples to evaluate our proposed system and the experimental results show that it can classify CFG-represented malware programs with performance comparable to those of the state-of-the-art methods applied on handcrafted malware features.","PeriodicalId":271955,"journal":{"name":"2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132037899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maurice Bailleu, Donald Dragoti, Pramod Bhatotia, C. Fetzer
We introduce TEE-PERF, an architecture- and platform-independent performance measurement tool for trusted execution environments (TEEs). More specifically, TEE-PERF supports method-level profiling for unmodified multithreaded applications, without relying on any architecture-specific hardware features (e.g., Intel VTune Amplifier) and without requiring platform-dependent kernel features (e.g., Linux perf). Moreover, TEE-PERF provides accurate profiling measurements since it traces the entire process execution without employing instruction pointer sampling. Thus, TEE-PERF does not suffer from sampling frequency bias, which can occur with threads scheduled to align to the sampling frequency. We have implemented TEE-PERF with an easy-to-use interface, and integrated it with Flame Graphs to visualize the performance bottlenecks. We have evaluated TEE-PERF based on the Phoenix multithreaded benchmark suite and real-world applications (RocksDB, SPDK, etc.), and compared it with Linux perf. Our experimental evaluation shows that TEE-PERF incurs low profiling overheads, while providing accurate profile measurements to identify and optimize the application bottlenecks in the context of TEEs. TEE-PERF is publicly available.
{"title":"TEE-Perf: A Profiler for Trusted Execution Environments","authors":"Maurice Bailleu, Donald Dragoti, Pramod Bhatotia, C. Fetzer","doi":"10.1109/DSN.2019.00050","DOIUrl":"https://doi.org/10.1109/DSN.2019.00050","url":null,"abstract":"We introduce TEE-PERF, an architecture- and platform-independent performance measurement tool for trusted execution environments (TEEs). More specifically, TEE-PERF supports method-level profiling for unmodified multithreaded applications, without relying on any architecture-specific hardware features (e.g. Intel VTune Amplifier), or without requiring platform-dependent kernel features (e.g. Linux perf). Moreover, TEE-PERF provides accurate profiling measurements since it traces the entire process execution without employing instruction pointer sampling. Thus, TEE-PERF does not suffer from sampling frequency bias, which can occur with threads scheduled to align to the sampling frequency. We have implemented TEE-PERF with an easy to use interface, and integrated it with Flame Graphs to visualize the performance bottlenecks. We have evaluated TEE-PERF based on the Phoenix multithreaded benchmark suite and real-world applications (RocksDB, SPDK, etc.), and compared it with Linux perf. Our experimental evaluation shows that TEE-PERF incurs low profiling overheads, while providing accurate profile measurements to identify and optimize the application bottlenecks in the context of TEEs. 
TEE-PERF is publicly available.","PeriodicalId":271955,"journal":{"name":"2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127548333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
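The distinction the TEE-Perf abstract draws between tracing every call and sampling the instruction pointer can be shown by analogy in a few lines. The sketch below is not TEE-PERF and says nothing about how it instruments TEE code; it uses Python's standard `sys.setprofile` hook to record every function entry and exit, so no call is missed regardless of timing. The names `trace_calls`, `helper`, and `work` are hypothetical.

```python
import sys
import time
from collections import defaultdict


def trace_calls(func, *args):
    """Run func(*args) and return {function name: (calls, inclusive seconds)}.

    Every Python-level call and return is recorded (full tracing), so the
    result does not depend on a sampling frequency.
    """
    calls = defaultdict(int)
    seconds = defaultdict(float)
    stack = []  # (function name, entry timestamp) for calls still in flight

    def hook(frame, event, arg):
        if event == "call":
            stack.append((frame.f_code.co_name, time.perf_counter()))
        elif event == "return" and stack:
            name, start = stack.pop()
            calls[name] += 1
            seconds[name] += time.perf_counter() - start
        # "c_call" / "c_return" events for built-ins are ignored in this sketch.

    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return {name: (calls[name], seconds[name]) for name in calls}


def helper():
    return sum(range(1000))


def work():
    for _ in range(3):
        helper()


if __name__ == "__main__":
    for name, (n, secs) in trace_calls(work).items():
        print(f"{name}: {n} call(s), {secs:.6f} s inclusive")
```

Full tracing trades higher per-call overhead for exact call counts, which is why the abstract's low-overhead result is notable; the per-function totals produced this way are also the kind of input that flame-graph tools aggregate.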
The traditional expectation-maximization (EM) algorithm is a general-purpose algorithm for maximum likelihood estimation in problems with incomplete data. Several variants of the algorithm exist to estimate the parameters of phase-type distributions (PHDs), a widely used class of distributions in performance and dependability modeling. EM algorithms are typically offline algorithms because they improve the likelihood function by iteratively running through a fixed sample. Nowadays, data can be generated online in most systems, so offline algorithms seem outdated in this environment. This paper proposes an online EM algorithm for parameter estimation of PHDs. In contrast to the offline version, the online variant adds data immediately when it becomes available and includes no iteration. Different variants of the algorithm are proposed that exploit the specific structure of subclasses of PHDs such as hyperexponential, hyper-Erlang, or acyclic PHDs. The algorithm furthermore incorporates current methods to detect drifts or change points in a data stream and estimates a new PHD whenever such behavior has been identified. Thus, the resulting distributions can be applied for online model prediction and for the generation of inhomogeneous PHDs as an extension of inhomogeneous Poisson processes. Numerical experiments with artificial and measured data streams show the applicability of the approach.
{"title":"An Online Approach to Estimate Parameters of Phase-Type Distributions","authors":"P. Buchholz, Iryna Dohndorf, J. Kriege","doi":"10.1109/DSN.2019.00024","DOIUrl":"https://doi.org/10.1109/DSN.2019.00024","url":null,"abstract":"The traditional expectation-maximization (EM) algorithm is a general purpose algorithm for maximum likelihood estimation in problems with incomplete data. Several variants of the algorithm exist to estimate the parameters of phase-type distributions (PHDs), a widely used class of distributions in performance and dependability modeling. EM algorithms are typical offline algorithms because they improve the likelihood function by iteratively running through a fixed sample. Nowadays data can be generated online in most systems such that offline algorithms seem to be outdated in this environment. This paper proposes an online EM algorithm for parameter estimation of PHDs. In contrast to the offline version, the online variant adds data immediately when it becomes available and includes no iteration. Different variants of the algorithms are proposed that exploit the specific structure of subclasses of PHDs like hyperexponential, hyper-Erlang or acyclic PHDs. The algorithm furthermore incorporates current methods to detect drifts or change points in a data stream and estimates a new PHD whenever such a behavior has been identified. Thus, the resulting distributions can be applied for online model prediction and for the generation of inhomogeneous PHDs as an extension of inhomogeneous Poisson processes. 
Numerical experiments with artificial and measured data streams show the applicability of the approach.","PeriodicalId":271955,"journal":{"name":"2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129023650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Georgios Mappouras, Alireza Vahid, A. Calderbank, Daniel J. Sorin
Racetrack memory is an exciting emerging memory technology with the potential to offer far greater capacity and performance than other non-volatile memories. Racetrack memory has an unusual error model, though, which precludes the use of the typical error coding techniques used by architects. In this paper, we introduce GreenFlag, a coding scheme that combines a new construction for Varshamov-Tenengolts codes with specially crafted delimiter bits that are placed between codewords. GreenFlag is the first coding scheme that is compatible with 3D racetrack, which has the benefit of very high density but the limitation of a single read/write port per track. Based on our implementation of encoding/decoding hardware, we analyze the trade-offs between latency, code length, and code rate; we then use this analysis to evaluate the viability of racetrack at each level of the memory hierarchy.
{"title":"GreenFlag: Protecting 3D-Racetrack Memory from Shift Errors","authors":"Georgios Mappouras, Alireza Vahid, A. Calderbank, Daniel J. Sorin","doi":"10.1109/DSN.2019.00016","DOIUrl":"https://doi.org/10.1109/DSN.2019.00016","url":null,"abstract":"Racetrack memory is an exciting emerging memory technology with the potential to offer far greater capacity and performance than other non-volatile memories. Racetrack memory has an unusual error model, though, which precludes the use of the typical error coding techniques used by architects. In this paper, we introduce GreenFlag, a coding scheme that combines a new construction for Varshamov-Tenegolts codes with specially crafted delimiter bits that are placed between each codeword. GreenFlag is the first coding scheme that is compatible with 3D racetrack, which has the benefit of very high density but the limitation of a single read/write port per track. Based on our implementation of encoding/decoding hardware, we analyze the trade-offs between latency, code length, and code rate; we then use this analysis to evaluate the viability of racetrack at each level of the memory hierarchy.","PeriodicalId":271955,"journal":{"name":"2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134012275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
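For readers unfamiliar with the code family named in the GreenFlag abstract, the sketch below shows the classic binary Varshamov-Tenengolts code and Levenshtein's single-deletion decoder. It is not GreenFlag's new construction and it omits the delimiter bits; it only illustrates why a checksum of the form sum(i * x_i) mod (n + 1) lets a decoder recover one deleted bit, which is the kind of position (shift) error the abstract refers to. All function names are hypothetical.

```python
from itertools import product


def vt_checksum(bits):
    """Weighted checksum sum(i * x_i) with positions i counted from 1."""
    return sum(i * b for i, b in enumerate(bits, start=1))


def vt_codewords(n, a=0):
    """All length-n binary words whose checksum is congruent to a mod (n + 1)."""
    return [list(w) for w in product((0, 1), repeat=n)
            if vt_checksum(w) % (n + 1) == a]


def vt_correct_deletion(received, n, a=0):
    """Recover the length-n codeword from a word that lost exactly one bit."""
    assert len(received) == n - 1
    weight = sum(received)
    deficiency = (a - vt_checksum(received)) % (n + 1)
    if deficiency <= weight:
        # A 0 was deleted; exactly `deficiency` ones lie to its right.
        for idx in range(len(received) + 1):
            if sum(received[idx:]) == deficiency:
                return received[:idx] + [0] + received[idx:]
    else:
        # A 1 was deleted; exactly `deficiency - weight - 1` zeros lie to its left.
        zeros_left = deficiency - weight - 1
        for idx in range(len(received) + 1):
            if received[:idx].count(0) == zeros_left:
                return received[:idx] + [1] + received[idx:]
    raise ValueError("input is not a codeword with a single deletion")


if __name__ == "__main__":
    codeword = [1, 0, 0, 1]                  # checksum 1 + 4 = 5, and 5 mod 5 == 0
    damaged = codeword[:1] + codeword[2:]    # delete the second bit
    print(vt_correct_deletion(damaged, n=4))  # [1, 0, 0, 1]
```

The decoder needs to know where a codeword starts and ends, which is one reason a practical scheme has to add some form of boundary marking between codewords.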
In the recent past, there has been a rapid increase in attacks on consumer Internet-of-Things (IoT) devices. Several attacks currently focus on easy targets for exploitation, such as weak configurations (e.g., weak default passwords). However, with governments, industries, and organizations proposing new laws and regulations to reduce and prevent such easy targets in the IoT space, attackers will move to more subtle exploits in these devices. Memory corruption vulnerabilities are a significant class of vulnerabilities in software security through which attackers can gain control of the entire system. Numerous memory corruption vulnerabilities have been found in IoT firmware already deployed in the consumer market. This paper presents an approach for exploiting stack-based buffer-overflow vulnerabilities in IoT firmware to hijack the device remotely. To show the feasibility of this approach, we demonstrate exploiting a common network software application, Connman, used widely in IoT firmware such as that of Samsung smart TVs. We report a series of experiments, including crashing and executing arbitrary code in the targeted software application in a controlled environment, adopting the attacks in uncontrolled environments (with standard software defenses such as W⊕X and ASLR enabled), and installing publicly available IoT firmware that uses this software application on a Raspberry Pi. The presented exploits demonstrate the ease with which an adversary can control IoT devices.
{"title":"Exploiting Memory Corruption Vulnerabilities in Connman for IoT Devices","authors":"K. V. English, Islam Obaidat, Meera Sridhar","doi":"10.1109/DSN.2019.00036","DOIUrl":"https://doi.org/10.1109/DSN.2019.00036","url":null,"abstract":"In the recent past, there has been a rapid increase in attacks on consumer Internet-of-Things (IoT) devices. Several attacks currently focus on easy targets for exploitation, such as weak configurations (weak default passwords). However, with governments, industries, and organizations proposing new laws and regulations to reduce and prevent such easy targets in the IoT space, attackers will move to more subtle exploits in these devices. Memory corruption vulnerabilities are a significant class of vulnerabilities in software security through which attackers can gain control of the entire system. Numerous memory corruption vulnerabilities have been found in IoT firmware already deployed in the consumer market. This paper presents an approach for exploiting stack-based buffer-overflow attacks in IoT firmware, to hijack the device remotely. To show the feasibility of this approach, we demonstrate exploiting a common network software application, Connman, used widely in IoT firmware such as Samsung smart TVs. A series of experiments are reported on, including: crashing and executing arbitrary code in the targeted software application in a controlled environment, adopting the attacks in uncontrolled environments (with standard software defenses such as W⊕X and ASLR enabled), and installing publicly available IoT firmware that uses this software application on a Raspberry Pi. 
The presented exploits demonstrate the ease in which an adversary can control IoT devices.","PeriodicalId":271955,"journal":{"name":"2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114156160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brian Delgado, Tejaswini Vibhute, John Fastabend, K. Karavanic
Detecting unexpected changes in a system's runtime environment is critical to resilience. A repurposing of System Management Mode (SMM) for runtime security inspections has been proposed, due to SMM's high privilege and protected memory. However, key challenges prevent SMM's adoption for this purpose in production-level environments: the possibility of severe performance impacts, semantic gaps between SMM and host software, high overheads, overly broad access permissions, and lack of flexibility. We introduce a Runtime Integrity Measurement framework, EPA-RIMM, for both native Linux and Xen platforms, that includes several novel features to solve these challenges. EPA-RIMM decomposes large measurements to control perturbation and leverages the SMI Transfer Monitor (STM) to bridge the semantic gap between hypervisors and SMM, as well as restrict the measurement agent's accesses. We present a design and implementation for a concurrent approach that allows EPA-RIMM to utilize all cores in SMM, dramatically increasing measurement throughput and reducing application perturbation. Our Linux and Xen prototype results show that EPA-RIMM meets performance goals while continuously monitoring code and data for signs of attack, and that it is effective at detecting a number of recent exploits.
{"title":"EPA-RIMM : An Efficient, Performance-Aware Runtime Integrity Measurement Mechanism for Modern Server Platforms","authors":"Brian Delgado, Tejaswini Vibhute, John Fastabend, K. Karavanic","doi":"10.1109/DSN.2019.00051","DOIUrl":"https://doi.org/10.1109/DSN.2019.00051","url":null,"abstract":"Detecting unexpected changes in a system's runtime environment is critical to resilience. A repurposing of System Management Mode (SMM) for runtime security inspections has been proposed, due to SMM's high privilege and protected memory. However, key challenges prevent SMM's adoption for this purpose in production-level environments: the possibility of severe performance impacts, semantic gaps between SMM and host software, high overheads, overly broad access permissions, and lack of flexibility. We introduce a Runtime Integrity Measurement framework, EPA-RIMM, for both native Linux and Xen platforms, that includes several novel features to solve these challenges. EPA-RIMM decomposes large measurements to control perturbation and leverages the SMI Transfer Monitor (STM) to bridge the semantic gap between hypervisors and SMM, as well as restrict the measurement agent's accesses. We present a design and implementation for a concurrent approach that allows EPA-RIMM to utilize all cores in SMM, dramatically increasing measurement throughput and reducing application perturbation. 
Our Linux and Xen prototype results show that EPA-RIMM meets performance goals while continuously monitoring code and data for signs of attack, and that it is effective at detecting a number of recent exploits.","PeriodicalId":271955,"journal":{"name":"2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132926376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shengye Wan, Jianhua Sun, Kun Sun, Ning Zhang, Qi Li
On ARM processors with the TrustZone security extension, asynchronous introspection mechanisms have been developed in the secure world to detect security policy violations in the normal world. These mechanisms provide security protection by passively checking a snapshot of the normal world. However, since previous secure-world checking solutions require suspending the entire rich OS, asynchronous introspection has not been widely adopted in the real world. Given a multi-core ARM system that can execute the two worlds simultaneously on different cores, secure-world introspection can check the rich OS without suspension. However, we identify a new normal-world evasion attack that can defeat asynchronous introspection by removing the attack traces in parallel from one core while the security check is being performed on another core. We perform a systematic study of this attack and present its efficiency against existing asynchronous introspection mechanisms. As a countermeasure, we propose a secure and trustworthy asynchronous introspection mechanism called SATIN, which can efficiently detect evasion attacks by increasing the attacker's evasion time cost and decreasing the defender's execution time to below a safe limit. We implement a prototype on an ARM development board, and the experimental results show that SATIN can effectively prevent evasion attacks on multi-core systems with minor system overhead.
{"title":"SATIN: A Secure and Trustworthy Asynchronous Introspection on Multi-Core ARM Processors","authors":"Shengye Wan, Jianhua Sun, Kun Sun, Ning Zhang, Qi Li","doi":"10.1109/DSN.2019.00040","DOIUrl":"https://doi.org/10.1109/DSN.2019.00040","url":null,"abstract":"On ARM processors with TrustZone security extension, asynchronous introspection mechanisms have been developed in the secure world to detect security policy violations in the normal world. These mechanisms provide security protection via passively checking the normal world snapshot. However, since previous secure world checking solutions require to suspend the entire rich OS, asynchronous introspection has not been widely adopted in the real world. Given a multi-core ARM system that can execute the two worlds simultaneously on different cores, secure world introspection can check the rich OS without suspension. However, we identify a new normal-world evasion attack that can defeat the asynchronous introspection by removing the attacking traces in parallel from one core when the security checking is performing on another core. We perform a systematic study on this attack and present its efficiency against existing asynchronous introspection mechanisms. As the countermeasure, we propose a secure and trustworthy asynchronous introspection mechanism called SATIN, which can efficiently detect the evasion attacks by increasing the attackers' evasion time cost and decreasing the defender's execution time under a safe limit. 
We implement a prototype on an ARM development board and the experimental results show that SATIN can effectively prevent evasion attacks on multi-core systems with a minor system overhead.","PeriodicalId":271955,"journal":{"name":"2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123140054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Criswell, Jie Zhou, Spyridoula Gravani, Xiaoyu Hu
Operating systems such as Linux break the power of the root user into separate privileges (which Linux calls capabilities) and give processes the ability to enable privileges only when needed and to discard them permanently when the program no longer needs them. However, there is no method of measuring how well the use of such facilities reduces the risk of privilege escalation attacks if the program has a vulnerability. This paper presents PrivAnalyzer, an automated tool that measures how effectively programs use Linux privileges. PrivAnalyzer consists of three components: 1) AutoPriv, an existing LLVM-based C/C++ compiler which uses static analysis to transform a program that uses Linux privileges into a program that safely removes them when no longer needed, 2) ChronoPriv, a new LLVM C/C++ compiler pass that performs dynamic analysis to determine for how long a program retains various privileges, and 3) ROSA, a new bounded model checker that can model the damage a program can do at each program point if an attacker can exploit the program and abuse its privileges. We use PrivAnalyzer to determine how long five privileged open source programs retain the ability to cause serious damage to a system and find that merely transforming a program to drop privileges does not significantly improve security. However, we find that simple refactoring can considerably increase the efficacy of Linux privileges. In two programs that we refactored, we reduced the percentage of execution in which a device file can be read and written from 97% and 88% to 4% and 1%, respectively.
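The kind of metric reported above (the percentage of execution during which a privilege is held) can be illustrated with a minimal sketch. This is not ChronoPriv's implementation: the function name `privilege_retention`, the trace format, and the instruction counts are all hypothetical and chosen only to show the arithmetic. The capability names are real Linux capabilities used as sample labels.

```python
from collections import defaultdict


def privilege_retention(phases):
    """Return {privilege: percent of executed instructions during which it was held}.

    phases is a list of (instruction_count, privileges_held) tuples in
    execution order. This trace format is an assumption made for
    illustration, not the output format of any real tool.
    """
    total = sum(count for count, _ in phases)
    if total == 0:
        return {}
    held = defaultdict(int)
    for count, privileges in phases:
        for privilege in privileges:
            held[privilege] += count
    return {p: 100.0 * n / total for p, n in held.items()}


# Hypothetical trace: the program drops privileges early in its run.
trace = [
    (40, {"CAP_NET_BIND_SERVICE", "CAP_SETUID"}),  # startup, both held
    (60, {"CAP_SETUID"}),                          # one privilege dropped
    (900, set()),                                  # remainder runs unprivileged
]

report = privilege_retention(trace)
for privilege in sorted(report):
    print(f"{privilege}: {report[privilege]:.1f}% of execution")
```

Under this view, a refactoring that moves privileged operations earlier shrinks the leading phases of the trace and lowers the reported percentages, which is the effect the abstract describes for the two refactored programs.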
PrivAnalyzer: Measuring the Efficacy of Linux Privilege Use. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). Published 2019-06-24. DOI: 10.1109/DSN.2019.00065 (https://doi.org/10.1109/DSN.2019.00065)