2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN): Latest Papers

Characterizing and Understanding HPC Job Failures Over The 2K-Day Life of IBM BlueGene/Q System

2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00055

S. Di, Hanqi Guo, Eric Pershey, M. Snir, F. Cappello

An in-depth understanding of the failure features of HPC jobs in a supercomputer is critical to maintaining large-scale systems and improving service quality for users. In this paper, we investigate the features of hundreds of thousands of jobs in one of the most powerful supercomputers, the IBM Blue Gene/Q Mira, based on 2001 days of observations with a total of over 32.44 billion core-hours. We study the impact of the system's events on the jobs' execution in order to understand the system's reliability from the perspective of jobs and users. The characterization involves a joint analysis based on multiple data sources, including the reliability, availability, and serviceability (RAS) log; the job scheduling log; the log regarding each job's physical execution tasks; and the I/O behavior log. We present 22 takeaways based on our in-depth analysis. For instance, 99,245 job failures are reported in the job-scheduling log, a large majority (99.4%) of which are due to user behavior (such as bugs in code, wrong configuration, or misoperations). The job failures are correlated with multiple metrics and attributes, such as users/projects and job execution structure (number of tasks, scale, and core-hours). The best-fitting distributions of a failed job's execution length (or interruption interval) include Weibull, Pareto, inverse Gaussian, and Erlang/exponential, depending on the types of errors (i.e., exit codes). The RAS events affecting job executions exhibit a high correlation with users and core-hours and have a strong locality feature. In terms of the failed jobs, our similarity-based event-filtering analysis indicates that the mean time to interruption is about 3.5 days.
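The idea of filtering similar events before computing a mean time to interruption can be illustrated with a minimal sketch. It is not the authors' filter: the 20-minute window, the (category, location) similarity key, and the function names are all assumptions made for illustration, and the data are toy values.

```python
from datetime import datetime, timedelta


def filter_similar(events, window=timedelta(minutes=20)):
    """Collapse bursts of similar events into one interruption.

    `events` is a list of (timestamp, category, location) tuples.
    Two events count as "similar" here if they share category and
    location (an assumed rule). An event is dropped when a similar
    event occurred at most `window` before it, so a chain of closely
    spaced events is counted once.
    """
    last_seen = {}
    kept = []
    for ts, category, location in sorted(events):
        key = (category, location)
        prev = last_seen.get(key)
        if prev is None or ts - prev > window:
            kept.append((ts, category, location))
        last_seen[key] = ts  # extend the burst even when the event is dropped
    return kept


def mean_time_to_interruption(kept_events, observation_days):
    """Observation period divided by the number of distinct interruptions."""
    if not kept_events:
        raise ValueError("no interruptions observed")
    return observation_days / len(kept_events)
```

Without the filter, a burst of repeated log records for one underlying fault would each count as an interruption and the estimate would be far too pessimistic.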

Citations: 17

Exploiting Latency and Error Tolerance of GPGPU Applications for an Energy-Efficient DRAM

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00046

Haonan Wang, Adwait Jog

Memory (DRAM) energy consumption is one of the major scalability bottlenecks for almost all computing systems, including throughput machines such as Graphics Processing Units (GPUs). A large fraction of DRAM dynamic energy is spent on fetching the data bits from a DRAM page (row) into a small hardware structure called the row buffer. Accessing data from this row buffer is much less expensive in terms of energy and latency. Hence, it is preferable to reuse the buffered data as much as possible before activating another row and bringing its data into the row buffer. Our thorough characterization of several GPGPU applications shows that these row buffers are poorly utilized, leading to sub-optimal energy consumption. To address this, we propose a novel memory scheduling scheme for GPUs that exploits the latency and error tolerance properties of GPGPU applications to reduce row energy by 44% on average.
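Why row-buffer reuse matters can be shown with a toy single-bank model. This is only a sketch of the general effect, not the scheduler proposed in the paper: the open-row policy, the fixed reordering window, and the function names are assumptions for illustration.

```python
def count_activations(rows):
    """Count row activations for one bank that keeps the last row open."""
    open_row = None
    activations = 0
    for row in rows:
        if row != open_row:  # row-buffer miss: a new row must be activated
            activations += 1
            open_row = row
    return activations


def reorder_within_window(rows, window):
    """Group requests to the same row inside each window of pending requests.

    Delaying some requests so they can be served together is one way of
    trading latency tolerance for fewer activations (hypothetical policy).
    """
    reordered = []
    for start in range(0, len(rows), window):
        groups = {}  # dicts preserve first-seen order of rows
        for row in rows[start:start + window]:
            groups.setdefault(row, []).append(row)
        for group in groups.values():
            reordered.extend(group)
    return reordered
```

For the interleaved stream `[1, 2, 1, 2, 1, 2, 3, 3]`, in-order service needs 7 activations, while grouping over a window of 8 requests needs only 3.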

Citations: 2

Classifying Malware Represented as Control Flow Graphs using Deep Graph Convolutional Neural Network

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00020

Jiaqi Yan, Guanhua Yan, Dong Jin

Malware has been one of the biggest cyber threats in the digital world for a long time. Existing machine-learning-based malware classification methods rely on handcrafted features extracted from raw binary files or disassembled code. The diversity of such features has made it hard to build generic malware classification systems that work effectively across different operational environments. To strike a balance between generality and performance, we explore new machine learning techniques to classify malware programs represented as their control flow graphs (CFGs). To overcome the drawbacks of existing malware analysis methods that use inefficient and nonadaptive graph matching techniques, we build a new system that uses a deep graph convolutional neural network to embed the structural information inherent in CFGs for effective yet efficient malware classification. We use two large independent datasets that contain more than 20K malware samples to evaluate our proposed system, and the experimental results show that it can classify CFG-represented malware programs with performance comparable to that of state-of-the-art methods applied to handcrafted malware features.
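To make "embedding structural information in CFGs" concrete, here is a minimal sketch of one generic graph-convolution step over a small CFG, written with plain lists. It is not the paper's architecture: the mean aggregation with self-loops, the ReLU, and the name `gcn_layer` are assumptions chosen for brevity.

```python
def gcn_layer(adj, feats, weight):
    """One graph-convolution step: average over successors plus self, then ReLU.

    adj[i][j] is truthy if basic block i has an edge to block j,
    feats[i] is the feature vector of block i (e.g. instruction counts),
    weight is an (in_dim x out_dim) matrix given as nested lists.
    """
    n = len(adj)
    in_dim = len(feats[0])
    out_dim = len(weight[0])
    output = []
    for i in range(n):
        # Neighbourhood = the block itself plus its CFG successors.
        neigh = [j for j in range(n) if j == i or adj[i][j]]
        agg = [sum(feats[j][k] for j in neigh) / len(neigh) for k in range(in_dim)]
        row = [
            max(0.0, sum(agg[k] * weight[k][c] for k in range(in_dim)))
            for c in range(out_dim)
        ]
        output.append(row)
    return output
```

Stacking several such layers lets each block's embedding depend on blocks further away in the graph; a classifier would then pool the per-block embeddings into one vector per program.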

Citations: 73

An Online Approach to Estimate Parameters of Phase-Type Distributions

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00024

P. Buchholz, Iryna Dohndorf, J. Kriege

The traditional expectation-maximization (EM) algorithm is a general-purpose algorithm for maximum likelihood estimation in problems with incomplete data. Several variants of the algorithm exist to estimate the parameters of phase-type distributions (PHDs), a widely used class of distributions in performance and dependability modeling. EM algorithms are typically offline algorithms because they improve the likelihood function by iteratively running through a fixed sample. Nowadays, most systems generate data online, so offline algorithms seem outdated in this environment. This paper proposes an online EM algorithm for parameter estimation of PHDs. In contrast to the offline version, the online variant adds data immediately when it becomes available and includes no iteration. Different variants of the algorithm are proposed that exploit the specific structure of subclasses of PHDs such as hyperexponential, hyper-Erlang, or acyclic PHDs. The algorithm furthermore incorporates current methods to detect drifts or change points in a data stream and estimates a new PHD whenever such behavior has been identified. Thus, the resulting distributions can be applied for online model prediction and for the generation of inhomogeneous PHDs as an extension of inhomogeneous Poisson processes. Numerical experiments with artificial and measured data streams show the applicability of the approach.
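The contrast between offline and online EM can be sketched for the simplest PHD subclass, the hyperexponential distribution. The code below is a generic stochastic-approximation online EM, not the algorithm from the paper: the step-size schedule, the class name, and the initialization are assumptions, and drift detection is left out.

```python
import math


class OnlineHyperExpEM:
    """Online EM for a mixture of exponentials (hyperexponential PHD).

    Each observation is processed once: responsibilities are computed
    (E-step) and blended into running sufficient statistics, from which
    the parameters are re-derived (M-step). No pass over stored data.
    """

    def __init__(self, weights, rates):
        self.weights = list(weights)
        self.rates = list(rates)
        # Running statistics, seeded from the initial guess.
        self.s = list(weights)                               # E[z_k]
        self.t = [w / r for w, r in zip(weights, rates)]     # E[z_k * x]
        self.n = 0

    def update(self, x):
        self.n += 1
        gamma = (self.n + 1) ** -0.6  # assumed decaying step size
        dens = [w * r * math.exp(-r * x) for w, r in zip(self.weights, self.rates)]
        total = sum(dens)
        if total == 0.0:  # numerically negligible likelihood: skip sample
            return
        for k, d in enumerate(dens):
            resp = d / total
            self.s[k] = (1 - gamma) * self.s[k] + gamma * resp
            self.t[k] = (1 - gamma) * self.t[k] + gamma * resp * x
        total_s = sum(self.s)
        self.weights = [s / total_s for s in self.s]
        self.rates = [s / t for s, t in zip(self.s, self.t)]
```

A caller would create the estimator with a rough initial guess, for example `OnlineHyperExpEM([0.5, 0.5], [0.5, 5.0])`, and call `update(x)` for every new inter-event time as it arrives.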

Citations: 0

TEE-Perf: A Profiler for Trusted Execution Environments

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00050

Maurice Bailleu, Donald Dragoti, Pramod Bhatotia, C. Fetzer

We introduce TEE-PERF, an architecture- and platform-independent performance measurement tool for trusted execution environments (TEEs). More specifically, TEE-PERF supports method-level profiling for unmodified multithreaded applications, without relying on any architecture-specific hardware features (e.g. Intel VTune Amplifier) and without requiring platform-dependent kernel features (e.g. Linux perf). Moreover, TEE-PERF provides accurate profiling measurements since it traces the entire process execution without employing instruction pointer sampling. Thus, TEE-PERF does not suffer from sampling frequency bias, which can occur when threads are scheduled in alignment with the sampling frequency. We have implemented TEE-PERF with an easy-to-use interface and integrated it with Flame Graphs to visualize performance bottlenecks. We have evaluated TEE-PERF based on the Phoenix multithreaded benchmark suite and real-world applications (RocksDB, SPDK, etc.), and compared it with Linux perf. Our experimental evaluation shows that TEE-PERF incurs low profiling overheads, while providing accurate profile measurements to identify and optimize application bottlenecks in the context of TEEs. TEE-PERF is publicly available.

Citations: 15

GreenFlag: Protecting 3D-Racetrack Memory from Shift Errors

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00016

Georgios Mappouras, Alireza Vahid, A. Calderbank, Daniel J. Sorin

Racetrack memory is an exciting emerging memory technology with the potential to offer far greater capacity and performance than other non-volatile memories. Racetrack memory has an unusual error model, though, which precludes the use of the typical error coding techniques used by architects. In this paper, we introduce GreenFlag, a coding scheme that combines a new construction for Varshamov-Tenengolts codes with specially crafted delimiter bits that are placed between codewords. GreenFlag is the first coding scheme that is compatible with 3D racetrack, which has the benefit of very high density but the limitation of a single read/write port per track. Based on our implementation of encoding/decoding hardware, we analyze the trade-offs between latency, code length, and code rate; we then use this analysis to evaluate the viability of racetrack at each level of the memory hierarchy.
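For readers unfamiliar with Varshamov-Tenengolts (VT) codes, the sketch below shows the classic textbook code, assuming a shift error shows up as one deleted bit. It does not reproduce GreenFlag's new construction or its delimiter bits, and the decoder is a deliberately simple brute-force search rather than a hardware-friendly one; all function names are illustrative.

```python
from itertools import product


def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions starting at 1."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % (len(bits) + 1)


def vt_codewords(n, a=0):
    """All length-n binary words whose syndrome equals a."""
    return [w for w in product((0, 1), repeat=n) if vt_syndrome(w) == a]


def correct_single_deletion(received, n, a=0):
    """Recover the codeword from a word that lost exactly one bit.

    Tries every way of re-inserting one bit and keeps the results whose
    syndrome matches. Because a VT code corrects one deletion, exactly
    one distinct codeword survives when the input came from the code.
    """
    if len(received) != n - 1:
        raise ValueError("expected exactly one deleted bit")
    candidates = set()
    for pos in range(n):
        for bit in (0, 1):
            word = tuple(received[:pos]) + (bit,) + tuple(received[pos:])
            if vt_syndrome(word) == a:
                candidates.add(word)
    if len(candidates) != 1:
        raise ValueError("input is not a single-deletion copy of a codeword")
    return candidates.pop()
```

Ordinary substitution-error codes cannot do this, because a deletion shifts every later bit; the position-weighted checksum is what makes the missing bit recoverable.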

Citations: 13

Exploiting Memory Corruption Vulnerabilities in Connman for IoT Devices

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00036

K. V. English, Islam Obaidat, Meera Sridhar

In the recent past, there has been a rapid increase in attacks on consumer Internet-of-Things (IoT) devices. Several attacks currently focus on easy targets for exploitation, such as weak configurations (weak default passwords). However, with governments, industries, and organizations proposing new laws and regulations to reduce and prevent such easy targets in the IoT space, attackers will move to more subtle exploits in these devices. Memory corruption vulnerabilities are a significant class of vulnerabilities in software security through which attackers can gain control of the entire system. Numerous memory corruption vulnerabilities have been found in IoT firmware already deployed in the consumer market. This paper presents an approach for exploiting stack-based buffer-overflow vulnerabilities in IoT firmware to hijack the device remotely. To show the feasibility of this approach, we demonstrate exploiting a common network software application, Connman, used widely in IoT firmware such as that of Samsung smart TVs. A series of experiments is reported, including: crashing and executing arbitrary code in the targeted software application in a controlled environment, adopting the attacks in uncontrolled environments (with standard software defenses such as W⊕X and ASLR enabled), and installing publicly available IoT firmware that uses this software application on a Raspberry Pi. The presented exploits demonstrate the ease with which an adversary can control IoT devices.

Citations: 11

EPA-RIMM: An Efficient, Performance-Aware Runtime Integrity Measurement Mechanism for Modern Server Platforms

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00051

Brian Delgado, Tejaswini Vibhute, John Fastabend, K. Karavanic

Detecting unexpected changes in a system's runtime environment is critical to resilience. Repurposing System Management Mode (SMM) for runtime security inspections has been proposed because of SMM's high privilege and protected memory. However, key challenges prevent SMM's adoption for this purpose in production-level environments: the possibility of severe performance impacts, semantic gaps between SMM and host software, high overheads, overly broad access permissions, and lack of flexibility. We introduce a Runtime Integrity Measurement framework, EPA-RIMM, for both native Linux and Xen platforms, that includes several novel features to solve these challenges. EPA-RIMM decomposes large measurements to control perturbation and leverages the SMI Transfer Monitor (STM) to bridge the semantic gap between hypervisors and SMM, as well as to restrict the measurement agent's accesses. We present a design and implementation for a concurrent approach that allows EPA-RIMM to utilize all cores in SMM, dramatically increasing measurement throughput and reducing application perturbation. Our Linux and Xen prototype results show that EPA-RIMM meets performance goals while continuously monitoring code and data for signs of attack, and that it is effective at detecting a number of recent exploits.
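Decomposing a large measurement into bounded pieces can be sketched as an incremental hash. This is an illustration of the general idea only, not EPA-RIMM's implementation: the class name, the fixed chunk size, and the use of SHA-256 are assumptions, and a byte string stands in for the memory region being measured.

```python
import hashlib


class ChunkedMeasurement:
    """Hash a large region a bounded chunk at a time.

    Each call to step() does a limited amount of work, so the host can
    run between steps instead of stalling for the whole measurement.
    """

    def __init__(self, region, chunk_size):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self._region = region
        self._chunk_size = chunk_size
        self._offset = 0
        self._hash = hashlib.sha256()

    def step(self):
        """Measure the next chunk; return True once the region is covered."""
        end = min(self._offset + self._chunk_size, len(self._region))
        self._hash.update(self._region[self._offset:end])
        self._offset = end
        return self._offset >= len(self._region)

    def digest(self):
        """Hex digest of the full region; only valid after the last step."""
        if self._offset < len(self._region):
            raise RuntimeError("measurement not finished")
        return self._hash.hexdigest()
```

The final digest equals a one-shot hash of the same bytes, so a reference value computed offline can still be compared against it. A real design also has to consider that the region may change between steps.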

Citations: 2

SATIN: A Secure and Trustworthy Asynchronous Introspection on Multi-Core ARM Processors

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00040

Shengye Wan, Jianhua Sun, Kun Sun, Ning Zhang, Qi Li

On ARM processors with the TrustZone security extension, asynchronous introspection mechanisms have been developed in the secure world to detect security policy violations in the normal world. These mechanisms provide security protection by passively checking snapshots of the normal world. However, since previous secure-world checking solutions require suspending the entire rich OS, asynchronous introspection has not been widely adopted in the real world. Given a multi-core ARM system that can execute the two worlds simultaneously on different cores, secure-world introspection can check the rich OS without suspension. However, we identify a new normal-world evasion attack that can defeat asynchronous introspection by removing the attack traces in parallel from one core while the security check is running on another core. We perform a systematic study of this attack and show its efficiency against existing asynchronous introspection mechanisms. As a countermeasure, we propose a secure and trustworthy asynchronous introspection mechanism called SATIN, which can efficiently detect evasion attacks by increasing the attacker's evasion time cost and reducing the defender's execution time to below a safe limit. We implement a prototype on an ARM development board, and the experimental results show that SATIN can effectively prevent evasion attacks on multi-core systems with a minor system overhead.

Citations: 5

PrivAnalyzer: Measuring the Efficacy of Linux Privilege Use

2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Pub Date : 2019-06-24 DOI: 10.1109/DSN.2019.00065

J. Criswell, Jie Zhou, Spyridoula Gravani, Xiaoyu Hu

Operating systems such as Linux break the power of the root user into separate privileges (which Linux calls capabilities) and give processes the ability to enable privileges only when needed and to discard them permanently when the program no longer needs them. However, there is no method of measuring how well the use of such facilities reduces the risk of privilege escalation attacks if the program has a vulnerability. This paper presents PrivAnalyzer, an automated tool that measures how effectively programs use Linux privileges. PrivAnalyzer consists of three components: 1) AutoPriv, an existing LLVM-based C/C++ compiler which uses static analysis to transform a program that uses Linux privileges into a program that safely removes them when no longer needed, 2) ChronoPriv, a new LLVM C/C++ compiler pass that performs dynamic analysis to determine for how long a program retains various privileges, and 3) ROSA, a new bounded model checker that can model the damage a program can do at each program point if an attacker can exploit the program and abuse its privileges. We use PrivAnalyzer to determine how long five privileged open source programs retain the ability to cause serious damage to a system and find that merely transforming a program to drop privileges does not significantly improve security. However, we find that simple refactoring can considerably increase the efficacy of Linux privileges. In two programs that we refactored, we reduced the percentage of execution in which a device file can be read and written from 97% and 88% to 4% and 1%, respectively.
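
The closing metric, the percentage of execution during which a privilege is still held, can be sketched as follows. This is a minimal illustration under assumptions: the trace format (a list of phases, each with an instruction count and the set of privileges held) and the function name are hypothetical, the numbers are invented, and the code does not reproduce how ChronoPriv instruments programs.

```python
from collections import defaultdict


def privilege_retention(trace):
    """Return {privilege: percent of execution during which it was held}.

    `trace` is a hypothetical format: a list of (instruction_count,
    set_of_held_privileges) phases in execution order.
    """
    total = sum(count for count, _ in trace)
    if total == 0:
        return {}
    held = defaultdict(int)
    for count, privileges in trace:
        for privilege in privileges:
            held[privilege] += count
    return {name: 100.0 * n / total for name, n in held.items()}


# Invented example: the same program before and after a refactoring that
# drops the capability much earlier in its run.
before = [(900, {"CAP_DAC_OVERRIDE"}), (100, set())]
after = [(50, {"CAP_DAC_OVERRIDE"}), (950, set())]

print(privilege_retention(before))
print(privilege_retention(after))
```

Weighting by instruction count rather than wall-clock time is a design choice made here for simplicity; any consistent measure of execution progress would work in the same way.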

Citations: 3