This paper presents a systematic study on the security of modern file systems, following a vulnerability-centric perspective. Specifically, we collected 377 file system vulnerabilities committed to the CVE database in the past 20 years. We characterize them from four dimensions that include why the vulnerabilities appear, how the vulnerabilities can be exploited, what consequences can arise, and how the vulnerabilities are fixed. This way, we build a deep understanding of the attack surfaces faced by file systems, the threats imposed by the attack surfaces, and the good and bad practices in mitigating the attacks in file systems. We envision that our study will bring insights towards the future development of file systems, the enhancement of file system security, and the relevant vulnerability mitigating solutions.
More and more data are stored in cloud storage which brings two major challenges. First, the modified files in the cloud should be quickly synchronized to ensure data consistency, e.g., delta synchronization (sync) achieves efficient cloud sync by synchronizing only the updated part of the file. Second, the huge data in the cloud needs to be deduplicated and encrypted, e.g., Message-Locked Encryption (MLE) implements data deduplication by encrypting the content among different users. However, when combined, a few updates in the content can cause large sync traffic amplification for both keys and ciphertext in the MLE-based cloud storage, significantly degrading the cloud sync efficiency. A feature-based encryption sync scheme, FeatureSync, is proposed to address the delta amplification problem. However, with further improvement of the network bandwidth, the performance of FeatureSync stagnates. In our preliminary experimental evaluations, we find that the bottleneck of the computational overhead in the high-bandwidth network environments is the main bottleneck in FeatureSync. In this paper, we propose an enhanced feature-based encryption sync scheme FASTSync to optimize the performance of FeatureSync in high-bandwidth network environments. The performance evaluations on a lightweight prototype implementation of FASTSync show that FASTSync reduces the cloud sync time by 70.3% and the encryption time by 37.3% on average, compared with FeatureSync.
Symmetric Searchable Encryption (SSE), as an ideal primitive, can ensure data privacy while supporting retrieval over encrypted data. However, existing multi-user SSE schemes require the data owner to share the secret key with all query users or always be online to generate search tokens. While there are some solutions to this problem, they have at least one weakness, such as non-supporting conjunctive query, result decryption assistance of the data owner, and unauthorized access. To solve the above issues, we propose an
Hardware is often required to support fast search and high-throughput applications. Consequently, the performance of search algorithms is limited by storage bandwidth. Hence, the search algorithm must be optimized accordingly. We propose a CostCounter (CC) algorithm based on cuckoo hashing and an Improved CostCounter (ICC) algorithm. A better path can be selected when collisions occur using a cost counter to record the kick-out situation. Our simulation results indicate that the CC and ICC algorithms can achieve more significant performance improvements than Random Walk (RW), Breadth First Search (BFS), and MinCounter (MC). With two buckets and two slots per bucket, under the 95% memory load rate of the maximum load rate, CC and ICC are optimized on read-write times over 20% and 80% compared to MC and BFS, respectively. Furthermore, the CC and ICC algorithms achieve a slight improvement in storage efficiency compared with MC. In addition, we implement RW, MC, and the proposed algorithms using fine-grained locking to support a high throughput rate. From the test on field programmable gate arrays, we verify the simulation results and our algorithms optimize the maximum throughput over 23% compared to RW and 9% compared to MC under 95% of the memory capacity. The test results indicate that our CC and ICC algorithms can achieve better performance in terms of hardware bandwidth and memory load efficiency without incurring a significant resource cost.
This article systematically studies 99 distributed performance bugs from five widely deployed distributed storage and computing systems (Cassandra, HBase, HDFS, Hadoop MapReduce and ZooKeeper). We present the TaxPerf database, which collectively organizes the analysis results as over 400 classification labels and over 2,500 lines of bug re-description. TaxPerf is classified into six bug categories (and 18 bug subcategories) by their root causes; resource, blocking, synchronization, optimization, configuration, and logic. TaxPerf can be used as a benchmark for performance bug studies and debug tool designs. Although it is impractical to automatically detect all categories of performance bugs in TaxPerf, we find that an important category of blocking bugs can be effectively solved by analysis tools. We analyze the cascading nature of blocking bugs and design an automatic detection tool called PCatch, which (i) performs program analysis to identify code regions whose execution time can potentially increase dramatically with the workload size; (ii) adapts the traditional happens-before model to reason about software resource contention and performance dependency relationship; and (iii) uses dynamic tracking to identify whether the slowdown propagation is contained in one job. Evaluation shows that PCatch can accurately detect blocking bugs of representative distributed storage and computing systems by observing system executions under small-scale workloads.
Data arrangement determines the capacity, resilience, and performance of a distributed storage system. A scalable self-managed system must place its data efficiently not only during stable operation but also after an expansion, planned downscaling, or device failures. In this article, we present Derrick, a data balancing algorithm addressing these needs, which has been developed for HYDRAstor, a highly scalable commercial storage system. Derrick makes its decisions quickly in case of failures but takes additional time to find a nearly optimal data arrangement and a plan for reaching it when the device population changes. Compared to balancing algorithms in two other state-of-the-art systems, Derrick provides better capacity utilization, reduced data movement, and improved performance. Moreover, it can be easily adapted to meet custom placement requirements.