Large-scale storage systems need to provide the right amount of redundancy in their storage scheme to protect client data. In particular, many high-performance systems require data protection that imposes minimal impact on performance; thus, such systems use mirroring to guard against data loss. Unfortunately, as the number of copies increases, mirroring becomes costly and contributes relatively little to the overall system reliability. Compared to mirroring, parity-based schemes are space-efficient, but incur greater update and degraded-mode read costs. An ideal data protection scheme should perform similarly to mirroring, while providing the space efficiency of a parity-based erasure code. Our goal is to increase the reliability of systems that currently mirror data for protection without impacting performance or space overhead. To this end, we propose the use of large parity codes across two-way mirrored reliability groups. The secondary reliability groups are defined across an arbitrarily large set of mirrored groups, necessitating a small amount of non-volatile RAM for parity. Since each parity element is stored in non-volatile RAM, our scheme drastically increases the mean time to data loss without impacting overall system performance.
{"title":"Disaster recovery codes: increasing reliability with large-stripe erasure correcting codes","authors":"K. Greenan, E. L. Miller, T. Schwarz, D. Long","doi":"10.1145/1314313.1314322","DOIUrl":"https://doi.org/10.1145/1314313.1314322","url":null,"abstract":"Large-scale storage systems need to provide the right amount of redundancy in their storage scheme to protect client data. In particular, many high-performance systems require data protection that imposes minimal impact on performance; thus, such systems use mirroring to guard against data loss. Unfortunately, as the number of copies increases, mirroring becomes costly and contributes relatively little to the overall system reliability. Compared to mirroring, parity-based schemes are space-efficient, but incur greater update and degraded-mode read costs. An ideal data protection scheme should perform similarly to mirroring, while providing the space efficiency of a parity-based erasure code.\u0000 Our goal is to increase the reliability of systems that currently mirror data for protection without impacting performance or space overhead. To this end, we propose the use of large parity codes across two-way mirrored reliability groups. The secondary reliability groups are defined across an arbitrarily large set of mirrored groups, necessitating a small amount of non-volatile RAM for parity. Since each parity element is stored in non-volatile RAM, our scheme drastically increases the mean time to data loss without impacting overall system performance.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127901128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed file systems need to be robust in the face of failures. In this work, we study the failure handling and recovery mechanisms of a widely used distributed file system, Linux NFS. We study the behavior of NFS under corruption of important metadata through fault injection. We find that the NFS protocol behaves in unexpected ways in the presence of these corruptions. On some occasions, incorrect errors are communicated to the client application; in others, the system hangs applications or crashes outright; in a few cases, success is falsely reported when an operation has failed. We use the results of our study to draw lessons for future designs and implementations of the NFS protocol.
{"title":"The effects of metadata corruption on nfs","authors":"S. Krishnan, Giridhar Ravipati, A. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, B. Miller","doi":"10.1145/1314313.1314324","DOIUrl":"https://doi.org/10.1145/1314313.1314324","url":null,"abstract":"Distributed file systems need to be robust in the face of failures. In this work, we study the failure handling and recovery mechanisms of a widely used distributed file system, Linux NFS. We study the behavior of NFS under corruption of important metadata through fault injection. We find that the NFS protocol behaves in unexpected ways in the presence of these corruptions. On some occasions, incorrect errors are communicated to the client application; inothers, the system hangs applications or crashes outright; in a few cases, success is falsely reported when an operation has failed. We use the results of our study to draw lessons for future designs and implementations of the NFS protocol.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"393 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125718531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces a new framework for confidentiality-preserving rank-ordered search and retrieval over large document collections. The proposed framework not only protects document/query confidentiality against an outside intruder, but also prevents an untrusted data center from learning information about the query and the document collection. We present practical techniques for proper integration of relevance scoring methods and cryptographic techniques, such as order-preserving encryption, to protect data collections and indices and provide efficient and accurate search capabilities to securely rank-order documents in response to a query. Experimental results on the W3C collection show that these techniques have comparable performance to conventional search systems designed for non-encrypted data in terms of search accuracy. The proposed methods thus form the first steps to bring together advanced information retrieval and secure search capabilities for a wide range of applications including managing data in government and business operations, enabling scholarly study of sensitive data, and facilitating the document discovery process in litigation.
{"title":"Confidentiality-preserving rank-ordered search","authors":"A. Swaminathan, Yinian Mao, Guan-Ming Su, Hongmei Gou, Avinash L. Varna, Shan He, Min Wu, Douglas W. Oard","doi":"10.1145/1314313.1314316","DOIUrl":"https://doi.org/10.1145/1314313.1314316","url":null,"abstract":"This paper introduces a new framework for confidentiality preserving rank-ordered search and retrieval over large document collections. The proposed framework not only protects document/query confidentiality against an outside intruder, but also prevents an untrusted data center from learning information about the query and the document collection. We present practical techniques for proper integration of relevance scoring methods and cryptographic techniques, such as order preserving encryption, to protect data collections and indices and provide efficient and accurate search capabilities to securely rank-order documents in response to a query. Experimental results on the W3C collection show that these techniques have comparable performance to conventional search systems designed for non-encrypted data in terms of search accuracy. The proposed methods thus form the first steps to bring together advanced information retrieval and secure search capabilities for a wide range of applications including managing data in government and business operations, enabling scholarly study of sensitive data, and facilitating the document discovery process in litigation.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122281169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data provenance summarizes the history of ownership of an item, as well as the actions performed on it. While widely used in archives, art, and archeology, provenance is also very important in forensics, scientific computing, and legal proceedings involving data. Significant research has been conducted in this area, yet the security and privacy issues of provenance have not been explored. In this position paper, we define the secure provenance problem and argue that it is of vital importance in numerous applications. We then discuss a select few of the issues related to ensuring the privacy and integrity of provenance information.
{"title":"Introducing secure provenance: problems and challenges","authors":"Ragib Hasan, R. Sion, M. Winslett","doi":"10.1145/1314313.1314318","DOIUrl":"https://doi.org/10.1145/1314313.1314318","url":null,"abstract":"Data provenance summarizes the history of the ownership of the item, as well as the actions performed on it. While widely used in archives, art, and archeology, provenance is also very important in forensics, scientific computing, and legal proceedings involving data. Significant research has been conducted in this area, yet the security and privacy issues of provenance have not been explored. In this position paper, we define the secure provenance problem and argue that it is of vital importance in numerous applications. We then discuss a select few of the issues related to ensuring the privacy and integrity of provenance information.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128688127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed computation systems have become an important tool for scientific simulation, and a similarly distributed replica management system may be employed to increase the locality and availability of storage services. While users of such systems may have low expectations regarding the security and reliability of the computation involved, they expect that committed data sets resulting from complete jobs will be protected against storage faults, accidents, and intrusion. We offer a solution to the distributed storage security problem that requires no global view of user names or authentication specifics. Access control is handled by a rendition protocol, which is similar to a rendezvous protocol but is driven by the capability of the client user to effect change in the data on the underlying storage. In this paper, we discuss the benefits and liabilities of such a system.
{"title":"Access control for a replica management database","authors":"J. Wozniak, P. Brenner, D. Thain","doi":"10.1145/1179559.1179567","DOIUrl":"https://doi.org/10.1145/1179559.1179567","url":null,"abstract":"Distributed computation systems have become an important tool for scientific simulation, and a similarly distributed replica management system may be employed to increase the locality and availability of storage services. While users of such systems may have low expectations regarding the security and reliability of the computation involved, they expect that committed data sets resulting from complete jobs will be protected against storage faults, accidents and intrusion. We offer a solution to the distributed storage security problem that has no global view on user names or authentication specifics. Access control is handled by a rendition protocol, which is similar to a rendezvous protocol but is driven by the capability of the client user to effect change in the data on the underlying storage. In this paper, we discuss the benefits and liabilities of such a system.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114748227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Archival storage systems are designed for a write-once, read-maybe usage model which places an emphasis on the long-term preservation of their data contents. In contrast to traditional storage systems in which data lifetimes are measured in months or possibly years, data lifetimes in an archival system are measured in decades. Secure archival storage has the added goal of providing controlled access to its long-term contents. In contrast, public archival systems aim to ensure that their contents are available to anyone. Since secure archival storage systems must store data over much longer periods of time, new threats emerge that affect the security landscape in many novel, subtle ways. These security threats endanger the secrecy, availability and integrity of the archival storage contents. Adequate understanding of these threats is essential to effectively devise new policies and mechanisms to guard against them. We discuss many of these threats in this new context to fill this gap, and show how existing systems meet (or fail to meet) these threats.
{"title":"Long-term threats to secure archives","authors":"M. Storer, K. Greenan, E. L. Miller","doi":"10.1145/1179559.1179562","DOIUrl":"https://doi.org/10.1145/1179559.1179562","url":null,"abstract":"Archival storage systems are designed for a write-once, read-maybe usage model which places an emphasis on the long-term preservation of their data contents. In contrast to traditional storage systems in which data lifetimes are measured in months or possibly years, data lifetimes in an archival system are measured in decades. Secure archival storage has the added goal of providing controlled access to its long-term contents. In contrast, public archival systems aim to ensure that their contents are available to anyone.Since secure archival storage systems must store data over much longer periods of time, new threats emerge that affect the security landscape in many novel, subtle ways. These security threats endanger the secrecy, availability and integrity of the archival storage contents. Adequate understanding of these threats is essential to effectively devise new policies and mechanisms to guard against them. We discuss many of these threats in this new context to fill this gap, and show how existing systems meet (or fail to meet) these threats.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126005844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper argues that the network latency due to synchronous replication is no longer tolerable in scenarios where businesses are required by regulation to separate their secondary sites from the primary by hundreds of miles. We propose a semantic-aware remote replication system to meet the contrasting needs of both system efficiency and safe remote replication with tight recovery-point and recovery-time objectives. Using experiments conducted on a commercial replication system and on a Linux file system we show that (i) unlike synchronous replication, asynchronous replication is relatively insensitive to network latency, and (ii) applications such as databases already intelligently deal with the weak persistency semantics offered by modern file systems. Our proposed system attempts to use asynchronous replication whenever possible and uses application/file-system "signals" to maintain synchrony between the primary and secondary sites. We present a high-level design of our system and discuss several potential challenges that need to be addressed in such a system.
{"title":"The case for semantic aware remote replication","authors":"Xiaotao Liu, Gal Niv, P. Shenoy, Kadangode K. Ramakrishnan, J. Merwe","doi":"10.1145/1179559.1179575","DOIUrl":"https://doi.org/10.1145/1179559.1179575","url":null,"abstract":"This paper argues that the network latency due to synchronous replication is no longer tolerable in scenarios where businesses are required by regulation to separate their secondary sites from the primary by hundreds of miles. We propose a semantic-aware remote replication system to meet the contrasting needs of both system efficiency and safe remote replication with tight recovery-point and recovery-time objectives. Using experiments conducted on a commercial replication system and on a Linux file system we show that (i) unlike synchronous replication, asynchronous replication is relatively insensitive to network latency, and (ii) applications such as databases already intelligently deal with the weak persistency semantics offered by modern file systems. Our proposed system attempts to use asynchronous replication whenever possible and uses application/file-system \"signals\" to maintain synchrony between the primary and secondary sites. We present a high-level design of our system and discuss several potential challenges that need to be addressed in such a system.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133500797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Backing up important data is crucial. A variety of causes can lead to data loss, such as disk failures, administration errors, virus infiltration, theft, and physical damage to equipment. Users and businesses have important information that is difficult to replace, such as financial records and contacts. Reliable backups are crucial because some data cannot be replaced, while recreating other data can be expensive in terms of time and money. We propose two methods that leverage various types of free Web storage to provide simple, reliable, and free backup solutions. The first method is based on the storage of data in the caches of Internet search engines. We have developed CrawlBackup, a tool which prepares and provides the data for Web crawlers and can then restore the data from the Internet even if all the data on the original computer is unavailable. The second method, called MailBackup, stores redundant copies of the important data in the mailboxes of Internet mail services. We have successfully used these backup systems since the middle of 2005. In this paper we discuss and compare these methods, their feasibility of deployment, their security, and their flexibility.
{"title":"Using free web storage for data backup","authors":"Avishay Traeger, N. Joukov, J. Sipek, E. Zadok","doi":"10.1145/1179559.1179574","DOIUrl":"https://doi.org/10.1145/1179559.1179574","url":null,"abstract":"Backing up important data is crucial. A variety of causes can lead to data loss, such as disk failures, administration errors, virus infiltration, theft, and physical damage to equipment. Users and businesses have important information that is difficult to replace, such as financial records and contacts. Reliable backups are crucial because some data cannot be replaced, while recreating other data can be expensive in terms of time and money. We propose two methods which leverage various types of free Web storage to provide simple, reliable, and free backup solutions.The first method is based on the storage of data in the caches of Internet search engines. We have developed CrawlBackup, a tool which prepares and provides the data for Web crawlers and can then restore the data from the Internet even if all the data on the original computer is unavailable. The second method, called MailBackup, stores redundant copies of the important data in the mailboxes of Internet mail services. We have successfully used these backup systems since the middle of 2005. In this paper we discuss and compare these methods, their feasibility of deployment, their security, and their flexibility.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122679314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper has three goals. (1) We try to debunk several commonly held misconceptions about secure deletion: that encryption is an ideal solution for everybody, that existing data-overwriting tools work well, and that securely deleted files must be overwritten many times. (2) We discuss new and important issues that are often neglected: secure deletion consistency in case of power failures, handling versioning and journalling file systems, and metadata overwriting. (3) We present two solutions for on-demand secure deletion. First, we have created a highly portable and flexible system that performs only the minimal amount of work in kernel mode. Second, we present two in-kernel solutions in the form of Ext3 file system patches that can perform comprehensive data and metadata overwriting. We evaluated our proposed solutions and discuss the trade-offs involved.
{"title":"Secure deletion myths, issues, and solutions","authors":"N. Joukov, Harry Papaxenopoulos, E. Zadok","doi":"10.1145/1179559.1179571","DOIUrl":"https://doi.org/10.1145/1179559.1179571","url":null,"abstract":"This paper has three goals. (1) We try to debunk several held misconceptions about secure deletion: that encryption is an ideal solution for everybody, that existing data-overwriting tools work well, and that securely deleted files must be overwritten many times. (2) We discuss new and important issues that are often neglected: secure deletion consistency in case of power failures, handling versioning and journalling file systems, and metadata overwriting. (3) We present two solutions for on-demand secure deletion. First, we have created a highly portable and flexible system that performs only the minimal amount of work in kernel mode. Second, we present two in-kernel solutions in the form of Ext3 file system patches that can perform comprehensive data and metadata overwriting. We evaluated our proposed solutions and discuss the trade-offs involved.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114170088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Batch-correlated failures result from the manifestation of a common defect in most, if not all, disk drives belonging to the same production batch. They are much less frequent than random disk failures but can cause catastrophic data losses even in systems that rely on mirroring or erasure codes to protect their data. We propose to reduce the impact of batch-correlated failures on disk arrays by storing redundant copies of the same data on disks from different batches and, possibly, different manufacturers. The technique is especially attractive for mirrored organizations as it only requires that the two disks that hold copies of the same data never belong to the same production batch. We also show that even partial diversity can greatly increase the probability that the data stored in a RAID array will survive batch-correlated failures.
{"title":"Using device diversity to protect data against batch-correlated disk failures","authors":"Jehan-Francois Pâris, D. Long","doi":"10.1145/1179559.1179568","DOIUrl":"https://doi.org/10.1145/1179559.1179568","url":null,"abstract":"Batch-correlated failures result from the manifestation of a common defect in most, if not all, disk drives belonging to the same production batch. They are much less frequent than random disk failures but can cause catastrophic data losses even in systems that rely on mirroring or erasure codes to protect their data. We propose to reduce impact of batch-correlated failures on disk arrays by storing redundant copies of the same data on disks from different batches and, possibly, different manufacturers. The technique is especially attractive for mirrored organizations as it only requires that the two disks that hold copies of the same data never belong to the same production batch. We also show that even partial diversity can greatly increase the probability that the data stored in a RAID array will survive batch-correlated failures.","PeriodicalId":413919,"journal":{"name":"ACM International Workshop on Storage Security And Survivability","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115840545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}