Large genomic datasets are created through numerous activities, including recreational genealogical investigations, biomedical research, and clinical care. At the same time, genomic data have become valuable for reuse beyond their initial point of collection, but privacy concerns often hinder access. Beacon services have emerged to broaden accessibility to such data. These services enable users to query for the presence of a particular minor allele in a dataset, and this information helps care providers determine whether a genomic variant is spurious or has a known clinical indication. However, various studies have shown that this process can leak information regarding whether individuals are members of the underlying dataset. There are various approaches to mitigate this vulnerability, but they are limited in that they (1) typically rely on heuristics to add noise to the Beacon responses; (2) offer probabilistic privacy guarantees only, neglecting data utility; and (3) assume a batch setting where all queries arrive at once. In this article, we present a novel algorithmic framework to ensure privacy in a Beacon service setting with a minimal number of query response flips. We represent this problem as one of combinatorial optimization in both the batch setting and the online setting (where queries arrive sequentially). We introduce principled algorithms with both privacy and, in some cases, worst-case utility guarantees. Moreover, through extensive experiments, we show that the proposed approaches significantly outperform the state of the art in terms of privacy and utility, using a dataset consisting of 800 individuals and 1.3 million single nucleotide variants.
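To make the setting concrete, the sketch below illustrates a Beacon membership query and the idea of flipping a small set of responses to mask membership. The toy dataset, the function names, and the precomputed flip set are illustrative assumptions only; the article's contribution is the optimization procedure that chooses which responses to flip, which is not shown here.

```python
# Minimal sketch of the Beacon query/flip setting, assuming a toy dataset.
# The flip set below is a stand-in for the output of a privacy mechanism;
# it is not the article's actual algorithm.
from typing import Dict, Set

def true_beacon_response(dataset: Dict[str, Set[str]], variant: str) -> bool:
    """Return True iff any individual in the dataset carries the queried minor allele."""
    return any(variant in genome for genome in dataset.values())

def answer_query(dataset: Dict[str, Set[str]], variant: str, flipped: Set[str]) -> bool:
    """Answer a Beacon query, inverting the truthful answer for variants
    selected by some privacy mechanism (here, a precomputed set)."""
    truth = true_beacon_response(dataset, variant)
    return (not truth) if variant in flipped else truth

# Toy data: two individuals, each represented by a set of variant identifiers.
dataset = {"person_A": {"rs1", "rs7"}, "person_B": {"rs7", "rs42"}}
flipped = {"rs42"}  # a rare variant whose response is flipped to mask membership
print(answer_query(dataset, "rs7", flipped))   # True  (answered truthfully)
print(answer_query(dataset, "rs42", flipped))  # False (flipped to protect privacy)
```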
In this work, we enhance the EasyCrypt proof assistant to reason about the computational complexity of adversaries. The key technical tool is a Hoare logic for reasoning about the computational complexity (execution time and oracle calls) of adversarial computations. Our Hoare logic is built on top of the module system used by EasyCrypt for modeling adversaries. We prove that our logic is sound w.r.t. the semantics of EasyCrypt programs—we also provide full semantics for the EasyCrypt module system, which was previously lacking.
We showcase (for the first time in EasyCrypt and other computer-aided cryptographic tools) how our approach can express precise relationships between an adversary's success probability and its execution time. In particular, we can quantify existentially over adversaries in a complexity class and express general composition statements in simulation-based frameworks. Moreover, such statements can be composed to derive standard concrete security bounds for cryptographic constructions whose security is proved in a modular way. As a main benefit of our approach, we revisit security proofs of some well-known cryptographic constructions and present a new formalization of universal composability.
The linkage of records to identify common entities across multiple data sources has gained increasing interest over the last few decades. In the absence of unique entity identifiers, quasi-identifying attributes such as personal names and addresses are generally used to link records. Due to privacy concerns that arise when such sensitive information is used, privacy-preserving record linkage (PPRL) methods have been proposed to link records without revealing any sensitive or confidential information about these records. Popular PPRL methods such as Bloom filter encoding, however, are known to be susceptible to various privacy attacks. Therefore, a systematic analysis of the privacy risks associated with sensitive databases, as well as with the PPRL methods used in linkage projects, is of great importance. In this article we present a novel framework to assess the vulnerabilities of sensitive databases and existing PPRL encoding methods. We discuss five types of vulnerabilities of both plaintext and encoded values, namely frequency, length, co-occurrence, similarity, and similarity neighborhood, that an adversary can exploit to reidentify sensitive plaintext values from encoded data. In an experimental evaluation we assess the vulnerabilities of two databases using five existing PPRL encoding methods. This evaluation shows that our proposed framework can be used in real-world linkage applications to assess the vulnerabilities associated with sensitive databases to be linked, as well as with PPRL encoding methods.
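For readers unfamiliar with the Bloom filter encoding mentioned above, the sketch below shows the standard q-gram encoding commonly used in PPRL and the Dice similarity used for approximate matching. The filter length, hash count, and keys are illustrative assumptions (deployments typically use much longer filters); the sketch is not tied to the article's assessment framework.

```python
# Sketch of q-gram Bloom filter encoding as commonly used in PPRL.
# Parameters (filter length, hash count, HMAC keys) are illustrative only.
import hashlib, hmac

L = 128  # Bloom filter length in bits (real deployments commonly use ~1000)
K = 2    # number of hash functions per q-gram

def qgrams(value: str, q: int = 2):
    """Split a (padded, lowercased) string into overlapping q-grams."""
    padded = f"_{value.lower()}_"
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]

def encode(value: str) -> int:
    """Encode a quasi-identifier value as a Bloom filter (bit mask)."""
    bits = 0
    for gram in qgrams(value):
        for k in range(K):
            digest = hmac.new(f"key{k}".encode(), gram.encode(), hashlib.sha256).digest()
            bits |= 1 << (int.from_bytes(digest, "big") % L)
    return bits

def dice_similarity(a: int, b: int) -> float:
    """Dice coefficient of two Bloom filters, used for approximate matching."""
    return 2 * bin(a & b).count("1") / (bin(a).count("1") + bin(b).count("1"))

print(dice_similarity(encode("peter"), encode("pete")))   # high: similar names
print(dice_similarity(encode("peter"), encode("maria")))  # low: dissimilar names
```

Because frequent plaintext values produce correspondingly frequent bit patterns, and longer values set more bits, such encodings expose exactly the frequency and length signals that the proposed framework quantifies.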
Lateral movement is a key stage of system compromise used by advanced persistent threats. Detecting it is no simple task. When network host logs are abstracted into discrete temporal graphs, the problem can be reframed as anomalous edge detection in an evolving network. Research in modern deep graph learning techniques has produced many creative and complicated models for this task. However, as is the case in many machine learning fields, the generality of models is of paramount importance for accuracy and scalability during training and inference. In this article, we propose a formalized approach to this problem with a framework we call Euler. It consists of a model-agnostic graph neural network stacked upon a model-agnostic sequence encoding layer such as a recurrent neural network. Models built according to the Euler framework can easily distribute their graph convolutional layers across multiple machines for large performance improvements. Additionally, we demonstrate that Euler-based models are as good as, or better than, every state-of-the-art approach to anomalous link detection and prediction that we tested. As anomaly-based intrusion detection systems, our models efficiently identified anomalous connections between entities with high precision and outperformed all other unsupervised techniques for anomalous lateral movement detection. Additionally, we show that as a piece of a larger anomaly detection pipeline, Euler models perform well enough for use in real-world systems. With more advanced, yet still lightweight, alerting mechanisms ingesting the embeddings produced by Euler models, precision is boosted from 0.243 to 0.986 on real-world network traffic.
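The sketch below, assuming PyTorch, illustrates the general pattern the Euler framework formalizes: a spatial encoder applied to each graph snapshot, a recurrent layer over the resulting sequence of node embeddings, and edge scores for anomaly detection. The specific layer choices (a plain one-hop graph convolution, a GRU, inner-product edge scoring) are illustrative assumptions; the actual framework is model-agnostic and additionally distributes its graph layers across machines, which is omitted here.

```python
# Sketch of the GNN-over-snapshots + recurrent-encoder pattern, assuming PyTorch.
# Layer choices are illustrative; not the authors' implementation.
import torch
import torch.nn as nn

class TemporalLinkPredictor(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.gcn = nn.Linear(in_dim, hidden_dim)                 # shared spatial encoder
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=False)

    def forward(self, feats, adjs):
        # feats: [num_nodes, in_dim]; adjs: one (ideally normalized) adjacency per snapshot
        spatial = torch.stack([torch.relu(a @ self.gcn(feats)) for a in adjs])  # [T, N, H]
        temporal, _ = self.rnn(spatial)   # encode how node embeddings evolve over time
        return temporal                   # [T, N, H]

    @staticmethod
    def edge_score(z, src, dst):
        # Low scores flag anomalous (unlikely) edges in the corresponding snapshot.
        return torch.sigmoid((z[src] * z[dst]).sum(dim=-1))

# Toy usage: 5 nodes, 3 snapshots with random binary adjacency plus self-loops.
model = TemporalLinkPredictor(in_dim=8, hidden_dim=16)
feats = torch.randn(5, 8)
adjs = [torch.eye(5) + torch.rand(5, 5).round() for _ in range(3)]
z = model(feats, adjs)
print(model.edge_score(z[-1], torch.tensor([0, 1]), torch.tensor([2, 3])))
```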
This article presents an approach to provide strong assurance of the secure execution of distributed event-driven applications on shared infrastructures, while relying on a small Trusted Computing Base. We build upon and extend security primitives provided by Trusted Execution Environments (TEEs) to guarantee authenticity and integrity properties of applications, and to secure control of input and output devices. More specifically, we guarantee that if an output is produced by the application, it was allowed to be produced by the application’s source code based on an authentic trace of inputs.
We present an integrated open-source framework to develop, deploy, and use such applications across heterogeneous TEEs. Beyond authenticity and integrity, our framework optionally provides confidentiality and a notion of availability, and facilitates software development at a high level of abstraction over the platform-specific TEE layer. We support event-driven programming to develop distributed enclave applications in Rust and C for heterogeneous TEEs, including Intel SGX, ARM TrustZone, and Sancus.
In this article we discuss the workings of our approach, the extensions we made to the Sancus processor, and the integration of our development model with commercial TEEs. Our evaluation of security and performance aspects shows that TEEs, together with our programming model, form a basis for powerful security architectures for dependable systems in domains such as Industrial Control Systems and the Internet of Things, illustrating our framework’s unique suitability for a broad range of use cases that combine cloud processing, mobile and edge devices, and lightweight sensing and actuation.
Collaborative machine learning settings such as federated learning can be susceptible to adversarial interference and attacks. One class of such attacks is termed model inversion attacks, characterised by the adversary reverse-engineering the model into disclosing the training data. Previous implementations of this attack typically rely only on the shared data representations, ignoring adversarial priors, or require that specific layers be present in the target model, reducing the potential attack surface. In this work, we propose a novel context-agnostic model inversion framework that builds on the foundations of gradient-based inversion attacks but additionally exploits the features and the style of the data controlled by an in-the-network adversary. Our technique outperforms existing gradient-based approaches both qualitatively and quantitatively across all training settings, showing particular effectiveness against collaborative medical imaging tasks. Finally, we demonstrate that our method achieves significant success on two downstream tasks: sensitive feature inference and facial recognition spoofing.
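The sketch below, assuming PyTorch, shows the gradient-matching idea that gradient-based inversion attacks build on: the adversary optimizes dummy inputs so that their gradients match the gradients shared by a victim. The toy model, shapes, and the assumption that the label is known are illustrative simplifications; the article's framework adds feature and style priors on top of this baseline, which are not shown.

```python
# Minimal sketch of gradient matching, the baseline behind gradient-based
# model inversion attacks. Assumes PyTorch; model and data are toy choices.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()

# Victim computes gradients on private data and shares them (e.g., in federated learning).
x_true, y_true = torch.randn(1, 10), torch.tensor([1])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true), model.parameters())

# Adversary optimizes dummy data so its gradients match the shared ones
# (label assumed known here for simplicity).
x_dummy = torch.randn(1, 10, requires_grad=True)
opt = torch.optim.Adam([x_dummy], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        loss_fn(model(x_dummy), y_true), model.parameters(), create_graph=True)
    grad_diff = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    grad_diff.backward()
    opt.step()

print("reconstruction error:", (x_dummy.detach() - x_true).norm().item())
```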
Multi-user (mu) security considers large-scale attackers that, given access to a number of cryptosystem instances, attempt to compromise at least one of them. We initiate the study of mu security of the so-called GGM tree that stems from the pseudorandom generator to pseudorandom function transformation of Goldreich, Goldwasser, and Micali, with the goal of providing a reference for its recently popularized use in applied cryptography. We propose a generalized model for GGM trees and analyze its mu prefix-constrained pseudorandom function security in the random oracle model. Our model allows us to derive concrete bounds and improvements for various protocols, and we showcase this on the Bitcoin-Improvement-Proposal standard.
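For reference, the sketch below shows the classic GGM tree construction that the article models: a length-doubling PRG is applied along the bits of the input, turning a short key into a PRF, and releasing an internal node yields a prefix-constrained key. The SHA-256-based PRG is an illustrative stand-in; the article's generalized model and its multi-user bounds are not captured here.

```python
# Sketch of the classic GGM tree PRG-to-PRF construction.
# The SHA-256-based length-doubling PRG is an illustrative stand-in.
import hashlib

def prg(seed: bytes):
    """Length-doubling PRG: 32-byte seed -> two 32-byte outputs (left, right)."""
    return (hashlib.sha256(seed + b"\x00").digest(),
            hashlib.sha256(seed + b"\x01").digest())

def ggm_prf(key: bytes, input_bits: str) -> bytes:
    """Walk the GGM tree: at each input bit, keep the left or right PRG half."""
    node = key
    for bit in input_bits:
        left, right = prg(node)
        node = left if bit == "0" else right
    return node

def constrained_key(key: bytes, prefix: str) -> bytes:
    """A prefix-constrained key is the internal node at that prefix: it lets
    the holder evaluate the PRF on all inputs extending the prefix."""
    return ggm_prf(key, prefix)

k = b"\x11" * 32
# Evaluating from the root agrees with evaluating from a prefix-constrained key.
assert ggm_prf(k, "0110") == ggm_prf(constrained_key(k, "01"), "10")
print(ggm_prf(k, "0110").hex())
```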
Recent advances in consensus control have made it important in multi-agent systems such as distributed machine learning and distributed multi-vehicle cooperative systems. However, in these applications it is crucial to achieve both resilience and privacy; specifically, when there are adversarial or faulty nodes in a general topology, normal agents should still reach consensus while keeping their actual states unobserved.
In this article, we modify the state-of-the-art Q-consensus algorithm by introducing predefined noise or well-designed cryptography to guarantee the privacy of each agent's state. In the former case, we add specified noise to an agent's state before it is transmitted to its neighbors and then gradually decrease the magnitude of the noise so that the exact agent state cannot be inferred. In the latter case, the Paillier cryptosystem is applied to reconstruct the reward function over two consecutive interactions between each pair of neighboring agents. Therefore, multi-agent privacy-preserving resilient consensus (MAPPRC) can be achieved in a general topology structure. Moreover, in the modified version, we reconstruct the reward and credibility functions so that both the convergence rate and the stability of the system are improved.
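The sketch below illustrates only the decaying-noise mechanism described above: each agent perturbs the state it transmits, and the perturbation shrinks over the iterations so neighbors never observe exact states. The averaging update, decay schedule, and ring topology are illustrative assumptions; the article's algorithms build on Q-consensus with redesigned reward and credibility functions and also handle faulty agents, and the Paillier-based variant is not shown.

```python
# Sketch of privacy via decaying transmission noise in an averaging consensus
# loop. Update rule, decay schedule, and topology are illustrative assumptions.
import random

def private_consensus(states, neighbors, rounds=200, noise0=1.0, decay=0.95):
    states = list(states)
    noise = noise0
    for _ in range(rounds):
        # Each agent broadcasts a noisy version of its state.
        sent = [s + random.uniform(-noise, noise) for s in states]
        # Each agent averages the (noisy) values received from its neighborhood.
        states = [sum(sent[j] for j in neighbors[i]) / len(neighbors[i])
                  for i in range(len(states))]
        noise *= decay  # gradually reduce the perturbation
    return states

# Four agents on a ring; each neighborhood includes the agent itself.
neighbors = {0: [3, 0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3, 0]}
print(private_consensus([1.0, 5.0, 9.0, 3.0], neighbors))
```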
The simulation results indicate the algorithms’ tolerance of constant and/or persistent faulty agents as well as their protection of privacy. Compared with previous studies that consider both resilience and privacy-preserving requirements, the proposed algorithms in this article greatly relax the topological conditions. At the end of the article, to verify the effectiveness of the proposed algorithms, we conduct two sets of experiments, namely on a smart-car hardware platform consisting of four vehicles and on a distributed machine learning platform containing 10 workers and a server.