Julen Bernabé-Rodríguez, Albert Garreta, Oscar Lage
Big data has proven to be a very useful tool for companies and users, but companies with larger datasets have ended being more competitive than the others thanks to machine learning or artificial inteligence. Secure multi-party computation (SMPC) allows the smaller companies to jointly train arbitrary models on their private data while assuring privacy, and thus gives data owners the ability to perform what are currently known as federated learning algorithms. Besides, with a blockchain it is possible to coordinate and audit those computations in a decentralized way. In this document, we consider a private data marketplace as a space where researchers and data owners meet to agree the use of private data for statistics or more complex model trainings. This document presents a candidate architecure for a private data marketplace by combining SMPC and a public, general-purpose blockchain. Such a marketplace is proposed as a smart contract deployed in the blockchain, while the privacy preserving computation is held by SMPC.
{"title":"A Decentralized Private Data Marketplace using Blockchain and Secure Multi-Party Computation","authors":"Julen Bernabé-Rodríguez, Albert Garreta, Oscar Lage","doi":"10.1145/3652162","DOIUrl":"https://doi.org/10.1145/3652162","url":null,"abstract":"<p>Big data has proven to be a very useful tool for companies and users, but companies with larger datasets have ended being more competitive than the others thanks to machine learning or artificial inteligence. Secure multi-party computation (SMPC) allows the smaller companies to jointly train arbitrary models on their private data while assuring privacy, and thus gives data owners the ability to perform what are currently known as federated learning algorithms. Besides, with a blockchain it is possible to coordinate and audit those computations in a decentralized way. In this document, we consider a private data marketplace as a space where researchers and data owners meet to agree the use of private data for statistics or more complex model trainings. This document presents a candidate architecure for a private data marketplace by combining SMPC and a public, general-purpose blockchain. Such a marketplace is proposed as a smart contract deployed in the blockchain, while the privacy preserving computation is held by SMPC.</p>","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Markus Bayer, Philipp Kuehn, Ramin Shanehsaz, Christian Reuter
The field of cybersecurity is evolving fast. Security professionals are in need of intelligence on past, current and - ideally - on upcoming threats, because attacks are becoming more advanced and are increasingly targeting larger and more complex systems. Since the processing and analysis of such large amounts of information cannot be addressed manually, cybersecurity experts rely on machine learning techniques. In the textual domain, pre-trained language models like BERT have proven to be helpful as they provide a good baseline for further fine-tuning. However, due to the domain-knowledge and the many technical terms in cybersecurity, general language models might miss the gist of textual information. For this reason, we create a high-quality dataset and present a language model specifically tailored to the cybersecurity domain which can serve as a basic building block for cybersecurity systems. The model is compared on 15 tasks: Domain-dependent extrinsic tasks for measuring the performance on specific problems, intrinsic tasks for measuring the performance of the internal representations of the model as well as general tasks from the SuperGLUE benchmark. The results of the intrinsic tasks show that our model improves the internal representation space of domain words compared to the other models. The extrinsic, domain-dependent tasks, consisting of sequence tagging and classification, show that the model performs best in cybersecurity scenarios. In addition, we pay special attention to the choice of hyperparameters against catastrophic forgetting, as pre-trained models tend to forget the original knowledge during further training.
{"title":"CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain","authors":"Markus Bayer, Philipp Kuehn, Ramin Shanehsaz, Christian Reuter","doi":"10.1145/3652594","DOIUrl":"https://doi.org/10.1145/3652594","url":null,"abstract":"<p>The field of cybersecurity is evolving fast. Security professionals are in need of intelligence on past, current and - ideally - on upcoming threats, because attacks are becoming more advanced and are increasingly targeting larger and more complex systems. Since the processing and analysis of such large amounts of information cannot be addressed manually, cybersecurity experts rely on machine learning techniques. In the textual domain, pre-trained language models like BERT have proven to be helpful as they provide a good baseline for further fine-tuning. However, due to the domain-knowledge and the many technical terms in cybersecurity, general language models might miss the gist of textual information. For this reason, we create a high-quality dataset and present a language model specifically tailored to the cybersecurity domain which can serve as a basic building block for cybersecurity systems. The model is compared on 15 tasks: Domain-dependent extrinsic tasks for measuring the performance on specific problems, intrinsic tasks for measuring the performance of the internal representations of the model as well as general tasks from the SuperGLUE benchmark. The results of the intrinsic tasks show that our model improves the internal representation space of domain words compared to the other models. The extrinsic, domain-dependent tasks, consisting of sequence tagging and classification, show that the model performs best in cybersecurity scenarios. In addition, we pay special attention to the choice of hyperparameters against catastrophic forgetting, as pre-trained models tend to forget the original knowledge during further training.</p>","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140151767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive authentication enables smartphones and enterprise apps to decide when and how to authenticate users based on contextual and behavioral factors. In practice, a system may employ multiple policies to adapt its authentication mechanisms and access controls to various scenarios. However, existing approaches suffer from contradictory or insecure adaptations, which may enable attackers to bypass the authentication system. Besides, most existing approaches are inflexible and do not provide desirable access controls. We design and build a multi-stage risk-aware adaptive authentication and access control framework (MRAAC), which provides the following novel contributions: Multi-stage: