A Self-Adaptive and Secure Approach to Share Network Trace Data
Antonios Xenakis, S. Nourin, Zhiyuan Chen, George Karabatis, Ahmed Aleroud, Jhancy Amarsingh
A large volume of network trace data is collected by government, public, and private organizations and can be analyzed for various purposes, such as resolving network problems, improving network performance, and understanding user behavior. However, most organizations are reluctant to share their data with external experts for analysis because it contains sensitive information deemed proprietary, thus raising privacy concerns. Even if the payload of network packets is not shared, header data may disclose sensitive information that adversaries can exploit to perform unauthorized actions, so network trace data needs to be anonymized before being shared. Most existing anonymization tools have two major shortcomings: 1) they cannot provide provable protection; and 2) their performance relies on setting the right parameter values, such as the degree of privacy protection and the features that should be anonymized, yet there is little assistance for a user in setting these parameters optimally. This paper proposes a self-adaptive and secure approach to anonymizing network trace data that provides provable protection and automatic optimal parameter settings. An experimental comparison with existing anonymization tools demonstrates that the proposed method outperforms them.
{"title":"A Self-Adaptive and Secure Approach to Share Network Trace Data","authors":"Antonios Xenakis, S. Nourin, Zhiyuan Chen, George Karabatis, Ahmed Aleroud, Jhancy Amarsingh","doi":"10.1145/3617181","DOIUrl":"https://doi.org/10.1145/3617181","url":null,"abstract":"A large volume of network trace data are collected by the government, public, and private organizations, and can be analyzed for various purposes such as resolving network problems, improving network performance, and understanding user behavior. However, most organizations are reluctant to share their data with any external experts for analysis because it contains sensitive information deemed proprietary to the organization, thus raising privacy concerns. Even if the payload of network packets is not shared, header data may disclose sensitive information that adversaries can exploit to perform unauthorized actions. So network trace data needs to be anonymized before being shared. Most of existing anonymization tools have two major shortcomings: 1) they cannot provide provable protection; 2) their performance relies on setting the right parameter values such as the degree of privacy protection and the features that should be anonymized, but there is little assistance for a user to optimally set these parameters. This paper proposes a self-adaptive and secure approach to anonymize network trace data, and provides provable protection and automatic optimal settings of parameters. A comparison of the proposed approach with existing anonymization tools via experimentation demonstrated that the proposed method outperforms the existing anonymization techniques.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128574940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
rpkiller: Threat Analysis of the BGP Resource Public Key Infrastructure
Koen van Hove, J. V. D. Ham, Roland van Rijswijk-Deij
The Resource Public Key Infrastructure (RPKI) was created to solve security shortcomings of the Border Gateway Protocol (BGP). It provides an infrastructure in which resource holders (ASes) can make attestations about their resources (IP subnets). RPKI Certificate Authorities make these attestations available at Publication Points, and Relying Party software retrieves and processes the RPKI-related data from all Publication Points, validates it, and makes it available to routers so they can make secure routing decisions. We contribute to this work with a threat analysis for Relying Party software in which an attacker controls a Certificate Authority and Publication Point. We implement a prototype testbed to analyse how current Relying Party software implementations react to scenarios originating from that threat model. Our results show that all current Relying Party software was susceptible to at least one of the identified threats. In addition, we identified threats stemming from choices made in the protocol itself. Taken together, these threats potentially allowed an attacker to fully disrupt all RPKI Relying Party software on a global scale. We performed a Coordinated Vulnerability Disclosure to the implementers; we elaborate on our process and discuss the types of responses we received from other parties.
{"title":"rpkiller: Threat Analysis of the BGP Resource Public Key Infrastructure","authors":"Koen van Hove, J. V. D. Ham, Roland van Rijswijk-Deij","doi":"10.1145/3617182","DOIUrl":"https://doi.org/10.1145/3617182","url":null,"abstract":"The Resource Public Key Infrastucture (RPKI) has been created to solve security short-comings of the Border Gateway Protocol (BGP). This creates an infrastructure where resource holders (ASes) can make attestations about their resources (IP-subnets). RPKI Certificate Authorities make these attestations available at Publication Points. Relying Party software retrieves and processes the RPKI-related data from all publication points, validates the data and makes it available to routers so they can make secure routing decisions. We contribute to this work by doing a threat analysis for Relying Party software, where an attacker controls a Certificate Authority and Publication Point. We implement a prototype testbed to analyse how current Relying Party software implementations react to scenarios originating from that threat model. Our results show that all current Relying Party software was susceptible to at least one of the identified threats. In addition to this, we also identified threats stemming from choices made in the protocol itself. Taken together, these threats potentially allowed an attacker to fully disrupt all RPKI Relying Party software on a global scale. We elaborate on our process, and we discuss the types of responses we received from other parties. We performed a Coordinated Vulnerability Disclosure to the implementers.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133136523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dark Web Marketplaces: Data for Collaborative Threat Intelligence
Kate Connolly, Anna Klempay, Mary McCann, P. Brenner
The dark web has become an increasingly important landscape for the sale of illicit cyber goods. Given the prevalence of malware and data-theft tools on these markets, it is crucial that every company, governing body, and cyber professional be aware of what information is sold on these marketplaces. Knowing this information allows these entities to protect themselves against cyber attacks and information breaches. In this paper, we announce the public release of a data set of cybersecurity-related listings on dark web marketplaces. We spent multiple years seeking out websites that sold illicit digital goods and collected data on the available products. Because of the marketplaces' varied and complex layers of security, we leveraged the flexible Selenium WebDriver with Python to navigate the web pages and collect data. We present an analysis of the categories of malicious cyber goods sold on marketplaces, prices, persistent vendors, ratings, and other basic information on marketplace storefronts. Additionally, we share the tools and techniques we have compiled, enabling others to scrape dark web marketplaces at significantly lower risk. We invite professionals who opt to gather data from the dark web to contribute to this publicly shared threat intelligence resource.
{"title":"Dark Web Marketplaces: Data for Collaborative Threat Intelligence","authors":"Kate Connolly, Anna Klempay, Mary McCann, P. Brenner","doi":"10.1145/3615666","DOIUrl":"https://doi.org/10.1145/3615666","url":null,"abstract":"The dark web has become an increasingly important landscape for the sale of illicit cyber goods. Given the prevalence of malware and tools that are used to steal data from individuals on these markets, it is crucial that every company, governing body, and cyber professional be aware of what information is sold on these marketplaces. Knowing this information will allow these entities to protect themselves against cyber attacks and from information breaches. In this paper, we announce the public release of a data set on dark web marketplaces’ cybersecurity-related listings. We spent multiple years seeking out websites that sold illicit digital goods and collected data on the available products. Due to the marketplaces’ varied and complex layers of security, we leveraged the flexible Selenium WebDriver with Python to navigate the web pages and collect data. We present analysis of categories of malicious cyber goods sold on marketplaces, prices, persistent vendors, ratings, and other basic information on marketplace storefronts. Additionally, we share the tools and techniques we’ve compiled, enabling others to scrape dark web marketplaces at a significantly lower risk. We invite professionals who opt to gather data from the dark web to contribute to the publicly shared threat intelligence resource.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128908604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancements to Threat, Vulnerability, and Mitigation Knowledge For Cyber Analytics, Hunting, and Simulations
Erik Hemberg, Matthew Turner, Nick Rutar, Una-May O’Reilly
Cross-linked threat, vulnerability, and defensive mitigation knowledge is critical in defending against diverse and dynamic cyber threats. Cyber analysts consult it by deductively or inductively creating a chain of reasoning to identify a threat starting from indicators they observe, or vice versa. Cyber hunters use it abductively when hypothesizing specific threats. Threat modelers use it to explore threat postures. We aggregate five public sources of threat knowledge and three public sources of knowledge describing cyber defensive mitigations, analytics, and engagements, which share some unidirectional links between them. We unify the sources into a graph and make all unidirectional cross-source links bidirectional. This enhancement makes the questions that analysts and automated systems formulate easier to answer, which we demonstrate in the context of various cyber analytic and hunting tasks as well as modeling and simulations. Because the existing links between entries are very sparse, we further increase the analytic utility of the data by using natural language processing and supervised machine learning to identify new links. These two contributions demonstrably increase the value of the knowledge sources for cyber security activities.
{"title":"Enhancements to Threat, Vulnerability, and Mitigation Knowledge For Cyber Analytics, Hunting, and Simulations","authors":"Erik Hemberg, Matthew Turner, Nick Rutar, Una-May O’Reilly","doi":"10.1145/3615668","DOIUrl":"https://doi.org/10.1145/3615668","url":null,"abstract":"Cross-linked threat, vulnerability, and defensive mitigation knowledge is critical in defending against diverse and dynamic cyber threats. Cyber analysts consult it by deductively or inductively creating a chain of reasoning to identify a threat starting from indicators they observe, or vice versa. Cyber hunters use it abductively to reason when hypothesizing specific threats. Threat modelers use it to explore threat postures. We aggregate five public sources of threat knowledge and three public sources of knowledge that describe cyber defensive mitigations, analytics and engagements, and which share some unidirectional links between them. We unify the sources into a graph, and in the graph we make all unidirectional cross-source links bidirectional. This enhancement of the knowledge makes the questions that analysts and automated systems formulate easier to answer. We demonstrate this in the context of various cyber analytic and hunting tasks, as well as modeling and simulations. Because the number of linked entries is very sparse, to further increase the analytic utility of the data, we use natural language processing and supervised machine learning to identify new links. These two contributions demonstrably increase the value of the knowledge sources for cyber security activities.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128311445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classifying Co-resident Computer Programs Using Information Revealed by Resource Contention
Tor J. Langehaug, B. Borghetti, Scott Graham
Modern computer architectures are complex, containing numerous components that can unintentionally reveal system operating properties. Defensive security professionals seek to minimize this kind of exposure, while adversaries can leverage the data to gain an advantage. This article presents a novel covert interrogator program technique that uses lightweight sensor programs to target integer, floating-point, and memory units within a computer’s architecture and collect data that can match a running program to a known set of programs with up to 100% accuracy under simultaneous multithreading conditions. The technique applies to a broad spectrum of architectural components, does not rely on specific vulnerabilities, and requires no elevated privileges. Furthermore, this research demonstrates the technique in a system with operating system containers intended to provide isolation guarantees that limit a user’s ability to observe the activity of other users. In essence, this research exploits observable noise that is present whenever a program executes on a modern computer. This article presents interrogator program design considerations, describes a machine learning approach to identify models with high classification accuracy, and measures the effectiveness of the approach under a variety of program execution scenarios.
{"title":"Classifying Co-resident Computer Programs Using Information Revealed by Resource Contention","authors":"Tor J. Langehaug, B. Borghetti, Scott Graham","doi":"10.1145/3464306","DOIUrl":"https://doi.org/10.1145/3464306","url":null,"abstract":"Modern computer architectures are complex, containing numerous components that can unintentionally reveal system operating properties. Defensive security professionals seek to minimize this kind of exposure while adversaries can leverage the data to attain an advantage. This article presents a novel covert interrogator program technique using light-weight sensor programs to target integer, floating point, and memory units within a computer’s architecture to collect data that can be used to match a running program to a known set of programs with up to 100% accuracy under simultaneous multithreading conditions. This technique is applicable to a broad spectrum of architectural components, does not rely on specific vulnerabilities, nor requires elevated privileges. Furthermore, this research demonstrates the technique in a system with operating system containers intended to provide isolation guarantees that limit a user’s ability to observe the activity of other users. In essence, this research exploits observable noise that is present whenever a program executes on a modern computer. This article presents interrogator program design considerations, a machine learning approach to identify models with high classification accuracy, and measures the effectiveness of the approach under a variety of program execution scenarios.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"71 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131673470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single and Hybrid-Ensemble Learning-Based Phishing Website Detection: Examining Impacts of Varied Nature Datasets and Informative Feature Selection Technique
Kibreab Adane, Berhanu Beyene, Mohammed Abebe
To tackle issues associated with phishing website attacks, this study conducted rigorous experiments on Random Forest (RF), Gradient Boosting (GB), and CatBoost (CATB) classifiers. Since each classifier is itself an ensemble learner, we integrated them into stacking and majority-vote ensemble architectures to create hybrid-ensemble learners. Because ensemble learning methods are known for their high computational cost, the study applied the UFS technique to address this concern and obtained promising results. Since scalability and consistent performance across datasets are critical to combating the many variants of phishing website attacks, we used three distinct phishing website datasets (DS-1, DS-2, and DS-3) to train and test each ensemble learning method and identify the best-performing one in terms of accuracy and computational time. Our experimental findings reveal that the CATB classifier demonstrated scalable, consistent, and superior accuracy across the three datasets (97.9% on DS-1, 97.36% on DS-2, and 98.59% on DS-3). In terms of computational time, the RF classifier was the fastest on all datasets, with the CATB classifier second.
{"title":"Single and Hybrid-Ensemble Learning-Based Phishing Website Detection: Examining Impacts of Varied Nature Datasets and Informative Feature Selection Technique","authors":"Kibreab Adane, Berhanu Beyene, Mohammed Abebe","doi":"10.1145/3611392","DOIUrl":"https://doi.org/10.1145/3611392","url":null,"abstract":"To tackle issues associated with phishing website attacks, the study conducted rigorous experiments on RF, GB, and CATB classifiers. Since each classifier was an ensemble learner on their own; we integrated them into stacking and majority vote ensemble architectures to create hybrid-ensemble learning. Due to ensemble learning methods being known for their high computational time costs, the study applied the UFS technique to address these concerns and obtained promising results. Since the scalability and performance consistency of the phishing website detection system across numerous datasets is critical to combating various variants of phishing website attacks, we used three distinct phishing website datasets (DS-1, DS-2, and DS-3) to train and test each ensemble learning method to identify the best-performed one in terms of accuracy and model computational time. Our experimental findings reveal that the CATB classifier demonstrated scalable, consistent, and superior accuracy across three distinct datasets (attained 97.9% accuracy in DS-1, 97.36% accuracy in DS-2, and 98.59% accuracy in DS-3). When it comes to model computational time, the RF classifier was discovered to be the fastest when applied to all datasets, while the CATB classifier was discovered to be the second quickest when applied to all datasets.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130543577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lessons Learned from Automated Sharing of Intrusion Detection Alerts: The Case of the SABU Platform
M. Husák, Pavol Sokol, M. Zádník, Václav Bartos, M. Horák
Sharing alerts from intrusion detection systems among multiple computer networks and organizations allows for seeing the “big picture” of the network security situation and improves cyber incident response capabilities. However, such a task requires a number of technical and non-technical issues to be resolved, from data collection and distribution to proper categorization, data quality management, and issues of trust and privacy. In this field note, we illustrate these concepts and share lessons learned using the example of SABU, an alert sharing and analysis platform used by academia and partner organizations in the Czech Republic. We discuss the initial willingness to share data, which was later weakened by uncertainties around personal data protection; the high volume and low quality of the data, which prevented its straightforward use; and the finding that managing the community is a more severe issue than the technical implementation of alert sharing.
{"title":"Lessons Learned from Automated Sharing of Intrusion Detection Alerts: The Case of the SABU Platform","authors":"M. Husák, Pavol Sokol, M. Zádník, Václav Bartos, M. Horák","doi":"10.1145/3611391","DOIUrl":"https://doi.org/10.1145/3611391","url":null,"abstract":"Sharing the alerts from intrusion detection systems among multiple computer networks and organizations allows for seeing the “big picture” of the network security situation and improves the capabilities of cyber incident response. However, such a task requires a number of technical and non-technical issues to be resolved, from data collection and distribution to proper categorization, data quality management, and issues of trust and privacy. In this field note, we illustrate the concepts and provide lessons learned on the example of SABU, an alert sharing and analysis platform used by academia and partner organizations in the Czech Republic. We discuss the initial willingness to share the data that was later weakened by the uncertainties around personal data protection, the issues of high volume and low quality of the data that prevented their straightforward use, and that the management of the community is a more severe issue than the technical implementation of alert sharing.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114699895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mapping the Interdisciplinary Research on Non-consensual Pornography: Technical and Quantitative Perspectives
M. Falduti, Sergio Tessaris
The non-consensual distribution of intimate or sexually explicit digital images of adults, also known as non-consensual pornography (NCP) or revenge pornography, is under the spotlight for the toll it is taking on society, and law enforcement statistics confirm a dramatic global rise in abuses. For this reason, the research community is investigating different strategies to fight and mitigate the abuses and their effects. Since the phenomenon involves different aspects of personal and social interaction among users of social media and content-sharing platforms, the literature addresses it across different academic disciplines. However, while most literature reviews approach non-consensual pornography from a social science or psychological perspective, to the best of our knowledge a systematic review of the research on the technical aspects of the problem is still missing. In this work, we present a Systematic Mapping Study (SMS) of the literature, looking at this interdisciplinary phenomenon through a technical lens. We therefore focus on the cyber side of the crime of non-consensual pornography, with the aim of describing the state of the art and future lines of research from a technical and quantitative perspective.
{"title":"Mapping the Interdisciplinary Research on Non-consensual Pornography: Technical and Quantitative Perspectives","authors":"M. Falduti, Sergio Tessaris","doi":"10.1145/3608483","DOIUrl":"https://doi.org/10.1145/3608483","url":null,"abstract":"The phenomenon of the non-consensual distribution of intimate or sexually explicit digital images of adults, a.k.a. non-consensual pornography (NCP) or revenge pornography is under the spotlight for the toll is taking on society. Law enforcement statistics confirm a dramatic global rise in abuses. For this reason, the research community is investigating different strategies to fight and mitigate the abuses and their effects. Since the phenomenon involves different aspects of personal and social interaction among users of social media and content sharing platforms, in the literature it is addressed under different academic disciplines. However, while most of the literature reviews focus on non-consensual pornography either from a social science or psychological perspective, to the best of our knowledge a systematic review of the research on the technical aspects of the problem is still missing. In this work, we present a Systematic Mapping Study (SMS) of the literature, looking at this interdisciplinary phenomenon through a technical lens. Therefore, we focus on the cyber side of the crime of non-consensual pornography with the aim of describing the state-of-the-art and the future lines of research from a technical and quantitative perspective.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122688296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data for Digital Forensics: Why a Discussion on “How Realistic is Synthetic Data” is Dispensable
Thomas Göbel, Harald Baier, Frank Breitinger
Digital forensics depends on data sets for various purposes, such as concept evaluation, educational training, and tool validation. Researchers have gathered such data sets into repositories and created data simulation frameworks for producing large amounts of data. Synthetic data often faces skepticism due to its perceived deviation from real-world data, which raises doubts about its realism. This paper addresses this concern, arguing that the question has no definitive answer. We focus on four common digital forensic use cases that rely on data and, through these, elucidate the specifications and prerequisites of data sets within their respective contexts. Our discourse reveals that both real-world and synthetic data are indispensable for advancing digital forensic science, software, tools, and the competence of practitioners. Additionally, we provide an overview of available data set repositories and data generation frameworks, contributing to the ongoing dialogue on the utility of digital forensic data sets.
{"title":"Data for Digital Forensics: Why a Discussion on “How Realistic is Synthetic Data” is Dispensable","authors":"Thomas Göbel, Harald Baier, Frank Breitinger","doi":"10.1145/3609863","DOIUrl":"https://doi.org/10.1145/3609863","url":null,"abstract":"Digital forensics depends on data sets for various purposes like concept evaluation, educational training, and tool validation. Researchers have gathered such data sets into repositories and created data simulation frameworks for producing large amounts of data. Synthetic data often face skepticism due to its perceived deviation from real-world data, raising doubts about its realism. This paper addresses this concern, arguing that there is no definitive answer. We focus on four common digital forensic use cases that rely on data. Through these, we elucidate the specifications and prerequisites of data sets within their respective contexts. Our discourse uncovers that both real-world and synthetic data are indispensable for advancing digital forensic science, software, tools, and the competence of practitioners. Additionally, we provide an overview of available data set repositories and data generation frameworks, contributing to the ongoing dialogue on digital forensic data sets’ utility.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121660364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Forensic Examination of Ceph
Florian Bausch, Andreas Dewald
The concept of Software Defined Storage (SDS) has become very popular over the last few years. It is used in public, private, and hybrid clouds to store enterprise, private, and other kinds of data. Ceph is open-source software that implements an SDS stack. This article analyzes, from a data-forensics point of view, the data found on the storage devices (Object Store Devices, OSDs) used to store Ceph BlueStore data. The OSD data is categorized using the model proposed by Carrier into the five categories file system, content, metadata, file name, and application. The article then describes how the different data can be connected to present useful information about the content of an OSD, and presents the implementation of a forensic software tool for OSD analysis based on Ceph 12.2.4 (Luminous).
{"title":"Forensic Examination of Ceph","authors":"Florian Bausch, Andreas Dewald","doi":"10.1145/3609862","DOIUrl":"https://doi.org/10.1145/3609862","url":null,"abstract":"The concept of Software Defined Storage (SDS) has become very popular over the last few years. It is used in public, private, and hybrid clouds to store enterprise, private, and other kinds of data. Ceph is an open source software that implements an SDS stack. This article analyzes the data found on storage devices (Object Store Devices (OSDs)) used to store Ceph BlueStore data from a data forensics point of view. The Object Store Device (OSD) data is categorized using the model proposed by Carrier into the five categories file system, content, metadata, file name, and application category. It then describes how the different data can be connected to present useful information about the content of an OSD and presents the implementation of a forensic software tool for OSD analysis based on Ceph 12.2.4 luminous.","PeriodicalId":202552,"journal":{"name":"Digital Threats: Research and Practice","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130766972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}