Changhai Ou, Fan Zhang, Xinping Zhou, Kexin Qiao, Renjun Zhang
The existing multiple-layer candidate sieve exploits collisions to filter candidates into a much smaller space for easier key recovery, and tries to recover keys ranked very deep in the candidate space. However, it requires enormous computation yet achieves a very low success probability. In this paper, we build a novel Simple Multiple-Layer Sieve (SMLS) from Correlation Power Analysis (CPA) and achieve better performance than the existing sieve. Furthermore, since the same operations in a serial cryptographic implementation generate similar leakage, we build two combined sieves, the Two-Layer Stacking Sieve (TLSS) and the Full-Layer Stacking Sieve (FLSS). The experimental results verify their superiority.
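The abstract does not spell out the sieve constructions themselves. As a rough illustration of the CPA building block they start from, the sketch below ranks the 256 candidates for one subkey byte by their best Pearson correlation against Hamming-weight predictions and keeps only the top few as a sieved candidate space. The function names, the simplified p XOR k intermediate (instead of an S-box output), and the keep parameter are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Illustration only: rank candidates for one subkey byte with plain CPA and
# keep the best few ("sieve" the space). The intermediate is simply p XOR k;
# a real attack would target an S-box output, and the sieves in the paper
# combine several such layers, which is not shown here.

def hamming_weight(values):
    """Hamming weight of each byte in a 1-D uint8-compatible array."""
    return np.unpackbits(values.astype(np.uint8)[:, None], axis=1).sum(axis=1)

def cpa_rank_subkey(traces, plaintext_byte):
    """Order the 256 subkey candidates by their best |Pearson correlation|.

    traces:          (n_traces, n_samples) measured power/EM leakage
    plaintext_byte:  (n_traces,) one plaintext byte per trace
    """
    tr_centered = traces - traces.mean(axis=0)
    tr_norm = np.sqrt((tr_centered ** 2).sum(axis=0)) + 1e-12
    scores = np.empty(256)
    for k in range(256):
        hw = hamming_weight(plaintext_byte ^ k).astype(float)
        hw_centered = hw - hw.mean()
        corr = (hw_centered @ tr_centered) / (np.sqrt(hw_centered @ hw_centered) * tr_norm + 1e-12)
        scores[k] = np.abs(corr).max()       # best sample point for this guess
    return np.argsort(scores)[::-1]          # most likely candidate first

def sieve_candidates(traces, plaintext_byte, keep=16):
    """Reduced candidate list for one subkey byte."""
    return cpa_rank_subkey(traces, plaintext_byte)[:keep]
```

A multiple-layer sieve would then intersect or stack such reduced lists across several leakage layers; that combination step is what the SMLS, TLSS, and FLSS differ in and is not shown here.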
{"title":"Multiple-Layer Candidate Sieves Against Serial Cryptographic Implementations","authors":"Changhai Ou, Fan Zhang, Xinping Zhou, Kexin Qiao, Renjun Zhang","doi":"10.29007/d3gt","DOIUrl":"https://doi.org/10.29007/d3gt","url":null,"abstract":"The existing multiple-layer candidate sieve exploits collisions to filter the candidates to achieve a much smaller space for easier key recovery, and tries to recover the key ranking at very deep candidate space. However, it leads to enormous computation yet achieves very low success probability. In this paper, we build a novel Simple Multiple-Layer Sieve (SMLS) from Correlation Power Analysis (CPA) and achieve better performance than the existing one. Furthermore, we build two combined sieves named Two-Layer Stacking Sieve (TLSS) and Full-Layer Stacking Sieve (FLSS) since same operations in serial cryptographic implementation generate similar leakage. The experimental results verify their superiority.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69432027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Side-channel attacks aim at extracting secret keys from cryptographic devices. Randomly masking the implementation is a provable way to protect the secrets against this threat. Recently, various masking schemes have converged to the “code-based masking” philosophy. In code-based masking, different codes allow for different levels of side-channel security. In practice, for a given leakage function, it is important to select the code which enables the best resistance, i.e., which forces the attacker to capture and analyze the largest number of side-channel traces. This paper is a first attempt to address the constructive selection of the optimal codes in the context of side-channel countermeasures, in particular for code-based masking when the device leaks information in the Hamming weight leakage model. We show that the problem is related to the weight enumeration of the extended dual of the masking code. We first present mathematical tools to study those weight enumeration polynomials, and then provide an efficient method to search for good codes, based on a lexicographic sorting of the weight enumeration polynomial from lowest to highest degrees.
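The selection criterion revolves around weight enumeration polynomials. As a hedged, simplified illustration (the paper works with the extended dual of the masking code under the Hamming weight leakage model, which this toy example does not model), the sketch below enumerates a small binary linear code from its generator matrix, builds its weight distribution A_0, ..., A_n, and prefers the code whose distribution is lexicographically smaller from the lowest degrees upward, i.e., the one with fewer low-weight codewords.

```python
import itertools
import numpy as np

# Toy illustration: weight distribution (A_0, ..., A_n) of a binary linear code
# and a lexicographic comparison from the lowest degrees upward.

def weight_distribution(G):
    """Weight enumerator coefficients of the code generated by the rows of G."""
    G = np.asarray(G) % 2
    k, n = G.shape
    dist = [0] * (n + 1)
    for message in itertools.product((0, 1), repeat=k):
        codeword = (np.array(message) @ G) % 2
        dist[int(codeword.sum())] += 1
    return tuple(dist)

def preferred_code(G1, G2):
    """Prefer the code with lexicographically smaller (A_0, A_1, ...),
    i.e. fewer low-weight codewords."""
    return G1 if weight_distribution(G1) <= weight_distribution(G2) else G2

G_even = [[1, 1, 0], [0, 1, 1]]   # [3,2] even-weight code
G_weak = [[1, 0, 0], [0, 1, 1]]   # contains a weight-1 codeword
print(weight_distribution(G_even))                 # (1, 0, 3, 0)
print(weight_distribution(G_weak))                 # (1, 1, 1, 1)
print(preferred_code(G_even, G_weak) is G_even)    # True
```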
{"title":"Towards Finding Best Linear Codes for Side-Channel Protections","authors":"Wei Cheng, Yi Liu, S. Guilley, O. Rioul","doi":"10.29007/bnrc","DOIUrl":"https://doi.org/10.29007/bnrc","url":null,"abstract":"Side-channel attacks aim at extracting secret keys from cryptographic devices. Ran- domly masking the implementation is a provable way to protect the secrets against this threat. Recently, various masking schemes have converged to the “code-based masking” philosophy. In code-based masking, different codes allow for different levels of side-channel security. In practice, for a given leakage function, it is important to select the code which enables the best resistance, i.e., which forces the attacker to capture and analyze the largest number of side-channel traces.This paper is a first attempt to address the constructive selection of the optimal codes in the context of side-channel countermeasures, in particular for code-based masking when the device leaks information in the Hamming weight leakage model. We show that the problem is related to the weight enumeration of the extended dual of the masking code. We first present mathematical tools to study those weight enumeration polynomials, and then provide an efficient method to search for good codes, based on a lexicographic sorting of the weight enumeration polynomial from lowest to highest degrees.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69430467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cryptographic algorithms are fundamental to security. However, it has been shown that secret information can be effectively extracted by monitoring and analyzing the cache side-channel information (i.e., hits and misses) of cryptographic implementations. To mitigate such attacks, a large number of detection-based defenses have been proposed. To the best of our knowledge, almost all of them work by collecting and analyzing hardware performance counter (HPC) data. However, such low-level HPC data usually lacks semantic information and is easily interfered with, which makes it difficult to determine the attack type from HPC information alone. In fact, the behavior of a cache attack is localized: in certain attack-related steps, accesses to cache memory blocks are intensive, while such behavior can be distributed sparsely among the different attack steps. Based on this observation, in this paper we propose a locality-based cache side-channel attack detection method, which combines low-level HPC runtime data with the high-level control flow graph (CFG) of the program to achieve locality-guided attack pattern extraction. We then use GNN graph classification to learn such attack patterns and detect malicious attack programs. Experiments with a corpus of 1200 benchmarks show that our approach achieves 99.44% accuracy and a 99.47% F1-score with low performance overhead.
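The detection pipeline itself (HPC collection, CFG recovery, GNN training) is not reproduced here. The following sketch only illustrates the "locality" intuition under stated assumptions: basic blocks of a control flow graph are annotated with HPC-derived cache-miss counts, and the subgraph of unusually active blocks is extracted as a candidate attack pattern. The toy CFG, the counts, and the threshold rule are hypothetical.

```python
import networkx as nx

# Hypothetical sketch of locality-guided pattern extraction: attach per-block
# cache-miss counts (as one would derive from HPCs) to the nodes of a control
# flow graph and keep the subgraph of unusually active blocks as a candidate
# attack pattern. The paper then feeds such localized patterns to a GNN graph
# classifier, which is not shown here.

def extract_hot_region(cfg: nx.DiGraph, miss_counts: dict, factor: float = 2.0):
    """Subgraph induced by blocks whose cache activity is far above average."""
    mean = sum(miss_counts.values()) / max(len(miss_counts), 1)
    hot = [node for node in cfg.nodes if miss_counts.get(node, 0) > factor * mean]
    return cfg.subgraph(hot).copy()

# Toy CFG: a prime+probe-style loop concentrates misses in a small cluster
# of blocks (B3, B4) instead of spreading them evenly over the program.
cfg = nx.DiGraph([("B0", "B1"), ("B1", "B2"), ("B2", "B3"),
                  ("B3", "B4"), ("B4", "B3"), ("B4", "B5")])
misses = {"B0": 2, "B1": 1, "B2": 3, "B3": 950, "B4": 870, "B5": 4}
print(sorted(extract_hot_region(cfg, misses).nodes))   # ['B3', 'B4']
```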
{"title":"Locality Based Cache Side-channel Attack Detection","authors":"Limin Wang, Lei Bu, Fu Song","doi":"10.29007/vbqt","DOIUrl":"https://doi.org/10.29007/vbqt","url":null,"abstract":"Cryptographic algorithms are fundamental to security. However, it has been shown that secret information could be effectively extracted through monitoring and analyzing the cache side-channel information (i.e., hit and miss) of cryptographic implementations. To mitigate such attacks, a large number of detection-based defenses have been proposed. To the best of our knowledge, almost all of them are achieved by collecting and analyzing hardware performance counter (HPC) data. But these low-level HPC data usually lacks semantic information and is easy to be interfered, which makes it difficult to determine the attack type by analyzing the HPC information only.Actually, the behavior of a cache attack is localized. In certain attack-related steps, the data accesses of cache memory blocks are intensive, while such behavior can be distributed sparsely among different attack steps. Based on this observation, in this paper, we pro- pose the locality-based cache side-channel attack detection method, which combines the low-level HPC running data with the high-level control flow graph (CFG) of the program to achieve locality-guided attack pattern extraction. Then we can use GNN graph clas- sification technology to learn such attack pattern and detect malicious attack programs. The experiments with a corpus of 1200 benchmarks show that our approach can achieve 99.44% accuracy and 99.47% F1-Score with a low performance overhead.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69451562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given that large-scale quantum computers will eventually be able to compute discrete logarithms and factor integers in polynomial time [44], classical asymmetric cryptographic schemes will break down. Hence, replacing them becomes mandatory. For this purpose, the National Institute of Standards and Technology (NIST) initiated a standardization process for post-quantum schemes. These schemes are supposed to substitute classical cryptography in different use cases, such as client-server authentication during the TLS handshake. However, their signature sizes, public key sizes, and signature verification times impose difficulties, especially for resource-constrained devices. In this paper, we improve the performance of TLS handshakes relying on post-quantum signatures by combining the XMSS and Dilithium signature schemes along the chain of certificates. We provide a proof-of-concept implementation of our solution by integrating the two signature schemes into the WolfSSL library. Moreover, we evaluate the performance of our solution and establish that it reduces the signature verification time considerably and minimizes the size of the chain of trust. We also provide a security proof of the proposed chain of trust, which relies on the security of the XMSS scheme.
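The WolfSSL integration is not shown here. The sketch below only illustrates the shape of a hybrid chain-of-trust verification in which CA certificates are verified with XMSS and the end-entity certificate with Dilithium. The Certificate fields and the xmss_verify/dilithium_verify callables are hypothetical placeholders, not WolfSSL APIs or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Conceptual sketch of a hybrid post-quantum chain of trust: CA certificates
# are signed with XMSS and the end-entity certificate with Dilithium.
# `xmss_verify` and `dilithium_verify` are hypothetical placeholders for the
# real primitives; they are NOT actual library calls.

@dataclass
class Certificate:
    subject: str
    public_key: bytes          # key the certificate carries
    scheme: str                # signature scheme of the *issuer*: "xmss" | "dilithium"
    signature: bytes           # issuer's signature over tbs_bytes
    tbs_bytes: bytes           # the signed ("to-be-signed") content

def verify_chain(chain, root_pubkey,
                 xmss_verify: Callable[[bytes, bytes, bytes], bool],
                 dilithium_verify: Callable[[bytes, bytes, bytes], bool]) -> bool:
    """chain = [root_ca, intermediate_ca, leaf]; True if every link checks out."""
    issuer_key = root_pubkey
    for cert in chain:
        verify = xmss_verify if cert.scheme == "xmss" else dilithium_verify
        if not verify(issuer_key, cert.tbs_bytes, cert.signature):
            return False
        issuer_key = cert.public_key       # the next link is signed with this key
    return True
```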
{"title":"XMSS-based Chain of Trust","authors":"Soundes Marzougui, Jean-Pierre Seifert","doi":"10.29007/2fv1","DOIUrl":"https://doi.org/10.29007/2fv1","url":null,"abstract":"Given that large-scale quantum computers can eventually compute discrete logarithm and integer factorization in polynomial time [44], all asymmetric cryptographic schemes will break down. Hence, replacing them becomes mandatory. For this purpose, the Na- tional Institute of Standards and Technology (NIST) initiated a standardization process for post-quantum schemes. These schemes are supposed to substitute classical cryptography in different use-cases, such as client-server authentication during the TLS handshake. How- ever, their signatures, public key sizes, and signature verification time impose difficulty, especially for resource-constrained devices. In this paper, we improve the TLS hand- shake performance relying on post-quantum signatures by combining the XMSS and the Dilithium signature schemes along the chain of certificates. We provide proof-of-concept implementation of our solution by integrating the two signature schemes in the WolfSSL library. Moreover, we evaluate the performance of our solution and establish that it re- duces the signature verification time considerably and minimizes the size of the chain of trust. We provide a security proof of the proposed chain of trust which is relies on the security of the XMSS scheme.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69420960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
%MinMax, a model of intra-gene translational elongation rate, relies on codon usage frequencies. Historically, %MinMax has used tables that measure codon usage bias across all genes in an organism, such as those found at HIVE-CUT. In this paper, we provide evidence that codon usage bias based on all genes is insufficient to accurately measure absolute translation rate. We show that "High-ϕ" codon usage tables, generated by another model (ROC-SEMPPR), are a promising alternative. By creating a hybrid model, future codon usage analyses and their applications (e.g., codon harmonization) are likely to measure the "tempo" of translation elongation more accurately. We also suggest a High-ϕ alternative to the Codon Adaptation Index (CAI), a classic metric of codon usage bias based on highly expressed genes. Significantly, our new alternative correlates with empirical data as well as traditional CAI does, without using experimentally determined expression counts as input.
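For readers unfamiliar with CAI, the sketch below computes the classic index (Sharp and Li, 1987) as the geometric mean of relative adaptiveness weights derived from a codon usage table; the High-ϕ alternative suggested in the abstract would keep this formula and only change which reference table the frequencies come from. The two-amino-acid codon table and the usage values are illustrative, not data from the paper.

```python
import math

# Classic Codon Adaptation Index: geometric mean of the relative adaptiveness
# w(c) = f(c) / max f(c') over the synonymous codons c' of the same amino acid.

SYNONYMS = {                      # tiny illustrative subset of the genetic code
    "F": ["TTT", "TTC"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
}

def relative_adaptiveness(usage):
    """usage: {codon: frequency}; returns w(codon) within each synonym family."""
    w = {}
    for codons in SYNONYMS.values():
        best = max(usage.get(c, 0.0) for c in codons)
        for c in codons:
            w[c] = usage.get(c, 0.0) / best if best > 0 else 0.0
    return w

def cai(coding_sequence, usage):
    """Geometric mean of w over the codons of an in-frame coding sequence."""
    w = relative_adaptiveness(usage)
    codons = [coding_sequence[i:i + 3] for i in range(0, len(coding_sequence) - 2, 3)]
    scores = [w[c] for c in codons if c in w and w[c] > 0]
    if not scores:
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

usage = {"TTT": 0.4, "TTC": 0.6, "TTA": 0.05, "TTG": 0.1,
         "CTT": 0.1, "CTC": 0.15, "CTA": 0.1, "CTG": 0.5}
print(round(cai("TTCCTGTTT", usage), 3))   # codons TTC, CTG, TTT -> ~0.874
```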
{"title":"A New Look at Codon Usage and Protein Expression.","authors":"Gabriel Wright, A. Rodríguez, P. Clark, S. Emrich","doi":"10.29007/d4tz","DOIUrl":"https://doi.org/10.29007/d4tz","url":null,"abstract":"%MinMax, a model of intra-gene translational elongation rate, relies on codon usage frequencies. Historically, %MinMax has used tables that measure codon usage bias for all genes in an organism, such as those found at HIVE-CUT. In this paper, we provide evidence that codon usage bias based on all genes is insufficient to accurately measure absolute translation rate. We show that alternative \"High-ϕ\" codon usage tables, generated by another model (ROC-SEMPPR), are a promising alternative. By creating a hybrid model, future codon usage analyses and their applications (e.g., codon harmonization) are likely to more accurately measure the \"tempo\" of translation elongation. We also suggest a High-ϕ alternative to the Codon Adaptation Index (CAI), a classic metric of codon usage bias based on highly expressed genes. Significantly, our new alternative is equally well correlated with empirical data as traditional CAI without using experimentally determined expression counts as input.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"60 1","pages":"104-112"},"PeriodicalIF":0.0,"publicationDate":"2019-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48325259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lukas Chrisantyo, Argo Wibowo, Maria Nila Anggiarini, Antonius Rachmat Chrismanto
Information technology continues to evolve, and agricultural science is likewise adopting it in its information systems to improve quality and service. The Government of Indonesia strongly supports the use of information system technology in agriculture. The DutaTani research team has consistently developed its Agricultural Information System (AIS) since 2016 to achieve precision agriculture. Such development must be accompanied by continuous, sustainable improvement of the information system, following changes and developments in the technology used. Testing is sorely needed in the system repair phase so that changes or improvements do not break any pre-existing functions. The number of technologies applied during the repair phase tends to cause high system failure rates when the system is tested on users. Based on these problems, this study implements Blackbox testing to increase the system's success rate before general users adopt it. Blackbox testing is considered capable of bridging the development team and the random respondents who represent future general users. This research also added iterations to increase the success rate of the system. Respondents were invited to use the system through several main scenarios, but had to fill in inputs with values they had never used before. Through several iterations, and following a test scenario created by an independent test team with ten random respondents, this study increased the system's success rate by 11.79%.
{"title":"Blackbox Testing on the ReVAMP Results of The DutaTani Agricultural Information System","authors":"Lukas Chrisantyo, Argo Wibowo, Maria Nila Anggiarini, Antonius Rachmat Chrismanto","doi":"10.29007/1sx8","DOIUrl":"https://doi.org/10.29007/1sx8","url":null,"abstract":"Information technology continues to evolve unceasingly. In line with the evolvement, agricultural sciences also transform the sense of technology utilization in its information systems to improve its quality and service. The Government of Indonesia strongly supports the use of information system technology in agriculture. DutaTani research team has consistently developed Agricultural Information System (AIS) technology since 2016 to achieve precision agriculture. These developments must be followed by continuous improvement of information systems carried out sustainably following changes and developments in the technology used. Testing is sorely needed in the system repair phase so that changes or improvements do not cause conflicts or problems in any pre-existing functions. The number of technologies that are tried to be applied in the repair phase tends to cause high system failures when they are tested on users. Based on these problems, this study aims to implement Blackbox testing to increase the system's success rate before general users utilize it. Blackbox testing is considered capable of bridging the development team and random respondents representing general users later. This research also added iterations to increase the success rate of the system. Respondents are invited to use the system through several main scenarios, but they have to fill in the input with variables that they have never filled in before. Through several iterations and following a test scenario created by an independent test team with ten random respondents, this study increased the system's success rate by 11.79%.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69420547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cryptographic algorithms are an essential measure to ensure the confidentiality and integrity of internet communication. The development of quantum computers (QCs) and their potential to run Shor's algorithm are increasingly recognized as a threat to asymmetric cryptography. In response, post-quantum cryptography (PQC) is gaining prominence as a field of research aiming to standardize quantum-resistant algorithms before QCs become operational. This paper is addressed to readers with basic knowledge of cryptography and quantum computing. Based on a literature review, the authors provide an overview of challenges faced by the research community and elaborate on the advancements in addressing post-quantum threats. A migration strategy from classical cryptosystems to PQC systems is in development, but obstacles such as time constraints and improper implementation complicate the process; full implementation could take a decade or more. Until then, our paper aims to create awareness of potential challenges when transitioning towards PQC. As a categorization scheme for these potential obstacles, we refer to a well-established model in cybersecurity, the McCumber Cube. Our conclusions include preparing for the risks of improper implementation and deriving a multi-step migration. Special attention is expected to be needed for the migration of existing data sets. As directions for future research in PQC, the authors identify the process of implementing post-quantum cryptography standards, e.g., from the National Institute of Standards and Technology (NIST), and an assessment of the perceived readiness of industry to adopt them.
{"title":"Post-Quantum Cryptography: An Introductory Overview and Implementation Challenges of Quantum-Resistant Algorithms","authors":"Sherdel A. Käppler, Bettina Schneider","doi":"10.29007/2tpw","DOIUrl":"https://doi.org/10.29007/2tpw","url":null,"abstract":"Cryptographic algorithms are an essential measure to ensure confidentiality and integrity of internet communication. The development of quantum computers (QCs) and their potential to utilize Shor’s Law, is increasingly recognized as a threat to asymmetric cryptography. In response, post-quantum cryptography (PQC) is gaining prominence as a notable field of research aiming to standardize quantum resistant algorithms before the operational usage of QCs. This paper is addressed to people with preliminary knowledge in the field of cryptography and QC. Based on a literature review, the authors provide an overview of challenges faced by the research community and elaborate the advancements in addressing post-quantum threats. A migration strategy from classical cryptosystems to PQC systems is in development, but obstacles such as time constraints and improper implementation complicate the process. Full implementation could take a decade or more. Until then, our paper aims to create awareness for potential challenges when transitioning towards PQC. As categorization scheme for these potential obstacles, we refer to a well- established model in cybersecurity – the McCumber Cube. Conclusions embrace preparing for risks of improper implementation and deriving a multi-step migration. Special attention is expected to be needed for data migration of existing data sets. As a request for future research in PQC, the authors identified the process of implementing post-cryptography standards, e.g., from the National Institute of Standards and Technology (NIST), and an assessment of the perceived readiness of industry to adapt.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69420636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A protein's function is strongly related to its 3D structure. Therefore, it is crucial to identify that structure to understand how the protein behaves. Studies have shown that a large number of proteins cross a biological membrane; these are called transmembrane (TM) proteins, and many of them adopt an alpha-helical shape. Unlike current contact prediction methods that use inductive learning to predict transmembrane protein inter-helical residue contacts, we adopt a transductive learning approach. Transductive learning can be very useful when the test set is much bigger than the training set, which is usually the case in amino acid residue contact prediction. We test this approach on a set of transmembrane protein sequences to identify helix-helix residue contacts, compare the transductive and inductive approaches, and identify conditions and limitations under which the TSVM outperforms an inductive SVM. In addition, we investigate the performance degradation of the traditional TSVM and explore the solutions proposed in the literature. Moreover, we propose an early-stop technique that can outperform the state-of-the-art TSVM and produce more accurate predictions.
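The authors' TSVM and early-stop technique are not reproduced here. As a hedged approximation of the transductive idea, the sketch below self-trains a scikit-learn SVC by repeatedly pseudo-labeling the unlabeled (test-set) points it is most confident about, and stops as soon as held-out accuracy degrades. Function and parameter names are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Self-training approximation of a transductive SVM with an early stop:
# add the most confidently classified unlabeled points as pseudo-labeled
# training data each round, and stop when validation accuracy gets worse.

def transductive_svm(X_lab, y_lab, X_unlab, X_val, y_val,
                     per_round=20, max_rounds=30):
    X_train, y_train = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    best_acc, best_model = -1.0, None
    for _ in range(max_rounds):
        model = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
        acc = (model.predict(X_val) == y_val).mean()
        if acc < best_acc:                   # early stop: score degraded
            break
        best_acc, best_model = acc, model
        if len(pool) == 0:
            break
        conf = np.abs(model.decision_function(pool))
        take = np.argsort(conf)[::-1][:per_round]    # most confident points
        X_train = np.vstack([X_train, pool[take]])
        y_train = np.concatenate([y_train, model.predict(pool[take])])
        pool = np.delete(pool, take, axis=0)
    return best_model
```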
{"title":"Transmembrane Protein Inter-Helical Residue Contacts Prediction Using Transductive Support Vector Machines","authors":"Bander Almalki, Aman Sawhney, Li Liao","doi":"10.29007/3ztg","DOIUrl":"https://doi.org/10.29007/3ztg","url":null,"abstract":"Protein functions are strongly related to their 3D structure. Therefore, it is crucial to identify their structure to understand how they behave. Studies have shown that numerous numbers of proteins cross a biological membrane, called Transmembrane (TM) proteins, and many of them adopt alpha helices shape. Unlike the current contact prediction methods that use inductive learning to predict transmembrane protein inter-helical residues contact, we adopt a transductive learning approach. The idea of transductive learning can be very useful when the test set is much bigger than the training set, which is usually the case in amino acids residues contacts prediction. We test this approach on a set of transmembrane protein sequences to identify helix-helix residues contacts, compare transductive and inductive approaches, and identify conditions and limitations where TSVM outperforms inductive SVM. In addition, we investigate the performance degradation of the traditional TSVM and explore the proposed solutions in the literature. Moreover, we propose an early stop technique that can outperform the state of art TSVM and produce a more accurate prediction.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69421339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computational biology scientific software projects are continuously growing, and the task of analyzing, designing, implementing, testing, and maintaining them to ensure high-quality software products is only getting harder and more complicated. Conventional software development methodologies are not sufficient to ensure that scientific software is error-free, up to standard, or comparable to software designed in industry. For this reason, it is important to investigate projects that applied the best software engineering practices during their development, and to find and understand the problems that arose along the way. Such understanding is the first step toward developing high-quality software products and will enable us to design and propose solutions to the problems that commonly occur in such projects. In this paper, we discuss studies that applied software engineering practices and approaches in their computational biology projects, the challenges they encountered, and the benefits they gained from employing software engineering quality assurance and testing techniques. In addition, we share some of our own experiences in designing, developing, and testing computational biology projects in academic settings. We also present, based on our experience, solutions, methodologies, and practices that, when adopted, will benefit the scientific computational biology software community throughout the process of designing and testing software.
{"title":"Challenges of Software Engineering in Computational Biology and Bioinformatics Scientific Software Projects","authors":"Tamer Aldwairi","doi":"10.29007/3q66","DOIUrl":"https://doi.org/10.29007/3q66","url":null,"abstract":"Computational biology scientific software projects are continuously growing and the volume and the task of analyzing, designing, implementing, testing, and maintaining these projects to ensure high-quality software products are only getting harder and more complicated. Conventional software development methodologies are not sufficient in ensuring that scientific software is error-free or up to the standard or comparable to the software designed in the industry. For this reason, it is important to investigate projects that utilized the best software engineering practices during their development and find and understand the problems that arise during the development of those projects. Such understanding will serve as the first step in the process of developing high-quality software products and will enable us to design and propose solutions to the problems that commonly occur during the development of such projects. In this paper, we will discuss different studies that applied software engineering practices and approaches in their computational biology projects. The challenges they encountered and the benefits they gained from employing software engineering quality assurance and testing techniques. In addition, we will demonstrate some of our own experiences when designing, developing, and testing computational biology projects within academic settings. We will also present, based on our experience, some solutions, methodologies, and practices that when adopted will benefit the scientific computational biology software community throughout the process of designing and testing the software.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69421452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Waleed Afandi, S. M. A. H. Bukhari, M. U. Khan, Tahir Maqsood, S. Khan
As the number of video streaming platforms grows, the risk associated with illegal and inappropriate content streaming increases exponentially. Therefore, monitoring such content is essential. Much research has been conducted on classifying encrypted videos. However, most existing techniques only pass raw traffic data into classification models, which is an ineffective way to train a model. This research proposes a bucket-based data pre-processing technique for video identification in network traffic. The bucketed traffic is then fed to a fine-tuned word2vec-based neural network to produce an effective encrypted video classifier. Experiments are carried out with different numbers and sizes of buckets to determine the best configuration. Furthermore, previous research has overlooked the phenomenon of concept drift, which reduces the effectiveness of a model. This paper also compares the severity of concept drift on the proposed and previous techniques. The results indicate that the model can predict new samples of videos with an overall accuracy of 81% even 20 days after training.
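The paper's exact bucketing scheme is not given in the abstract. The sketch below shows one plausible form of bucket-based pre-processing under stated assumptions: per-interval downloaded byte counts are quantized into a fixed number of buckets, and one token is emitted per interval, so the resulting token "sentences" could feed a word2vec-style embedding (not shown). The bucket count, edge spacing, and token format are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of bucket-based pre-processing for encrypted video traffic:
# quantize the bytes downloaded in each time interval into one of `n_buckets`
# ranges and emit a token per interval. A word2vec-style model would then be
# trained on these token sequences (not shown).

def to_tokens(bytes_per_interval, n_buckets=32, max_bytes=2_000_000):
    """Map a sequence of per-interval byte counts to bucket tokens like 'B07'."""
    edges = np.linspace(0, max_bytes, n_buckets + 1)[1:-1]   # interior edges
    ids = np.digitize(bytes_per_interval, edges)             # 0 .. n_buckets-1
    return [f"B{i:02d}" for i in ids]

trace = [12_000, 850_000, 1_900_000, 40_000, 0]
print(to_tokens(trace, n_buckets=8))   # ['B00', 'B03', 'B07', 'B00', 'B00']
```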
{"title":"A Bucket-Based Data Pre-Processing Method for Encrypted Video Detection","authors":"Waleed Afandi, S. M. A. H. Bukhari, M. U. Khan, Tahir Maqsood, S. Khan","doi":"10.29007/4rnp","DOIUrl":"https://doi.org/10.29007/4rnp","url":null,"abstract":"As the number of video streaming platforms is growing, the risk factor associated with illegal and inappropriate content streaming is increasing exponentially. Therefore, mon- itoring such content is essential. Many researches have been conducted on classifying encrypted videos. However, most existing techniques only pass raw traffic data into clas- sification models, which is an ineffective way of training a model. This research proposes a bucket-based data pre-processing technique for a video identification in network traffic. The bucketed traffic is then incorporated with a fine-tuned word2vec-based neural net- work to produce an effective encrypted video classifier. Experiments are carried out with different numbers and sizes of buckets to determine the best configuration. Furthermore, previous research has overlooked the phenomenon of concept drift, which reduces the effec- tiveness of a model. This paper also compares the severity of concept drift on the proposed and previous technique. The results indicate that the model can predict new samples of videos with an overall accuracy of 81% even after 20 days of training.","PeriodicalId":93549,"journal":{"name":"EPiC series in computing","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69422000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}