
2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA): Latest Papers

IDSTA 2022 Cover Page
DOI: https://doi.org/10.1109/idsta55301.2022.9923038 | Published: 2022-09-05
Citations: 0
Dark Web Analytics: A Comparative Study of Feature Selection and Prediction Algorithms
Ahmad Al-Omari, A. Allhusen, A. Wahbeh, M. Al-Ramahi, I. Alsmadi
The value and volume of information exchanged through dark-web pages are remarkable. Recently, many studies have shown interest in using machine-learning methods to extract useful security-related knowledge from those pages. In this scope, our research focuses on evaluating the best prediction models for traffic-level data coming from the dark web. Results and analysis showed that feature selection played an important role in identifying the best models. Sometimes the right combination of features increased a model's accuracy. For some feature-set and classifier combinations, Src Port and Dst Port both proved to be important features. When available, they were always selected over most other features. When they were absent, many other features were selected to compensate for the information they provided. The Protocol feature was never selected, regardless of whether Src Port and Dst Port were available.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923042 | Published: 2022-09-05
Citations: 2
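Illustrative sketch for "Dark Web Analytics": the abstract reports that feature selection favored Src Port and Dst Port and never picked Protocol, but it does not name the selection algorithm. The snippet below only shows one common way such a ranking could be produced, by scoring discrete features with information gain. The toy flow records, the label values, and the helper names `entropy` and `information_gain` are assumptions made for illustration; they are not the study's data or method.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, feature, label_key="label"):
    """Reduction in label entropy after splitting the rows on one feature."""
    labels = [r[label_key] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[label_key])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy flows (NOT taken from the paper). Protocol is constant
# here by construction, so it cannot carry any information in this toy set.
flows = [
    {"Src Port": 9001, "Dst Port": 443, "Protocol": 6, "label": "tor"},
    {"Src Port": 9001, "Dst Port": 9030, "Protocol": 6, "label": "tor"},
    {"Src Port": 51000, "Dst Port": 443, "Protocol": 6, "label": "benign"},
    {"Src Port": 51000, "Dst Port": 80, "Protocol": 6, "label": "benign"},
]

# Rank features from most to least informative on the toy data.
ranking = sorted(
    ((information_gain(flows, f), f) for f in ("Src Port", "Dst Port", "Protocol")),
    reverse=True,
)
```

A constant feature always scores zero under this measure, which is one simple reason a selector might never choose a given column; whether that explains the paper's Protocol result is not stated in the abstract.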
Discrete Sequencing for Demand Forecasting: A novel data sampling technique for time series forecasting
N. Menon, Shantanu Saboo, Tanmay Ambadkar, Umesh Uppili
Accurately forecasting energy consumption for buildings has become increasingly important over the years owing to rising energy prices. A good forecast indicates the expected load (demand) of the building in the coming days and months, which can be used in further planning of energy usage within the building. It also matters because energy rates are dynamic: with an accurate forecast, one could aim for spot trading, in which energy is bought and sold at different rates on a daily basis. We target short-term and medium-term demand forecasting for buildings. Data sampling is an integral part of training time-series models. The temporal horizon, along with the patterns captured, contributes to what the model learns and thus to its forecasts. When data is plentiful, with more than one value per day, the traditional sliding-window method cannot produce short-term forecasts without the actual ground-truth values because of its continuous nature; the forecasts deviate very quickly and become unusable. In this paper, we present a novel data sampling technique called Discrete Sequencing. It samples data sequences in a lagged fashion, covering a much larger temporal horizon with a smaller sequence size. We demonstrate the efficacy of our sampling technique by testing the forecasts on three different neural network architectures.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923044 | Published: 2022-09-05
Citations: 0
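Illustrative sketch for "Discrete Sequencing for Demand Forecasting": the abstract does not spell out how Discrete Sequencing builds its samples, so the code below only illustrates the general contrast it draws between a contiguous sliding window and a sequence sampled at a fixed lag. The functions `sliding_window` and `lagged_sequence`, the `lag` parameter, and the hourly toy series are assumptions for illustration, not the authors' implementation.

```python
def sliding_window(series, end, size):
    """Contiguous window: the `size` values immediately before index `end`."""
    if end - size < 0:
        raise ValueError("not enough history for this window size")
    return series[end - size:end]

def lagged_sequence(series, end, size, lag):
    """Lagged window: `size` values spaced `lag` steps apart, the last one
    sitting `lag` steps before index `end`."""
    start = end - size * lag
    if start < 0:
        raise ValueError("not enough history for this size and lag")
    return series[start:end:lag]

# Hypothetical hourly series in which each value is just its own hour index,
# so the sampled positions are easy to read off.
hourly = list(range(24 * 7))   # one week of hourly readings
target = 24 * 6 + 9            # index 153: hour 9 of the seventh day

contiguous = sliding_window(hourly, target, size=4)
same_hour = lagged_sequence(hourly, target, size=4, lag=24)
```

With the same sequence size of four, the contiguous window covers the four hours before the target, while the lagged sequence covers the same hour on each of the four preceding days, a much longer horizon for the same input length.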
Performance Evaluation of Permissioned-based Personal Data Vault Implemented Using Hyperledger Fabric v2.x
Neha Mishra, H. Levkowitz
Blockchain is a fundamental technology that can decentralize how we organize, share, and preserve data and information. This paper evaluates and improves the performance of our Personal Data Vault (our ongoing framework) by focusing on Hyperledger Fabric (HLF) version 2.x (v2.x), one of the most popular open-source and highly scalable permissioned blockchains, particularly taking advantage of its new chaincode lifecycle. We conducted several experiments using the Hyperledger Caliper Benchmark version 0.4.2 (v0.4.2), a performance measuring tool. First, we observed changes in performance by varying network parameters (e.g., block size, endorsement policy (EP), number of clients). Then, for further evaluation, we selected sets of network parameters that showed the best performance for a given number of clients. A first selected set of network parameters showed significant improvements in throughput and average latency compared to the parameters that were not selected, and a second selected set outperformed the first in almost every way. These improvements were obtained by using a faster smart-contract lifecycle.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923056 | Published: 2022-09-05
Citations: 0
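Illustrative sketch for the Hyperledger Fabric evaluation: the abstract compares parameter sets by throughput and average latency. The snippet below shows one common way to derive both numbers from per-transaction timestamps. The `summarize` helper, the (submit, commit) tuple layout, and the sample timings are assumptions for illustration; this is not Hyperledger Caliper's code and may differ from its exact definitions.

```python
def summarize(transactions):
    """Return (throughput in tx/s, average latency in s).

    Each transaction is a (submit_time, commit_time) pair in seconds.
    """
    if not transactions:
        raise ValueError("no transactions to summarize")
    first_submit = min(submit for submit, _ in transactions)
    last_commit = max(commit for _, commit in transactions)
    duration = last_commit - first_submit
    if duration <= 0:
        raise ValueError("commit times must come after submit times")
    # Throughput: committed transactions per second over the whole run.
    throughput = len(transactions) / duration
    # Latency: time each transaction spent between submit and commit.
    avg_latency = sum(c - s for s, c in transactions) / len(transactions)
    return throughput, avg_latency

# Hypothetical timings for four transactions (NOT measured values).
txs = [(0.0, 0.5), (0.5, 1.0), (1.0, 2.0), (1.5, 2.0)]
tps, latency = summarize(txs)
```

Running the same summary over runs with different block sizes or client counts would give the kind of side-by-side comparison the abstract describes.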
Practical web security testing: Evolution of web application modules and open source testing tools
Mohammed Ali Kunda, I. Alsmadi
Web application security testing is vital for preventing security flaws in the design of web applications. A major challenge in web security testing is the continuous change and evolution of web design tools and modules. As a result, most open-source tools may not have caught up with recent technologies. In this paper, we report our effort and experience testing our recently developed website (https://mysmartsa.com/). We used several open-source security testing tools and report the vulnerabilities they found. We also report our efforts to debug and fix those security issues throughout the development process.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923130 | Published: 2022-09-05
Citations: 1
From Theory to Practice: Towards an OSINT Framework to Mitigate Arabic Social Cyber Attacks
Ahmed Aleroud, Nour Alhussien, C. Albert
Ongoing research indicates the types of issues that a social cybersecurity researcher or practitioner needs to consider. This study investigates the existing research on social cybersecurity on Arabic social media. It suggests the need to consider socio-political issues when presenting computational social media studies. We investigate the scope of social cyberattacks in Arabic. We show the need for a new open-source intelligence (OSINT) framework to identify disinformation, bots, trolls, cyborgs, and memes. Recent studies have found that such attacks are spreading in 25 different languages, including Arabic; some of these have caused injury and even death. We provide a comprehensive requirements analysis for developing open-source intelligence systems to detect social cyberattacks in Arabic. We show that while there are many OSINT systems (OSINTs) to mitigate such attacks in English, systems to mitigate such attacks in other languages, such as Arabic, are still limited.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923049 | Published: 2022-09-05
Citations: 0
Organization Committee
DOI: https://doi.org/10.1109/idsta55301.2022.9923046 | Published: 2022-09-05
Citations: 0
Face Anti-spoofing based on Convolutional Neural Networks
Siyamdumisa Maphisa, Duncan Coulter
Biometric technologies have gained increasing attention across different sectors in the past decade. Face recognition has proven to be one of the successful biometric technologies. For example, law enforcement uses face recognition for faster investigations, banks for identity confirmation, and different organisations for access control. However, like any biometric technology, face recognition has shortcomings despite its success: it is still susceptible to face spoofing attacks, despite great efforts by researchers to combat them. The study proposes an anti-spoofing model based on deep learning methods. Three different pipelines are implemented based on convolutional neural network (CNN) architectures: a hyperparameter-tuned baseline CNN, a CNN based on the AlexNet architecture, and a neural network based on the VGG16 architecture. The study benchmarked the pipelines using two available face anti-spoofing detection datasets, NUAA and CelebA. It measures the following performance metrics for all the pipelines: accuracy, precision, recall, F1 score, AUC, and ROC curve. All three pipelines provided good results when tested against the selected datasets.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923172 | Published: 2022-09-05
Citations: 1
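Illustrative sketch for "Face Anti-spoofing based on Convolutional Neural Networks": the abstract lists accuracy, precision, recall, and F1 score among its metrics. The snippet below shows how those four follow from binary predictions. The `classification_metrics` helper, the convention that label 1 means "spoof", and the sample labels are assumptions for illustration, not results from the study; AUC and the ROC curve need prediction scores and are left out.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 1 = spoofed face, 0 = genuine face (NOT study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case there are three true positives, one false positive, one false negative, and three true negatives, so all four metrics come out to 0.75.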
Data Augmentation for Code Analysis
A. Shroyer, D. M. Swany
A key challenge of applying machine learning techniques to binary data is the lack of a large corpus of labeled training data. One solution to the lack of real-world data is to create synthetic data from real data through augmentation. In this paper, we demonstrate data augmentation techniques suitable for source code and compiled binary data. By augmenting existing data with semantically-similar sources, training set size is increased, and machine learning models better generalize to unseen data.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923033 | Published: 2022-09-05
Citations: 0
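Illustrative sketch for "Data Augmentation for Code Analysis": the abstract speaks of augmenting existing data with semantically similar sources but does not list the transformations used. One simple transformation of that kind is consistent identifier renaming, sketched here for Python source with the standard `ast` module (Python 3.9+ for `ast.unparse`). The `RenameIdentifiers` class, the `augment` helper, and the sample function are assumptions for illustration; the paper's own techniques, including those for compiled binaries, may be entirely different.

```python
import ast

class RenameIdentifiers(ast.NodeTransformer):
    """Rename variables and parameters according to a mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Variable reads and writes.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Function parameters.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

def augment(source, mapping):
    """Return a behaviorally equivalent variant of `source` with renamed identifiers."""
    tree = RenameIdentifiers(mapping).visit(ast.parse(source))
    return ast.unparse(tree)

# Hypothetical training sample and renaming scheme.
original = "def area(w, h):\n    total = w * h\n    return total\n"
variant = augment(original, {"w": "a0", "h": "a1", "total": "v0"})
```

The variant differs textually from the original but computes the same result, so it can serve as an extra training sample with the same label. This naive renamer would break callers that pass arguments by keyword, so a real pipeline would need scope-aware renaming.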
Performance Evaluation of Data Augmentation for Object Detection in XView Dataset
John Olamofe, Xishuang Dong, Lijun Qian, Eric Shields
Object detection in overhead imagery is of great importance in computer vision. xView is one of the largest publicly available datasets of overhead imagery. Because only a limited amount of data is available for training, a typical object detection model is expected to perform poorly. In this paper, data augmentation methods that change or perturb some properties of the images, such as changing the color channel of the object, adding salt noise to the object, and enhancing contrast, are applied to the xView dataset. Object detection performance is evaluated using the YOLOv3 model and the augmented data. The results demonstrate that the effectiveness of the data augmentation methods depends on both the specific method and the object class.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923040 | Published: 2022-09-05
Citations: 0
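Illustrative sketch for the xView data augmentation study: the abstract names salt noise and contrast enhancement among the perturbations applied to objects. The snippet below shows minimal versions of both on a grayscale patch stored as nested lists. The helpers `add_salt_noise` and `stretch_contrast`, the 10% noise fraction, the linear min-max stretch, and the synthetic patch are assumptions for illustration; the paper's exact parameters and contrast method are not given in the abstract.

```python
import random

def add_salt_noise(image, fraction, rng):
    """Return a copy with about `fraction` of the pixels set to white (255)."""
    noisy = [row[:] for row in image]
    height, width = len(image), len(image[0])
    count = int(fraction * height * width)
    # Sample distinct pixel positions so exactly `count` pixels are changed.
    for index in rng.sample(range(height * width), count):
        noisy[index // width][index % width] = 255
    return noisy

def stretch_contrast(image):
    """Linearly rescale pixel values so they span the full 0..255 range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in image]  # flat image: nothing to stretch
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]

rng = random.Random(0)  # fixed seed so the augmentation is repeatable
# Hypothetical 8x8 low-contrast patch standing in for a cropped object.
patch = [[100 + x + y for x in range(8)] for y in range(8)]
salted = add_salt_noise(patch, 0.1, rng)
stretched = stretch_contrast(patch)
```

Both functions return new images and leave the input untouched, so each augmented copy can be added to the training set alongside the original patch and its unchanged bounding box.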