2022 7th International Conference on Data Science and Machine Learning Applications (CDMA): Latest Publications


The Impact of Feature Selection on Different Machine Learning Models for Breast Cancer Classification

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00020

Atheer Algherairy, Wadha Almattar, Eman Bakri, Salma A. Albelali

Breast cancer appears to be a common type of cancer among women worldwide, with considerably high death rates. The survival rate of patients diagnosed at an advanced stage is considerably lower than that of patients diagnosed at an early stage. The objective of this study is to investigate breast cancer classification and diagnosis using data from the WBCD dataset. In our methodology, the breast cancer data was first scaled. Then, four feature selection methods were used to analyze the features: Pearson's Correlation, Forward Selection, Mutual Information, and Univariate ROC-AUC. Next, three machine learning models were applied: Support Vector Machine (SVM), Logistic Regression (LR), and XGBoost. Finally, the three models were validated using 5-fold cross-validation. The models, combined with each feature selector, were evaluated on several performance measures, including accuracy, precision, recall, and F1-score. Results show that the LR model with Forward Selection was the most successful classifier, with accuracy, precision, and F1-score of 0.982, 0.983, and 0.986, respectively. However, the highest recall, 0.992, was achieved by the SVM model with Correlation feature selection. The developed model could potentially help medical experts diagnose breast cancer early and reduce potential risk.
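The abstract combines two generic steps: greedy forward feature selection, and evaluation with accuracy, precision, recall, and F1-score. The sketch below shows both in plain Python. It is not the authors' code. `forward_selection` and `classification_metrics` are hypothetical helpers, and the `score` callback stands in for whatever cross-validated model score the study actually used.

```python
def forward_selection(features, score, max_features=None):
    """Greedy forward selection: repeatedly add the feature that most improves score(subset).

    `score` is a caller-supplied function, e.g. the mean 5-fold cross-validated
    accuracy of a model trained on the given feature subset (an assumption here).
    """
    selected, best_score = [], float("-inf")
    remaining = list(features)
    limit = max_features if max_features is not None else len(remaining)
    while remaining and len(selected) < limit:
        # Evaluate every remaining feature added to the current subset.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining), key=lambda pair: pair[1]
        )
        if candidate_score <= best_score:
            break  # no remaining feature improves the score, so stop early
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For a quick check, a toy additive score such as `weights = {"radius": 5, "texture": 2, "smoothness": -1}` makes `forward_selection(weights, lambda s: sum(weights[f] for f in s))` return `(["radius", "texture"], 7)`. The feature names are illustrative only. In practice, `score` would train the classifier on each candidate subset inside the cross-validation loop.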



Citations: 2

Proceedings 2022 7th International Conference on Data Science and Machine Learning Applications

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/cdma54072.2022.00001

The proceedings contain 39 papers. The topics discussed include: evaluation of machine learning to early detection of highly cited papers; towards using deep reinforcement learning for better COVID-19 vaccine distribution strategies; an investigation of forecasting Tadawul All Share Index (TASI) using machine learning; intelligent deep detection method for malicious tampering of cancer imagery; an empirical analysis of health-related campaigns on Twitter Arabic hashtags; the accuracy performance of semantic segmentation network with different backbones; a comprehensive evaluation of statistical, machine learning and deep learning models for time series prediction; depression detection in Arabic using speech language recognition; a deep learning framework for temperature forecasting; improving relevance in a recommendation system to suggest charities without explicit user profiles using dual-autoencoders; and the impact of feature selection on different machine learning models for breast cancer classification.


Citations: 0

A comprehensive evaluation of statistical, machine learning and deep learning models for time series prediction

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00014

A. Xuan, Mengmeng Yin, Yupei Li, Xiyu Chen, Zhenliang Ma

Choosing an appropriate model for time series prediction is one of the most prominent tasks in temporal data analysis. Because there is no unified standard for matching data to models, the most suitable model is often selected on the basis of empirical evidence. Data characteristics affect model performance to some extent and may contain the factors that determine the balance between prediction accuracy and model complexity. In this article, a Multi-Criteria Performance Measure method that considers the Mean of Absolute Value of the Residual Autocorrelation is adopted to address this problem. The case studies apply time-series analysis to decompose each dataset into trend, seasonality, and residue, and summarize limitations and recommendations based on the stochasticity of the residue. The results show that statistical models perform best on datasets with low stochasticity, deep learning models specialize in forecasting fluctuating and long-term time series, and machine learning models could be candidates for datasets whose numerical characteristics fall between these two categories. These conclusions could guide the selection of appropriate models and help the research community focus its effort on more feasible or promising directions.
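The abstract does not define the exact formula for the Mean of Absolute Value of the Residual Autocorrelation. The sketch below assumes the natural reading: average the absolute sample autocorrelation of a model's residuals over lags 1 to `max_lag`. The function names and the default `max_lag=10` are illustrative assumptions, not the paper's settings.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of sequence x at the given positive lag."""
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def mean_abs_residual_acf(residuals, max_lag=10):
    """Mean of |autocorrelation| of model residuals over lags 1..max_lag.

    Values near 0 suggest the residuals are close to white noise, i.e. the
    model has captured most of the temporal structure in the series.
    """
    return sum(abs(autocorrelation(residuals, k)) for k in range(1, max_lag + 1)) / max_lag
```

For a strongly patterned residual such as `[1.0, -1.0] * 10`, the lag-1 autocorrelation is -0.95. A model that leaves such residuals has missed structure that a better-matched model could capture.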



Citations: 2

One Voice is All You Need: A One-Shot Approach to Recognize Your Voice

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00022

Priata Nowshin, Shahriar Rumi Dipto, Intesur Ahmed, Deboraj Chowdhury, Galib Abdun Noor, Amitabha Chakrabarty, Muhammad Tahmeed Abdullah, Moshiur Rahman

In the field of computer vision, one-shot learning has proven effective, as it can make accurate predictions from a single labeled training example per class and a small amount of training data. In one-shot learning, predictions must be made accurately from only one sample of each new class. In this paper, we look at a strategy for learning Siamese neural networks, which use a distinctive structure to automatically evaluate the similarity between inputs. The goal of this paper is to apply one-shot learning to audio classification by extracting specific features. A Siamese network is trained with triplet loss, and at test time the similarity between a query set and a support set is computed. We conducted our experiments on the LibriSpeech ASR corpus. We evaluated our approach on N-way 1-shot learning and obtained strong results for 2-way (100%), 3-way (95%), 4-way (84%), and 5-way (74%) tasks, outperforming existing machine learning models by a large margin. To the best of our knowledge, this may be the first paper to examine one-shot human speech recognition on the LibriSpeech ASR corpus using a Siamese network.
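Two mechanisms in the abstract can be shown directly: the triplet loss used for training, and N-way 1-shot prediction against a support set. The sketch below works on embedding vectors that the Siamese network would produce. It assumes Euclidean distance as the similarity measure and a margin of 0.2; neither choice is stated in the abstract, and the speaker labels are hypothetical.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance d (assumed).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    return max(math.dist(anchor, positive) - math.dist(anchor, negative) + margin, 0.0)


def one_shot_predict(query, support):
    """N-way 1-shot classification.

    `support` maps each of the N class labels to its single embedding; the query
    is assigned the label whose support embedding is closest.
    """
    return min(support, key=lambda label: math.dist(query, support[label]))
```

For example, `one_shot_predict([0.9, 0.1], {"speaker_a": [1.0, 0.0], "speaker_b": [0.0, 1.0]})` returns `"speaker_a"`. Adding more classes to `support` turns the same call into a 3-way, 4-way, or 5-way task.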



Citations: 0

Depression Detection in Arabic Using Speech Language Recognition

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00015

Zainab Alsharif, Salma Elhag, S. Alfakeh

Depression is one of the most common mental illnesses, and inaccurate assessment and misdiagnosis are quite common for this disorder. To address this problem, this study uses speech-language recognition to improve the detection of depression in Arabic speech. After collecting the dataset, we extract speech features, which can be obtained from both linguistic cues (uttered words) and para-linguistic cues (acoustic characteristics); we focus on the para-linguistic features. We classify the participants into two groups: clinically depressed and non-depressed. To do so, we first record interviews with participants from both groups. We then extract para-linguistic features using Mel-frequency cepstral coefficients (MFCC) and use them to build a depression detection model. The classification model is built with a convolutional neural network (CNN). It achieves an accuracy of 98%, which could help detect depression from audio data.
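The abstract names MFCC but gives none of its settings. The sketch below covers two characteristic stages of a standard MFCC pipeline:

- placing filter-bank centre frequencies evenly on the mel scale;
- applying a DCT-II to the log filter-bank energies to obtain cepstral coefficients.

Framing, windowing, the FFT and the triangular filter weights are omitted. The filter count and frequency range in the example are illustrative assumptions, not the study's configuration.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the mel scale (common 2595 * log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of n_filters triangular filters spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def dct_ii(log_energies, n_coeffs):
    """Unnormalised DCT-II of log filter-bank energies; outputs are cepstral coefficients."""
    n = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, e in enumerate(log_energies))
        for k in range(n_coeffs)
    ]
```

Stacking one coefficient vector per audio frame yields the time-by-coefficient matrix that a CNN classifier can take as a 2-D input.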


Citations: 0

An Empirical Analysis of Health-Related Campaigns on Twitter Arabic Hashtags

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00011

Niddal H. Imam, V. Vassilakis, Dimitris Kolovos

Trending hashtags are a primary Twitter feature that users regularly visit to get news or chat with each other. However, this valuable feature has been abused by malicious campaigns that use Twitter hashtags to disseminate religious hatred, promote terrorist propaganda, distribute fake financial news, and spread healthcare rumours. In recent years, some health-related campaigns have flooded Arabic trending hashtags on Twitter. These campaigns not only irritate users but also distribute malicious content. This paper presents a comprehensive empirical analysis of ongoing health-related campaigns on Twitter Arabic hashtags. After collecting and annotating tweets posted by these campaigns, we qualitatively analyzed their characteristics and behaviours. We seek to find out what makes some of the tweets posted by these campaigns difficult to detect. Two main findings were identified: (1) these campaigns exhibit spamming activities, such as using bots and trolls; and (2) they use unique hijacked accounts as adversarial examples to obfuscate detection. This study is the first to qualitatively analyze health-related campaigns on Twitter Arabic hashtags from a security point of view. Our findings suggest that some tweets posted by these campaigns need to be treated as adversarial examples crafted not only to evade detection but also to undermine the deployed detection system.



Citations: 1

Towards Using Deep Reinforcement Learning for Better COVID-19 Vaccine Distribution Strategies

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00007

F. Trad, Salah El Falou

Ever since the COVID-19 outbreak started, vaccination has been the most promising path back to normal. However, vaccinating the whole population at the same time is practically infeasible, because vaccine supply is limited while demand is high. The process therefore cannot happen overnight, which is why governments have had to consider how to distribute vaccines so that citizens can return to normal with the least possible damage (infections and deaths). In this study, we investigate how Reinforcement Learning (RL) can be used to distribute vaccines more efficiently among the citizens of a country based on their age and profession. To this end, we created an RL agent that learns vaccine distribution strategies by interacting with a Monte Carlo (MC) simulation environment that we built. This environment runs an Agent-Based Model (ABM) in which agents interact with each other and with the environment they live in, and the virus spreads according to their behavior. The goal of the RL agent was to find vaccine distribution strategies that minimize the number of infections and deaths in this environment. After training the RL agent for 100 episodes, we compared the best strategy it found with several well-known strategies adopted by countries, and found that the RL strategy outperformed them.
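The environment side of the setup described above can be sketched as a toy agent-based Monte Carlo epidemic, in which a vaccination priority order is scored by infections plus deaths.

- This is not the authors' environment and contains no deep RL. Every parameter is a hypothetical placeholder: age groups, fatality rates, contact counts, transmission probability, doses per day and horizon.
- Profession is ignored.
- An RL agent would replace the fixed priority lists with a learned policy and use `evaluate`-style returns as its reward signal.

```python
import random
from collections import deque
from statistics import mean

# Hypothetical per-infection fatality rates by age group (illustrative only).
DEATH_RATE = {"young": 0.001, "adult": 0.01, "elderly": 0.08}


def simulate(priority, n_agents=500, daily_doses=10, days=60, seed=0):
    """Run one Monte Carlo episode of a toy agent-based epidemic.

    priority: age groups in vaccination order, e.g. ["elderly", "adult", "young"].
    Returns (total_infections, total_deaths).
    """
    rng = random.Random(seed)
    ages = [rng.choice(list(DEATH_RATE)) for _ in range(n_agents)]
    state = ["S"] * n_agents  # S=susceptible, I=infected, R=recovered, D=dead, V=vaccinated
    for i in rng.sample(range(n_agents), 5):  # seed five initial infections
        state[i] = "I"
    infections, deaths = 5, 0
    queue = deque(sorted(range(n_agents), key=lambda i: priority.index(ages[i])))
    for _ in range(days):
        # Vaccinate up to daily_doses susceptible agents, in priority order.
        given = 0
        while queue and given < daily_doses:
            i = queue.popleft()
            if state[i] == "S":
                state[i] = "V"
                given += 1
        # Each currently infected agent meets 3 random agents per day.
        for i in [k for k in range(n_agents) if state[k] == "I"]:
            for j in rng.sample(range(n_agents), 3):
                if state[j] == "S" and rng.random() < 0.1:
                    state[j] = "I"
                    infections += 1
            # With probability 0.1 per day the infection resolves: death or recovery.
            if rng.random() < 0.1:
                if rng.random() < DEATH_RATE[ages[i]]:
                    state[i] = "D"
                    deaths += 1
                else:
                    state[i] = "R"
    return infections, deaths


def evaluate(priority, runs=20):
    """Average return, -(infections + deaths), over independent Monte Carlo seeds."""
    returns = []
    for seed in range(runs):
        infections, deaths = simulate(priority, seed=seed)
        returns.append(-(infections + deaths))
    return mean(returns)
```

Comparing `evaluate(["elderly", "adult", "young"])` with `evaluate(["young", "adult", "elderly"])` mirrors, at toy scale, comparing a candidate strategy against a fixed baseline.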



Citations: 1

Intrusion Detection on QUIC Traffic: A Machine Learning Approach

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00037

Lama Al-Bakhat, Sultan Almuhammadi

The introduction of the QUIC protocol brought a major change to the Internet transport layer: it improves user experience but also introduces some security threats. Developed by Google in 2012, QUIC provides low-latency, connection-oriented, and encrypted transport. In addition to its encryption capability, QUIC overcomes many issues found in current transport protocols, such as the high-latency connection establishment of TCP. However, security analyses of QUIC's key establishment have revealed several drawbacks. Moreover, the protocol's encryption mechanism allows adversarial Command & Control (C2) packets to blend in with regular QUIC traffic without raising any alarms. Therefore, in this study, we develop a fingerprinting-based machine learning approach that can be used in intrusion detection systems to detect malicious C2 QUIC traffic. To demonstrate the effectiveness of this approach, we conducted an experiment and tested the performance of six machine learning classifiers. The results show that, by utilizing the fingerprint, most of the classifiers recognized malicious C2 traffic with an average accuracy of 98%.
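Because QUIC payloads are encrypted, a fingerprinting approach has to rely on observable flow metadata such as packet sizes, timing and direction. The abstract does not list the paper's fingerprint features or its six classifiers. The sketch below therefore assumes four simple flow statistics and uses a nearest-centroid classifier as a stand-in. All feature names, labels and values are hypothetical.

```python
import math
from statistics import mean, pstdev


def flow_features(packets):
    """Summarise a flow given as (timestamp_s, size_bytes, outbound) tuples sorted by time."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to compute inter-arrival gaps")
    times = [p[0] for p in packets]
    sizes = [p[1] for p in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps),
        "outbound_ratio": sum(1 for p in packets if p[2]) / len(packets),
    }


def nearest_centroid(train, query):
    """Classify a feature dict by the closest per-class mean feature vector.

    train maps each label to a list of feature dicts with the same keys as query.
    Features are not scaled here; in practice they should be normalised first.
    """
    keys = sorted(query)
    centroids = {
        label: [mean(row[k] for row in rows) for k in keys] for label, rows in train.items()
    }
    q = [query[k] for k in keys]
    return min(centroids, key=lambda label: math.dist(q, centroids[label]))
```

Because the classifier consumes only the per-flow feature dictionaries, any classifier that accepts fixed-length feature vectors could be swapped in without changing `flow_features`.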



Citations: 2

Preprocessy: A Customisable Data Preprocessing Framework with High-Level APIs

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00039

Saif Kazi, Priyesh Vakharia, Parth Shah, Riya Gupta, Yash Tailor, Palak Mantry, Jash Rathod

Data preprocessing is an important prerequisite for data mining and machine learning. In this paper, we introduce Preprocessy, a Python framework that provides customisable data preprocessing pipelines for processing structured data. Preprocessy pipelines come with sane defaults and the framework also provides low-level functions to build custom pipelines. The paper gives a brief overview of the features and the high-level APIs of Preprocessy along with a performance comparison against Scikit-learn and Pandas on two datasets. Preprocessy provides functions for handling missing data and outliers, data normalisation, feature selection and data sampling. The goal of Preprocessy is to be easy to use, flexible and performant. Preprocessy helps beginners and experts alike by making data preprocessing an easier and faster task.
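The pipeline idea described above is a sequence of preprocessing steps applied in order, with defaults that can be swapped out. The sketch below illustrates that idea generically in plain Python with two of the listed operations, missing-value handling and normalisation. It does not use or reproduce Preprocessy's actual API; `impute_mean`, `min_max_scale` and `run_pipeline` are hypothetical names.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def run_pipeline(column, steps):
    """Apply each preprocessing step in order, feeding each output to the next step."""
    for step in steps:
        column = step(column)
    return column
```

With these helpers, `run_pipeline([1, None, 3], [impute_mean, min_max_scale])` returns `[0.0, 0.5, 1.0]`. Keeping each step a plain function is what makes custom pipelines easy to compose.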


Citations: 0

Intelligent Maintenance Recommender System

2022 7th International Conference on Data Science and Machine Learning Applications (CDMA)

Pub Date : 2022-03-01 DOI: 10.1109/CDMA54072.2022.00040

Abdullatif Al-Najim, Abrar Al-Amoudi, Kenji Ooishi, Mustafa Al-Nasser

Recommendation engine techniques have proven effective in various domains, as demonstrated by Amazon and Netflix. This paper discusses applying the recommendation engine concept in the industrial field, especially in maintenance operations. Plant maintenance teams need to prepare maintenance plans for sudden asset failures in order to reduce unscheduled production downtime. However, planning is time-consuming, because the appropriate countermeasures must be chosen from many options depending on the failure condition and asset environment. We therefore aim to suggest reliable countermeasures for failure conditions in order to shorten planning time. In this work, we propose two approaches to maintenance recommender systems that use artificial intelligence techniques to recommend maintenance actions. The first approach is a single-stage recommender system: it reads the defect information and description entered by the operator and recommends the maintenance action used for similar defects in the historical data. The second approach is a multi-stage recommender system. It first estimates one maintenance attribute, which is then used as input to the next stage to estimate the next maintenance attribute. Finally, we evaluate the accuracy of the recommendations using past maintenance reports, which contain defect conditions and the maintenance actions actually taken. We found that the multi-stage system outperformed the single-stage system in terms of accuracy. It could help the maintenance team respond to sudden asset failures by recommending maintenance actions.
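The difference between the single-stage and multi-stage designs can be sketched with a simple text-similarity recommender. The abstract does not name the AI techniques used. This sketch assumes bag-of-words Jaccard similarity between defect descriptions, and it treats "component" as the first estimated maintenance attribute. The field names and history records are hypothetical.

```python
from collections import defaultdict


def tokens(text):
    """Lower-cased bag of words for a free-text defect description."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend_single_stage(description, history):
    """Return the action of the most similar historical defect description."""
    query = tokens(description)
    best = max(history, key=lambda rec: jaccard(query, tokens(rec["description"])))
    return best["action"]


def vote(description, records, field):
    """Pick the value of `field` with the largest summed similarity to the description."""
    query = tokens(description)
    scores = defaultdict(float)
    for rec in records:
        scores[rec[field]] += jaccard(query, tokens(rec["description"]))
    return max(scores, key=scores.get)


def recommend_multi_stage(description, history):
    """Stage 1 estimates the component; stage 2 picks an action among that component's records."""
    component = vote(description, history, "component")
    same_component = [rec for rec in history if rec["component"] == component]
    return component, vote(description, same_component, "action")
```

In this sketch, the second stage only considers records consistent with the first stage's estimate. That is one way an earlier attribute can constrain the next one. Both functions assume a non-empty history.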



Citations: 1
