Sophisticated cyber-attacks and ever-evolving threats, together with the advent of big data and connected systems and the inaccuracy of current Network Intrusion Detection Systems (NIDS), have made securing networks highly complex. This calls for better network intrusion detection models to enhance network security and secure communication channels in the future. Over the years, machine learning and deep learning models have proven effective at detecting network intrusions and classifying attacks on networks. In this paper, we present a NIDS based on machine learning and deep learning techniques to improve on the performance of current network intrusion detection systems. Decision trees, ensemble machine learning techniques such as Random Forest and XGBoost, and Deep Neural Networks (DNN) are applied to NSL-KDD and UNSW-NB15, the modern substitutes for the benchmark KDD CUP 99 dataset. We apply unique feature selection methods and achieve competitive results. For binary classification, our models achieve accuracies above 99.25% on the NSL-KDD dataset and above 93% on the UNSW-NB15 dataset. For multiclass classification, our models achieve accuracies above 97.70% on NSL-KDD and above 82.50% on UNSW-NB15.
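The gap between the binary and multiclass figures can be illustrated with a minimal sketch. The labels below are made up for illustration and the helper names `accuracy` and `to_binary` are hypothetical; nothing here is taken from the paper's pipeline. The point is only that confusing one attack type with another counts as an error in the multiclass setting but not in the binary one.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the ground-truth label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_binary(labels: Sequence[str], normal: str = "normal") -> List[str]:
    """Collapse every attack category into a single 'attack' label."""
    return [normal if lab == normal else "attack" for lab in labels]


# Toy labels (assumed for illustration; not results from the paper).
y_true = ["normal", "dos", "probe", "r2l", "normal", "dos"]
y_pred = ["normal", "dos", "dos", "r2l", "normal", "probe"]

# Two attack-vs-attack confusions lower the multiclass score only.
multiclass_acc = accuracy(y_true, y_pred)
binary_acc = accuracy(to_binary(y_true), to_binary(y_pred))
print(f"multiclass={multiclass_acc:.3f} binary={binary_acc:.3f}")
```

Under these toy labels the binary score is higher because the two misclassifications are between attack categories, which the binary view does not distinguish.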
{"title":"Improving the Classification Effectiveness of Network Intrusion Detection Using Ensemble Machine Learning Techniques and Deep Neural Networks","authors":"Yunpeng Zhang, Yash Gandhi, Zhixia Li, Zhiwen Xiao","doi":"10.1109/IDSTA55301.2022.9923205","DOIUrl":"https://doi.org/10.1109/IDSTA55301.2022.9923205","journal":{"name":"2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA)"},"publicationDate":"2022-09-05","platform":"Semanticscholar","paperid":"122879410"}
Pub Date: 2022-09-05 DOI: 10.1109/IDSTA55301.2022.9923132
Dana Alsagheer, Hadi Mansourifar, Mohammad Mahdi Dehshibi, W. Shi
When English clubs and the game’s governing bodies and organizations turned off their Facebook, Twitter, and Instagram accounts from April 30 to May 1, 2021, the fight against online racism gained new momentum. However, the Tokyo Olympics revealed new aspects of the online bullying that athletes may face during major sporting events. Despite the significant effort put into online hate speech detection research in general, hate speech against athletes requires a separate investigation. We show in this paper that abusive language directed at athletes is more varied and more difficult to detect. We first introduce data collected from online comments aimed at three athletes competing in the Tokyo 2020 Olympics. We then conduct extensive classification experiments on the collected data to demonstrate its diversity in comparison to other hate speech datasets, and to show that Active Learning outperforms Supervised Learning in detecting hate speech against athletes.
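Active Learning differs from plain Supervised Learning in that the model chooses which unlabeled comments should be annotated next. The abstract does not say which query strategy was used, so the sketch below assumes least-confidence uncertainty sampling; the function name `least_confident` and the probabilities are hypothetical.

```python
from typing import List, Sequence


def least_confident(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k pool samples whose top-class probability is lowest."""
    confidence = [max(p) for p in probs]
    # Stable sort: ties keep their original pool order.
    order = sorted(range(len(probs)), key=lambda i: confidence[i])
    return order[:k]


# Assumed classifier outputs [P(not hateful), P(hateful)] for four unlabeled comments.
pool_probs = [
    [0.95, 0.05],  # confident
    [0.55, 0.45],  # uncertain
    [0.10, 0.90],  # confident
    [0.50, 0.50],  # most uncertain
]

# Indices of the two comments to send to a human annotator next.
query = least_confident(pool_probs, k=2)
print(query)
```

The selected comments would be labeled, added to the training set, and the classifier retrained, repeating until the labeling budget is spent.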
{"title":"Detecting Hate Speech Against Athletes in Social Media","authors":"Dana Alsagheer, Hadi Mansourifar, Mohammad Mahdi Dehshibi, W. Shi","doi":"10.1109/IDSTA55301.2022.9923132","DOIUrl":"https://doi.org/10.1109/IDSTA55301.2022.9923132","journal":{"name":"2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA)"},"publicationDate":"2022-09-05","platform":"Semanticscholar","paperid":"133468726"}
Pub Date: 2022-09-05 DOI: 10.1109/IDSTA55301.2022.9923041
Naresh Kumar, Abdul Khadar Jilani, Pavan Kumar, Anastasija Nikiforova
Object detection is a major research problem in the computer vision domain. Object detection on images from unmanned aerial vehicles (UAVs, or drones) has versatile applications in defence security, agriculture and GIS. However, despite the large number of solutions proposed, real-time object detection in UAV scenarios remains a difficult problem due to environmental obstructions such as occlusion and view-invariant conditions. This paper proposes an improved YOLOv3-tiny object detector that introduces a multi-dilated module between the convolution unit and the receptive field. The shortage of positive training samples is addressed by enlarging the predicted feature map, which reduces the rate of label rewriting in YOLOv3-tiny. We find that fusing multi-scale receptive fields is effective for detecting even tiny objects. We also introduce a path aggregation module that merges the semantic information of a deeper layer with the detailed information of an earlier layer. The analysis of the proposed solution shows that on the VisDrone2019-Det test set, our proposed model is more efficient and effective, running 2.96% times faster and gaining 4.0% AP50 over YOLOv3.
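Dilation is what lets a small kernel cover a larger receptive field without extra weights. The following one-dimensional sketch is an illustration only, not the paper's multi-dilated module; the function names and the toy signal are assumptions. A kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input positions.

```python
from typing import List, Sequence


def effective_kernel(k: int, d: int) -> int:
    """Number of input positions spanned by a size-k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(x: Sequence[float], w: Sequence[float], d: int) -> List[float]:
    """Valid (no padding) 1-D cross-correlation with dilation rate d."""
    span = effective_kernel(len(w), d)
    return [
        sum(w[j] * x[i + j * d] for j in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # simple difference filter

# The same three weights look at 3 inputs (d=1) or 5 inputs (d=2).
out_d1 = dilated_conv1d(signal, kernel, 1)
out_d2 = dilated_conv1d(signal, kernel, 2)
print(effective_kernel(3, 1), out_d1)
print(effective_kernel(3, 2), out_d2)
```

Running several dilation rates in parallel and combining their outputs is one common way to fuse multi-scale receptive fields; how the paper combines them is not stated in the abstract.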
{"title":"Improved YOLOv3-tiny Object Detector with Dilated CNN for Drone-Captured Images","authors":"Naresh Kumar, Abdul Khadar Jilani, Pavan Kumar, Anastasija Nikiforova","doi":"10.1109/IDSTA55301.2022.9923041","DOIUrl":"https://doi.org/10.1109/IDSTA55301.2022.9923041","journal":{"name":"2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA)"},"publicationDate":"2022-09-05","platform":"Semanticscholar","paperid":"116091753"}
Pub Date: 2022-09-05 DOI: 10.1109/IDSTA55301.2022.9923087
Alankrit Mishra, N. Raj, Garima Bajwa
While capable of segregating visual data, humans take time to examine a single piece, let alone thousands or millions of samples. Deep learning models process sizeable amounts of information efficiently with the help of modern-day computing. However, their questionable decision-making process has raised considerable concerns. Recent studies have identified a new approach that extracts image features from EEG signals and combines them with standard image features. These approaches make deep learning models more interpretable and enable models to converge faster with fewer samples. Inspired by these studies, we developed an efficient way of encoding EEG signals as images to facilitate a subtler understanding of brain signals with deep learning models. Using two variations of this encoding method, we classified the encoded EEG signals corresponding to 39 image classes with a benchmark accuracy of 70% on the layered dataset of six subjects, which is significantly higher than existing work. Our image classification approach with combined EEG features achieved an accuracy of 82%, slightly below that of a pure deep learning approach; nevertheless, it demonstrates the viability of the theory.
{"title":"EEG-based Image Feature Extraction for Visual Classification using Deep Learning","authors":"Alankrit Mishra, N. Raj, Garima Bajwa","doi":"10.1109/IDSTA55301.2022.9923087","DOIUrl":"https://doi.org/10.1109/IDSTA55301.2022.9923087","journal":{"name":"2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA)"},"publicationDate":"2022-09-05","platform":"Semanticscholar","paperid":"116142075"}
Pub Date: 2022-08-22 DOI: 10.1109/IDSTA55301.2022.9923036
Alexander Kalinowski, Yuan An
The majority of knowledge graph embedding techniques treat entities and predicates as separate embedding matrices, using aggregation functions to build a representation of the input triple. However, these aggregations are lossy, i.e. they do not capture the semantics of the original triples, such as information contained in the predicates. To combat these shortcomings, current methods learn triple embeddings from scratch without utilizing entity and predicate embeddings from pre-trained models. In this paper, we design a novel fine-tuning approach for learning triple embeddings by creating weak supervision signals from pre-trained knowledge graph embeddings. We develop a method for automatically sampling triples from a knowledge graph and estimating their pairwise similarities from pre-trained embedding models. These pairwise similarity scores are then fed to a Siamese-like neural architecture to fine-tune triple representations. We evaluate the proposed method on two widely studied knowledge graphs and show consistent improvement over other state-of-the-art triple embedding methods on triple classification and triple clustering tasks.
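The weak-supervision step, scoring pairs of triples with pre-trained embeddings, can be sketched as follows. The abstract does not specify how a triple vector is formed or which similarity is used, so concatenation and cosine similarity are assumptions here, and the tiny two-dimensional embeddings are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Assumed stand-ins for pre-trained entity and predicate embeddings.
ENTITY: Dict[str, List[float]] = {
    "Paris": [1.0, 0.0],
    "Berlin": [0.9, 0.1],
    "France": [0.0, 1.0],
    "Germany": [0.1, 0.9],
}
PREDICATE: Dict[str, List[float]] = {
    "capital_of": [1.0, 0.0],
    "born_in": [0.0, 1.0],
}


def triple_vector(triple: Triple) -> List[float]:
    """Concatenate head, predicate and tail embeddings (an assumed composition)."""
    head, pred, tail = triple
    return ENTITY[head] + PREDICATE[pred] + ENTITY[tail]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


anchor = ("Paris", "capital_of", "France")
sim_same_pred = cosine(triple_vector(anchor), triple_vector(("Berlin", "capital_of", "Germany")))
sim_diff_pred = cosine(triple_vector(anchor), triple_vector(("Paris", "born_in", "France")))
print(round(sim_same_pred, 3), round(sim_diff_pred, 3))
```

Scores like these would serve as noisy targets for a Siamese-style network, which sees two triples and is trained to reproduce their similarity.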
{"title":"Repurposing Knowledge Graph Embeddings for Triple Representation via Weak Supervision","authors":"Alexander Kalinowski, Yuan An","doi":"10.1109/IDSTA55301.2022.9923036","DOIUrl":"https://doi.org/10.1109/IDSTA55301.2022.9923036","journal":{"name":"2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA)"},"publicationDate":"2022-08-22","platform":"Semanticscholar","paperid":"126100549"}
Pub Date: 2022-08-09 DOI: 10.1109/IDSTA55301.2022.9923047
A.A. Govoruhina, Anastasija Nikiforova
Today, more and more people report allergies, which can range from mild reactions or discomfort to anaphylactic shock. Other people may not be allergic but avoid certain foods for personal reasons. Daily food shopping for these people is hampered by the fact that unwanted ingredients can be hidden in any food, and it is difficult to find them all. This paper presents a digital health shopping assistant called “Diet Helper”, which aims to make life easier for such people by making it easy to determine whether a product is suitable for consumption according to dietary requirements of two types: existing diets and self-defined ones. The app takes a captured ingredient label as input, converts it to text, and filters out the ingredients that the user wants to avoid, either as allergens or as products to which the consumer is intolerant, helping the user decide whether the product is suitable for consumption. This should make daily grocery shopping easier by giving the user a more accurate and simplified product selection in seconds and reducing the total time spent in grocery stores. This is especially relevant in light of COVID-19, although it was relevant before the pandemic and will remain so because of the busy schedules and active rhythm of life of modern society. The app is developed using the React Native framework and the Google Firebase platform, which make such solutions easy to develop, use and extend, thereby encouraging the active development of solutions that could improve wellbeing.
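The filtering step that follows text recognition can be sketched in a few lines. This is a language-neutral illustration written in Python rather than the app's React Native code; the function name `flag_ingredients`, the label text and the avoid list are all hypothetical.

```python
import re
from typing import List, Sequence


def flag_ingredients(label_text: str, avoid: Sequence[str]) -> List[str]:
    """Return the avoided terms that occur in any ingredient on the label."""
    # Split the recognized text into individual ingredient phrases.
    parts = [p.strip().lower() for p in re.split(r"[,;()]", label_text) if p.strip()]
    # Case-insensitive substring match of each avoided term against each phrase.
    return sorted({term for term in avoid for part in parts if term.lower() in part})


# Assumed output of the label-to-text step.
label = (
    "Ingredients: wheat flour, sugar, palm oil, skimmed MILK powder, "
    "soy lecithin (emulsifier), salt"
)
avoid = ["milk", "soy", "peanut"]

hits = flag_ingredients(label, avoid)
is_suitable = not hits
print(hits, is_suitable)
```

Plain substring matching is a deliberate simplification: it would also flag "eggplant" for "egg", so a real assistant would likely need synonym lists and word-boundary handling.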
{"title":"Digital health shopping assistant with React Native: a simple technological solution to a complex health problem","authors":"A.A. Govoruhina, Anastasija Nikiforova","doi":"10.1109/IDSTA55301.2022.9923047","DOIUrl":"https://doi.org/10.1109/IDSTA55301.2022.9923047","journal":{"name":"2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA)"},"publicationDate":"2022-08-09","platform":"Semanticscholar","paperid":"134029717"}
Pub Date: 2022-01-04 DOI: 10.1109/IDSTA55301.2022.9923043
Xishuang Dong, Lijun Qian
Social media has become an effective platform to generate and spread fake news that can mislead people and even distort public opinion. Centralized methods for fake news detection, however, cannot effectively protect user privacy while collecting data centrally for model training. Moreover, they cannot fully involve user feedback in the loop of learning detection models to further enhance fake news detection. To overcome these challenges, this paper proposes a novel decentralized method, Human-in-the-loop Based Swarm Learning (HBSL), which integrates user feedback into the loop of learning and inference to recognize fake news in a decentralized manner without violating user privacy. It consists of distributed nodes that independently learn and detect fake news on local data. Furthermore, detection models trained on these nodes can be enhanced through decentralized model merging. Experimental results demonstrate that the proposed method outperforms the state-of-the-art decentralized method at detecting fake news on a benchmark dataset.
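Model merging across nodes can be sketched as parameter averaging. The abstract does not describe the merging rule used by HBSL, so the sample-weighted average below (in the style of federated averaging) is an assumption, as are the function name and the toy weights. Only parameters leave each node; the local data stays where it is.

```python
from typing import List, Sequence


def merge_models(node_weights: Sequence[Sequence[float]], node_sizes: Sequence[int]) -> List[float]:
    """Average parameter vectors, weighting each node by its local sample count."""
    if len(node_weights) != len(node_sizes) or not node_weights:
        raise ValueError("need one sample count per node")
    total = sum(node_sizes)
    dim = len(node_weights[0])
    return [
        sum(w[i] * n for w, n in zip(node_weights, node_sizes)) / total
        for i in range(dim)
    ]


# Assumed flattened parameters of three locally trained detectors.
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 10, 20]  # local training samples per node

merged = merge_models(weights, sizes)
print(merged)
```

In a swarm setting the merge would be carried out by the peers themselves rather than a central server, and each node would continue training from the merged parameters.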
{"title":"Integrating Human-in-the-loop into Swarm Learning for Decentralized Fake News Detection","authors":"Xishuang Dong, Lijun Qian","doi":"10.1109/IDSTA55301.2022.9923043","DOIUrl":"https://doi.org/10.1109/IDSTA55301.2022.9923043","journal":{"name":"2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA)"},"publicationDate":"2022-01-04","platform":"Semanticscholar","paperid":"128747814"}
Pub Date: 2021-11-03 DOI: 10.1109/IDSTA55301.2022.9923169
S. Chowdhury, Yuxiao Lin, Bor-Shuang Liaw, L. Kerby
Battery performance datasets are typically non-normal and multicollinear. Extrapolating from such datasets for model predictions requires attention to these characteristics. This study explores the impact of data normality on building machine learning models. In this work, tree-based regression models and multiple linear regression models are each built from a highly skewed, non-normal dataset with multicollinearity and compared. Several techniques, such as data transformation, are necessary to achieve a good multiple linear regression model with this dataset; the most useful techniques are discussed. With these techniques, the best multiple linear regression model achieved $R^{2} = 81.23\%$ and exhibited no multicollinearity effect for the dataset used in this study. Tree-based models perform better on this dataset, as they are non-parametric, capable of handling complex relationships among variables, and not affected by multicollinearity. We show that bagging, as used in Random Forests, reduces overfitting. Our best tree-based model achieved $R^{2} = 97.73\%$. This study explains why tree-based regression is promising as a machine learning model for non-normally distributed, multicollinear data.
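The two quantities at the centre of this comparison, the coefficient of determination and a multicollinearity diagnostic, can be computed directly. The sketch below uses made-up numbers and hypothetical function names; the variance inflation factor is shown only for the two-predictor case, where it reduces to 1 / (1 - r^2) with r the Pearson correlation between the predictors.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x: Sequence[float], z: Sequence[float]) -> float:
    """Pearson correlation between two predictors."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    cov = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - mz) ** 2 for b in z))


def vif_two_predictors(x: Sequence[float], z: Sequence[float]) -> float:
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x, z)
    return 1.0 / (1.0 - r * r)


# Toy values (assumed; not battery data from the study).
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
vif = vif_two_predictors([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0])
print(f"R^2 = {r2:.2%}, VIF = {vif:.1f}")
```

A VIF above roughly 10 is a common rule of thumb for problematic multicollinearity in linear regression; tree-based models split on one variable at a time, so correlated predictors do not destabilize them in the same way.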
{"title":"Evaluation of Tree Based Regression over Multiple Linear Regression for Non-normally Distributed Data in Battery Performance","authors":"S. Chowdhury, Yuxiao Lin, Bor-Shuang Liaw, L. Kerby","doi":"10.1109/IDSTA55301.2022.9923169","DOIUrl":"https://doi.org/10.1109/IDSTA55301.2022.9923169","journal":{"name":"2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA)"},"publicationDate":"2021-11-03","platform":"Semanticscholar","paperid":"115371220"}
Pub Date: 2021-10-17 DOI: 10.1109/IDSTA55301.2022.9923032
Debojyoti Seth
Electroencephalography (EEG) has gained popularity over similar modalities such as Functional Magnetic Resonance Imaging (fMRI) and Functional Near-Infrared Spectroscopy (fNIRS) for being simple and non-invasive. One of the biggest challenges for any Brain Computer Interfacing (BCI) technique is recovering maximum information from minimal input channels for realistic predictions. To choose the EEG channels with the highest accuracy, this paper introduces the novel concept of inducing sparsity in a Convolutional Neural Network (CNN)-induced modified Common Spatial Pattern (CSP) algorithm. This approach helps develop optimized confusion matrices, which can extensively label the feature map in a significantly lower number of iterations, to predict trends in the growth of symptoms. The concept of compressed sensing is used to develop an optimization model for recovering the cosparse signal and retaining maximum information. The state-of-the-art Joint Sparsity Induced Modified Common Spatial Pattern Algorithm and Low Rank Optimization Model (SCSP-LROM) can detect the stage and extent of growth of malignant cells, hemorrhages and lesions.
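For orientation, the standard two-class CSP problem that such a modification starts from can be written as below, with $\Sigma_{1}$ and $\Sigma_{2}$ the class-wise spatial covariance matrices of the EEG channels and $w$ a spatial filter. The second expression, an $\ell_{1}$-penalized variant with weight $\mu$, is only an assumed illustration of how sparsity can select channels; the abstract does not give the paper's actual objective.

```latex
% Standard CSP: maximize the variance ratio; stationarity of the Rayleigh
% quotient gives the generalized eigenvalue problem on the right.
\[
w^{\star} = \arg\max_{w}\;
\frac{w^{\top}\Sigma_{1}\,w}{w^{\top}(\Sigma_{1}+\Sigma_{2})\,w},
\qquad
\Sigma_{1}\,w = \lambda\,(\Sigma_{1}+\Sigma_{2})\,w .
\]

% Assumed sparse variant (illustrative): channels with w_i = 0 are dropped.
\[
w^{\star}_{\text{sparse}} = \arg\max_{w}\;
\frac{w^{\top}\Sigma_{1}\,w}{w^{\top}(\Sigma_{1}+\Sigma_{2})\,w}
\;-\; \mu\,\lVert w \rVert_{1},
\qquad \mu \ge 0 .
\]
```

In the unpenalized problem, the eigenvectors with the largest and smallest eigenvalues $\lambda$ give the filters that best separate the two classes by variance; the nonzero entries of a sparse filter indicate which channels are retained.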
{"title":"Joint SCSP-LROM: A novel approach to detect Cerebrovascular Anomalies from EEG signals","authors":"Debojyoti Seth","doi":"10.1109/IDSTA55301.2022.9923032","DOIUrl":"https://doi.org/10.1109/IDSTA55301.2022.9923032","journal":{"name":"2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA)"},"publicationDate":"2021-10-17","platform":"Semanticscholar","paperid":"134447633"}