
2022 International Conference on Intelligent Data Science Technologies and Applications (IDSTA): Latest Papers

Improving the Classification Effectiveness of Network Intrusion Detection Using Ensemble Machine Learning Techniques and Deep Neural Networks
Yunpeng Zhang, Yash Gandhi, Zhixia Li, Zhiwen Xiao
Sophisticated cyber-attacks and ever-evolving threats have made securing networks highly complex, owing to the advent of big data and connected systems and to the inaccuracy and inadequacy of current Network Intrusion Detection Systems (NIDS). This creates a need for better network intrusion detection models to enhance network security and secure communication channels in the future. Over the years, machine learning and deep learning models have proven effective at detecting network intrusions and classifying attacks on networks. In this paper, we present a NIDS based on machine learning and deep learning techniques to improve the performance of current network intrusion detection systems. A decision tree, ensemble machine learning techniques such as Random Forest and XGBoost, and Deep Neural Networks (DNN) are applied to the modern substitutes for the benchmark KDD CUP 99 dataset: NSL-KDD and UNSW-NB15. We apply unique feature selection methods and achieve competitive results. For binary classification, our models achieve accuracies of more than 99.25% on the NSL-KDD dataset and above 93% on the UNSW-NB15 dataset. For multiclass classification, our models achieve accuracies of more than 97.70% on NSL-KDD and above S2.50% on the UNSW-NB15 dataset.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923205
Published: 2022-09-05
Citations: 2
Detecting Hate Speech Against Athletes in Social Media
Dana Alsagheer, Hadi Mansourifar, Mohammad Mahdi Dehshibi, W. Shi
When English clubs and the game’s governing bodies and organizations turned off their Facebook, Twitter, and Instagram accounts from April 30 to May 1, 2021, the fight against online racism gained new momentum. However, the Tokyo Olympics revealed new aspects of online bullying that athletes may face during major sporting events. Despite the significant effort put into online hate speech detection research in general, hate speech detection against athletes requires separate investigation. We show in this paper that abusive language directed at athletes is more varied and more difficult to detect. We first introduce data collected from online comments aimed at three athletes competing in the Tokyo 2020 Olympics. We then conduct extensive classification experiments on the collected data to demonstrate its diversity in comparison with other hate speech datasets. We do so to demonstrate that Active Learning outperforms Supervised Learning in detecting hate speech against athletes.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923132
Published: 2022-09-05
Citations: 0
Improved YOLOv3-tiny Object Detector with Dilated CNN for Drone-Captured Images
Naresh Kumar, Abdul Khadar Jilani, Pavan Kumar, Anastasija Nikiforova
Object detection raises major research problems in the computer vision domain. Object detection based on images from unmanned aerial vehicles (UAVs), or drones, has versatile applications in defence and security, agriculture, and GIS. However, despite the large number of solutions proposed, real-time object detection in UAV scenarios remains a difficult problem because of environmental obstructions such as occlusion and view-invariant conditions. This paper proposes an improved YOLOv3-tiny object detector that introduces a multi-dilated module between the convolution unit and the receptive field; the problem of a small number of positive training samples is addressed by a larger predicted feature map, which reduces the rate of label rewriting in YOLOv3-tiny. We find that fusing multi-scale receptive fields is effective at detecting even individual tiny objects. We introduce a path aggregation module that merges the semantic information in a deeper layer with the detailed information in an earlier layer. The analysis shows that, on the VisDrone2019-Det test set, the proposed model is more efficient and effective than YOLOv3, running 2.96% times faster and increasing AP50 by 4.0%.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923041
Published: 2022-09-05
Citations: 2
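A note on the dilated-CNN entry above: the abstract does not give the kernel sizes or dilation rates of the multi-dilated module, so the values below are purely illustrative. The sketch only shows the standard relation between dilation and receptive field for stride-1 convolutions, which is why branches with different dilation rates see the image at several scales.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers) -> int:
    """Receptive field of stride-1 conv layers given as (kernel_size, dilation)."""
    field = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (effective kernel size - 1).
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


# Hypothetical dilation rates; the paper's actual configuration is not stated.
for rate in (1, 2, 4):
    print(rate, effective_kernel_size(3, rate))
print(stacked_receptive_field([(3, 1), (3, 2), (3, 4)]))
```

With 3x3 kernels, dilation rates 1, 2 and 4 give effective kernel sizes of 3, 5 and 9, and stacking the three layers yields a 15x15 receptive field without adding parameters.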
EEG-based Image Feature Extraction for Visual Classification using Deep Learning
Alankrit Mishra, N. Raj, Garima Bajwa
While capable of segregating visual data, humans take time to examine a single sample, let alone thousands or millions of samples. Deep learning models process large amounts of information efficiently with the help of modern computing. However, their questionable decision-making process has raised considerable concerns. Recent studies have identified a new approach that extracts image features from EEG signals and combines them with standard image features. These approaches make deep learning models more interpretable and also enable faster convergence with fewer samples. Inspired by these studies, we developed an efficient way of encoding EEG signals as images to facilitate a more subtle understanding of brain signals with deep learning models. Using two variations of such encoding methods, we classified the encoded EEG signals corresponding to 39 image classes with a benchmark accuracy of 70% on the layered dataset of six subjects, which is significantly higher than existing work. Our image classification approach with combined EEG features achieved an accuracy of 82%, compared with the slightly better accuracy of a pure deep learning approach; nevertheless, it demonstrates the viability of the theory.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923087
Published: 2022-09-05
Citations: 0
Repurposing Knowledge Graph Embeddings for Triple Representation via Weak Supervision
Alexander Kalinowski, Yuan An
The majority of knowledge graph embedding techniques treat entities and predicates as separate embedding matrices, using aggregation functions to build a representation of the input triple. However, these aggregations are lossy, i.e. they do not capture the semantics of the original triples, such as information contained in the predicates. To combat these shortcomings, current methods learn triple embeddings from scratch without utilizing entity and predicate embeddings from pre-trained models. In this paper, we design a novel fine-tuning approach for learning triple embeddings by creating weak supervision signals from pre-trained knowledge graph embeddings. We develop a method for automatically sampling triples from a knowledge graph and estimating their pairwise similarities from pre-trained embedding models. These pairwise similarity scores are then fed to a Siamese-like neural architecture to fine-tune triple representations. We evaluate the proposed method on two widely studied knowledge graphs and show consistent improvement over other state-of-the-art triple embedding methods on triple classification and triple clustering tasks.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923036
Published: 2022-08-22
Citations: 1
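One way to picture the weak supervision signal described in the entry above is as a similarity score between two triples computed from frozen, pre-trained embeddings. The sketch below is a toy under stated assumptions: the embedding values, the concatenation used to represent a triple, and the cosine score are illustrative choices, not the sampling or scoring procedure from the paper.

```python
import math

# Hypothetical pre-trained embeddings (2-dimensional, for readability only).
ENTITY = {
    "paris": [1.0, 0.0],
    "france": [0.0, 1.0],
    "berlin": [0.9, 0.1],
    "germany": [0.1, 0.9],
}
RELATION = {"capital_of": [1.0, 1.0]}


def triple_vector(head, relation, tail):
    """Represent a triple by concatenating its three embeddings (an assumption)."""
    return ENTITY[head] + RELATION[relation] + ENTITY[tail]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weak_label(triple_a, triple_b):
    """Pairwise similarity that could serve as a noisy target for fine-tuning."""
    return cosine_similarity(triple_vector(*triple_a), triple_vector(*triple_b))


score = weak_label(("paris", "capital_of", "france"),
                   ("berlin", "capital_of", "germany"))
print(round(score, 3))
```

Scores of this kind would be the targets that a Siamese-style network regresses towards; how pairs are sampled and which pre-trained model supplies the embeddings are left open here.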
Digital health shopping assistant with React Native: a simple technological solution to a complex health problem
A.A. Govoruhina, Anastasija Nikiforova
Today, more and more people report allergies, which can range from mild reactions or discomfort to anaphylactic shock. Other people may not be allergic but avoid certain foods for personal reasons. Daily food shopping for these people is hampered by the fact that unwanted ingredients can be hidden in any food, and it is difficult to find them all. The paper presents a digital health shopping assistant called “Diet Helper”, which aims to make life easier for such people by making it easy to determine whether a product is suitable for consumption according to specific dietary requirements of both types: existing diets and self-defined ones. The app takes a captured ingredient label as input, converts it to text, analyses it, and filters out unwanted ingredients that, according to the user, should be avoided as either allergens or products to which the consumer is intolerant, helping the user decide whether the product is suitable for consumption. This should make daily grocery shopping easier by providing the user with more accurate and simplified product selection in seconds, reducing the total time spent in grocery stores. This is especially relevant in light of COVID-19, although it was and will remain relevant beyond the pandemic because of the busy schedules and active rhythm of life of modern society. The app is developed using the React Native framework and the Google Firebase platform, which make it easy to develop, use, and extend such solutions, thereby encouraging the active development of solutions that could improve wellbeing.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923047
Published: 2022-08-09
Citations: 0
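The filtering step in the Diet Helper entry above can be pictured as a text match between the recognised label and the user's avoid list. This is a minimal sketch that assumes the label has already been converted to text; the function names and the example label are hypothetical, and it is written in Python for brevity rather than in the React Native stack the abstract mentions.

```python
import re


def find_unwanted(label_text, avoid_terms):
    """Return the avoid-list terms that occur as whole words in the label text."""
    text = label_text.lower()
    hits = set()
    for term in avoid_terms:
        term_lc = term.lower()
        # Word boundaries stop "nut" from matching inside "nutmeg".
        if re.search(r"\b" + re.escape(term_lc) + r"\b", text):
            hits.add(term_lc)
    return sorted(hits)


def is_suitable(label_text, avoid_terms):
    """A product is treated as suitable when no avoid-list term is found."""
    return not find_unwanted(label_text, avoid_terms)


# Hypothetical label text, as it might come back from text recognition.
label = "Ingredients: wheat flour, sugar, palm oil, skimmed MILK powder (3%), salt."
print(find_unwanted(label, ["milk", "peanut"]))
print(is_suitable(label, ["peanut", "soy"]))
```

Exact whole-word matching is deliberately naive: it misses plurals, synonyms, and recognition errors, so a real app would likely need a curated ingredient dictionary on top of it.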
Integrating Human-in-the-loop into Swarm Learning for Decentralized Fake News Detection
Xishuang Dong, Lijun Qian
Social media has become an effective platform for generating and spreading fake news that can mislead people and even distort public opinion. Centralized methods for fake news detection, however, cannot effectively protect user privacy during the centralized data collection needed to train models. Moreover, they cannot fully involve user feedback in the loop of learning detection models to further enhance fake news detection. To overcome these challenges, this paper proposes a novel decentralized method, Human-in-the-loop Based Swarm Learning (HBSL), which integrates user feedback into the loop of learning and inference to recognize fake news in a decentralized manner without violating user privacy. It consists of distributed nodes that can independently learn and detect fake news on local data. Furthermore, detection models trained on these nodes can be enhanced through decentralized model merging. Experimental results demonstrate that the proposed method outperforms the state-of-the-art decentralized method at detecting fake news on a benchmark dataset.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923043
Published: 2022-01-04
Citations: 4
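The abstract above does not say how node models are merged. A common baseline, assumed here purely for illustration, is element-wise averaging of the parameters that each node trained on its local data, so that only weights and never raw user data leave a node.

```python
def merge_models(node_params):
    """Average parameters element-wise across nodes.

    node_params: list of dicts mapping a parameter name to a flat list of floats.
    All nodes are assumed to share the same names and shapes.
    """
    if not node_params:
        raise ValueError("at least one node model is required")
    count = len(node_params)
    merged = {}
    for name in node_params[0]:
        # zip(*...) lines up the i-th value of this parameter from every node.
        columns = zip(*(params[name] for params in node_params))
        merged[name] = [sum(values) / count for values in columns]
    return merged


# Hypothetical weights from two nodes after local training.
node_a = {"w": [1.0, 3.0], "b": [0.0]}
node_b = {"w": [3.0, 5.0], "b": [2.0]}
print(merge_models([node_a, node_b]))
```

Plain averaging treats every node equally; weighting by local sample count, or by how much human feedback a node has received, would be natural variations, but nothing in the abstract confirms either.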
Evaluation of Tree Based Regression over Multiple Linear Regression for Non-normally Distributed Data in Battery Performance
S. Chowdhury, Yuxiao Lin, Bor-Shuang Liaw, L. Kerby
Battery performance datasets are typically non-normal and multicollinear. Extrapolating such datasets for model predictions requires attention to these characteristics. This study explores the impact of data normality on building machine learning models. In this work, tree-based regression models and multiple linear regression models are each built from a highly skewed, non-normal dataset with multicollinearity and then compared. Several techniques, such as data transformation, are necessary to achieve a good multiple linear regression model with this dataset; the most useful techniques are discussed. With these techniques, the best multiple linear regression model achieved $R^{2} = 81.23\%$ and exhibited no multicollinearity effect for the dataset used in this study. Tree-based models perform better on this dataset, as they are non-parametric, capable of handling complex relationships among variables, and not affected by multicollinearity. We show that bagging, as used in Random Forests, reduces overfitting. Our best tree-based model achieved $R^{2} = 97.73\%$. This study explains why tree-based regressions show promise as machine learning models for non-normally distributed, multicollinear data.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923169
Published: 2021-11-03
Citations: 3
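The two scores quoted in the entry above are coefficients of determination. As a reminder of what is being compared, the sketch below computes $R^{2} = 1 - SS_{res}/SS_{tot}$ for a list of observations and predictions; the numbers are made up for illustration and are unrelated to the battery dataset.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1 - ss_res / ss_tot


# Illustrative values only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
print(round(r_squared(observed, predicted), 2))
```

A model that always predicts the mean scores 0 and a perfect model scores 1, so the gap between 81.23% and 97.73% reflects how much more of the variance the tree-based model accounts for on that dataset.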
Joint SCSP-LROM: A novel approach to detect Cerebrovascular Anomalies from EEG signals 联合SCSP-LROM:一种从脑电图信号中检测脑血管异常的新方法
Debojyoti Seth
Electroencephalography (EEG) has gained popularity over similar modalities such as Functional Magnetic Resonance Imaging (fMRI) and Functional Near-Infrared Spectroscopy (fNIRS) because it is simple and non-invasive. One of the biggest challenges of any Brain Computer Interfacing (BCI) technique is recovering maximum information from minimal input channels for realistic predictions. To select the EEG channels that yield the highest accuracy, this paper introduces sparsity into a Convolutional Neural Network (CNN) induced modified Common Spatial Pattern (CSP) algorithm. This approach helps develop optimized confusion matrices, which can extensively label the feature map in a significantly lower number of iterations, to predict trends in the growth of symptoms. The concept of compressed sensing is used to develop an optimization model that recovers the cosparse signal while retaining maximum information. The state-of-the-art Joint Sparsity Induced Modified Common Spatial Pattern Algorithm and Low Rank Optimization Model (SCSP-LROM) can detect the stage and extent of growth of malignant cells, hemorrhages and lesions.
DOI: https://doi.org/10.1109/IDSTA55301.2022.9923032
Citations: 0