Big Data and Cognitive Computing: Latest Articles

Data-Driven Short-Term Load Forecasting for Multiple Locations: An Integrated Approach

Big Data and Cognitive Computing

Pub Date : 2024-01-26 DOI: 10.3390/bdcc8020012

Anik Baul, Gobinda Chandra Sarker, Prokash Sikder, Utpal Mozumder, A. Abdelgawad

Short-term load forecasting (STLF) plays a crucial role in the planning, management, and stability of a country’s power system operation. In this study, we have developed a novel approach that can simultaneously predict the load demand of different regions in Bangladesh. When making predictions for loads from multiple locations simultaneously, the overall accuracy of the forecast can be improved by incorporating features from the various areas while reducing the complexity of using multiple models. Accurate and timely load predictions for specific regions with distinct demographics and economic characteristics can assist transmission and distribution companies in properly allocating their resources. Bangladesh, being a relatively small country, is divided into nine distinct power zones for electricity transmission across the nation. We propose a hybrid model, combining a Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU), designed to forecast load demand seven days ahead for each of the nine power zones simultaneously. For our study, nine years of historical electricity demand data (from January 2014 to April 2023) were collected from the Power Grid Company of Bangladesh (PGCB) website. Considering the nonstationary characteristics of the dataset, the Interquartile Range (IQR) method and load averaging are employed to deal effectively with the outliers. Then, for finer granularity, the dataset is augmented by interpolation to 1 h intervals. The proposed CNN-GRU model, trained on this augmented and refined dataset, is evaluated against established algorithms in the literature, including Long Short-Term Memory Networks (LSTM), GRU, CNN-LSTM, CNN-GRU, and Transformer-based algorithms. Compared to other approaches, the proposed technique demonstrated superior forecasting accuracy in terms of mean absolute percentage error (MAPE) and root mean squared error (RMSE). The dataset and the source code are openly accessible to motivate further research.
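The preprocessing and evaluation steps named in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes the conventional 1.5 × IQR fences, reads "load averaging" as replacing an outlier with the mean of the inlier values, and uses linear interpolation. All function names and the `k` and `steps` parameters are hypothetical.

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences; k=1.5 is an assumed, conventional choice."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def replace_outliers(values, k=1.5):
    """Replace points outside the fences with the mean of the inliers (assumed rule)."""
    low, high = iqr_bounds(values, k)
    inliers = [v for v in values if low <= v <= high]
    fill = statistics.fmean(inliers)
    return [v if low <= v <= high else fill for v in values]


def interpolate(values, steps):
    """Linearly insert steps - 1 points between each pair of consecutive samples."""
    out = []
    for a, b in zip(values, values[1:]):
        out.extend(a + (b - a) * i / steps for i in range(steps))
    out.append(values[-1])
    return out


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * statistics.fmean(abs((a - p) / a) for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean squared error, in the units of the load series."""
    return math.sqrt(statistics.fmean((a - p) ** 2 for a, p in zip(actual, predicted)))
```

MAPE is scale-free, which makes it convenient for comparing zones of different size, while RMSE penalizes large misses more heavily; reporting both, as the abstract does, covers the two views.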

Citations: 0

AI-Based User Empowerment for Empirical Social Research

Big Data and Cognitive Computing

Pub Date : 2024-01-23 DOI: 10.3390/bdcc8020011

Thoralf Reis, Lukas Dumberger, Sebastian Bruchhaus, Thomas Krause, Verena Schreyer, M. X. Bornschlegl, Matthias L. Hemmje

Manual labeling and categorization are extremely time-consuming and, thus, costly. AI and ML-supported information systems can bridge this gap and support labor-intensive digital activities. Because it requires categorization, coding-based analysis, such as qualitative content analysis, reaches its limits with large amounts of data and could benefit from AI and ML-based support. Empirical social research, its application domain, benefits from Big Data’s ability to create more extensive human behavior and development models. A range of applications is available for statistical analysis to serve this purpose. This paper aims to implement an information system that supports researchers in empirical social research in performing AI-supported qualitative content analysis. AI2VIS4BigData is a reference model that standardizes use cases and artifacts for Big Data information systems that integrate AI and ML for user empowerment. Thus, this work’s concepts and implementations try to achieve an AI2VIS4BigData-compliant information system that supports social researchers in categorizing text data and creating insightful dashboards. The text categorization is based on an existing ML component. Furthermore, the paper presents two evaluations that were conducted for these concepts and implementations: a qualitative cognitive walkthrough assessing the system’s usability and a quantitative user study with 18 participants. The user study revealed that, although users perceive AI support as more efficient, they need more time to reflect on the recommendations. AI support increased the correctness of the users’ categorizations but also slowed down their decision-making. The assumption that this is due to the UI design and the additional information to be processed requires follow-up research.

Citations: 0

Quality and Security of Critical Infrastructure Systems

Big Data and Cognitive Computing

Pub Date : 2024-01-22 DOI: 10.3390/bdcc8010010

I. Izonin, Tetiana Hovorushchenko, Shishir K. Shandilya

The amount of information is constantly growing, and thus, the issue of information security is becoming more acute [...]

Citations: 0

Deep Learning and YOLOv8 Utilized in an Accurate Face Mask Detection System

Big Data and Cognitive Computing

Pub Date : 2024-01-16 DOI: 10.3390/bdcc8010009

Christine Dewi, Danny Manongga, Hendry, Evangs Mailoa, K. Hartomo

Face mask detection is a technological application that employs computer vision methodologies to ascertain the presence or absence of a face mask on an individual depicted in an image or video. This technology gained significant attention and adoption during the COVID-19 pandemic, as wearing face masks became an important measure to prevent the spread of the virus. Face mask detection helps to enforce mask-wearing guidelines, which can significantly reduce the spread of respiratory illnesses, including COVID-19. Wearing masks in densely populated areas provides individuals with protection and hinders the spread of airborne particles that transmit viruses. The application of deep learning models in object recognition has shown significant progress, leading to promising outcomes in the identification and localization of objects within images. The primary aim of this study is to annotate and classify face mask entities depicted in authentic images. To mitigate the spread of COVID-19 within public settings, individuals can wear face masks made from materials specifically designed for medical purposes. This study utilizes YOLOv8, a state-of-the-art object detection algorithm, to accurately detect and identify face masks. For evaluation, we conducted an experiment in which we combined the Face Mask Dataset (FMD) and the Medical Mask Dataset (MMD) into a single dataset. Compared with an earlier research study using the FMD and MMD, the suggested model improved detection performance from 98.6% to a “Good” level of 99.1%. Our study demonstrates that the model scheme we have provided is a reliable method for detecting faces that are obscured by medical masks. Additionally, after the completion of the study, a comparative analysis was conducted to examine the findings in conjunction with those of related research. The proposed detector demonstrated superior performance compared to previous research in terms of both accuracy and precision.

Citations: 0

Evaluating the Robustness of Deep Learning Models against Adversarial Attacks: An Analysis with FGSM, PGD and CW

Big Data and Cognitive Computing

Pub Date : 2024-01-16 DOI: 10.3390/bdcc8010008

W. Villegas-Ch., Ángel Jaramillo-Alcázar, Sergio Luján-Mora

This study evaluated the generation of adversarial examples and the subsequent robustness of an image classification model. The attacks were performed using the Fast Gradient Sign Method, the Projected Gradient Descent method, and the Carlini and Wagner attack to perturb the original images and analyze their impact on the model’s classification accuracy. Additionally, image manipulation techniques were investigated as defensive measures against adversarial attacks. The results highlighted the model’s vulnerability to adversarial examples: the Fast Gradient Sign Method effectively altered the original classifications, while the Carlini and Wagner method proved less effective. Promising approaches such as noise reduction, image compression, and Gaussian blurring were presented as effective countermeasures. These findings underscore the importance of addressing the vulnerability of machine learning models and the need to develop robust defenses against adversarial examples. This article emphasizes the urgency of addressing the threat posed by harmful standards in machine learning models, highlighting the relevance of implementing effective countermeasures and image manipulation techniques to mitigate the effects of adversarial attacks. These efforts are crucial to safeguarding model integrity and trust in an environment marked by constantly evolving hostile threats. An average 25% decrease in accuracy was observed for the VGG16 model when exposed to the Fast Gradient Sign Method and Projected Gradient Descent attacks, and an even more significant 35% decrease with the Carlini and Wagner method.
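To make the first two attacks concrete, here is a minimal sketch of FGSM and an L∞ PGD loop. It is not the study's setup: instead of VGG16 it uses a toy logistic-regression classifier, chosen because the gradient of the binary cross-entropy loss with respect to the input has the closed form (p − y)·w. The function names and the parameters `eps`, `alpha`, and `steps` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(p, y):
    """Binary cross-entropy for a single example."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One-step FGSM: x_adv = x + eps * sign(dL/dx)."""
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]  # closed-form gradient of BCE w.r.t. x
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]


def pgd(w, b, x, y, eps, alpha, steps):
    """Iterated FGSM steps of size alpha, projected back into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        x_adv = fgsm(w, b, x_adv, y, alpha)
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv
```

FGSM spends the whole perturbation budget in one step, whereas PGD takes several smaller steps and clips after each one. The Carlini and Wagner attack instead solves an optimization problem over the perturbation and is not sketched here.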

Citations: 0

Enhancing Credit Card Fraud Detection: An Ensemble Machine Learning Approach

Big Data and Cognitive Computing

Pub Date : 2024-01-03 DOI: 10.3390/bdcc8010006

Abdul Rehman Khalid, Nsikak Owoh, O. Uthmani, Moses Ashawa, Jude Osamor, John Adejoh

In the era of digital advancements, the escalation of credit card fraud necessitates the development of robust and efficient fraud detection systems. This paper delves into the application of machine learning models, specifically focusing on ensemble methods, to enhance credit card fraud detection. Through an extensive review of existing literature, we identified limitations in current fraud detection technologies, including issues like data imbalance, concept drift, false positives/negatives, limited generalisability, and challenges in real-time processing. To address some of these shortcomings, we propose a novel ensemble model that integrates a Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), Bagging, and Boosting classifiers. This ensemble model tackles the dataset imbalance problem associated with most credit card datasets by applying under-sampling and the Synthetic Minority Over-sampling Technique (SMOTE) to some of the machine learning algorithms. The evaluation of the model utilises a dataset comprising transaction records from European credit card holders, providing a realistic scenario for assessment. The methodology of the proposed model encompasses data pre-processing, feature engineering, model selection, and evaluation, with Google Colab computational capabilities facilitating efficient model training and testing. Comparative analysis between the proposed ensemble model, traditional machine learning methods, and individual classifiers reveals the superior performance of the ensemble in mitigating challenges associated with credit card fraud detection. Across accuracy, precision, recall, and F1-score metrics, the ensemble outperforms existing models. This paper underscores the efficacy of ensemble methods as a valuable tool in the battle against fraudulent transactions. The findings presented lay the groundwork for future advancements in the development of more resilient and adaptive fraud detection systems, which will become crucial as credit card fraud techniques continue to evolve.
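The sketch below illustrates three ingredients mentioned in this abstract: random under-sampling of the majority class, combining several classifiers, and the precision/recall/F1 metrics. It rests on assumptions: the abstract does not say how the classifiers are combined, so hard majority voting is used as one plausible rule, and SMOTE is omitted for brevity. All function names are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Drop majority-class rows at random until every class has the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def majority_vote(predictions):
    """Hard voting (assumed rule); predictions holds one label list per classifier."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Metrics for the positive (fraud) class; returns 0.0 where undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

On heavily imbalanced fraud data, accuracy alone can look high for a model that never flags fraud, which is why precision, recall, and F1 for the fraud class are reported alongside it.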

Citations: 0

Unveiling Sentiments: A Comprehensive Analysis of Arabic Hajj-Related Tweets from 2017–2022 Utilizing Advanced AI Models

Big Data and Cognitive Computing

Pub Date : 2024-01-02 DOI: 10.3390/bdcc8010005

Hanan M. Alghamdi

Sentiment analysis plays a crucial role in understanding public opinion and social media trends. It involves analyzing the emotional tone and polarity of a given text. When applied to Arabic text, this task becomes particularly challenging due to the language’s complex morphology, right-to-left script, and intricate nuances in expressing emotions. Social media has emerged as a powerful platform for individuals to express their sentiments, especially regarding religious and cultural events. Consequently, studying sentiment analysis in the context of Hajj has become a captivating subject. This research paper presents a comprehensive sentiment analysis of tweets discussing the annual Hajj pilgrimage over a six-year period. By employing a combination of machine learning and deep learning models, this study successfully conducted sentiment analysis on a sizable dataset consisting of Arabic tweets. The process involves pre-processing, feature extraction, and sentiment classification. The objective was to uncover the prevailing sentiments associated with Hajj over different years, before, during, and after each Hajj event. Importantly, the results presented in this study highlight that BERT, an advanced transformer-based model, outperformed other models in accurately classifying sentiment. This underscores its effectiveness in capturing the complexities inherent in Arabic text.

Citations: 0
