
Egyptian Informatics Journal: Latest Articles

An accurate similarity-based model for movie rating prediction and recommendation using an uncertainty score
IF 4.3 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-20 | Vol. 33, Article 100860 | DOI: 10.1016/j.eij.2025.100860
Youssef Hanyf, Hassan Silkan, Abdellatif Dahmouni, Abdelkaher Ait Abdelouahad
This paper presents a novel method for movie rating prediction and recommendation based on movie-to-movie similarity, with an uncertainty score that controls prediction confidence. The two traditional recommendation approaches, collaborative filtering and content-based filtering, both rely on the concept of similarity between movies and users. Although similarity plays a crucial role in recommendation systems, it has not been sufficiently explored in existing research. To bridge this gap, we propose a dissimilarity function for movies based on a thorough analysis of movie features. We also introduce an uncertainty score that quantifies the confidence of a prediction from the dissimilarity between the unseen movie and the nearest rated movie. The method uses the uncertainty score for two purposes. First, when the uncertainty exceeds a predefined threshold, it shifts the predicted rating toward the user’s mean rating. Second, it prioritizes recommendations by uncertainty score, so the system can recommend only movies whose predictions are highly certain. Experimental results show that the method is highly accurate at low uncertainty thresholds (≤12%). It also performs well in top-K movie recommendation, delivering consistent performance regardless of the number of recommended movies when uncertainty is low. The method is also compared with state-of-the-art machine learning models, such as Support Vector Machine Regression, Random Forest Regressor, and Gradient Boosting Regressor; it outperforms these models at low uncertainty levels and provides more reliable and accurate recommendations.
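The abstract does not specify the dissimilarity function or the exact adjustment rule, so the following is only a minimal sketch of the uncertainty mechanism under stated assumptions: each movie is a hypothetical set of genre tags, Jaccard distance stands in for the paper's dissimilarity function, the raw prediction is the user's rating of the nearest rated movie, and above the threshold the prediction is linearly blended toward the user's mean with weight equal to the uncertainty. The names `predict`, `recommend`, and `jaccard_dissimilarity` are illustrative, not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


def jaccard_dissimilarity(a: frozenset[str], b: frozenset[str]) -> float:
    """Hypothetical stand-in for the paper's movie dissimilarity: 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


@dataclass
class Prediction:
    movie: str
    rating: float
    uncertainty: float


def predict(user_ratings: dict[str, float],
            features: dict[str, frozenset[str]],
            movie: str,
            threshold: float = 0.12) -> Prediction:
    """Predict a rating for an unseen movie from the user's nearest rated movie."""
    # Uncertainty = dissimilarity between the unseen movie and its nearest rated movie.
    nearest, u = min(
        ((m, jaccard_dissimilarity(features[movie], features[m])) for m in user_ratings),
        key=lambda pair: pair[1],
    )
    rating = user_ratings[nearest]
    if u > threshold:
        # Assumed adjustment rule: blend toward the user's mean rating,
        # weighting the mean more heavily as uncertainty grows.
        rating = (1.0 - u) * rating + u * mean(user_ratings.values())
    return Prediction(movie, rating, u)


def recommend(user_ratings, features, candidates, k=3, threshold=0.12):
    """Up to k unseen movies within the uncertainty threshold, most certain first
    (ties broken by higher predicted rating)."""
    preds = [predict(user_ratings, features, m, threshold) for m in candidates]
    confident = [p for p in preds if p.uncertainty <= threshold]
    return sorted(confident, key=lambda p: (p.uncertainty, -p.rating))[:k]
```

For example, with ratings `{"A": 5.0, "B": 2.0}` where A is tagged {action, sci-fi}, an unseen movie tagged {action, thriller} is nearest to A at uncertainty 2/3, so its raw prediction of 5.0 is pulled to 4.0, and it is excluded from recommendations at the 12% threshold.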
Citations: 0
Channel-attentive YOLOv5 and capsule auto-encoder for pomegranate disease detection
IF 4.3 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-19 | Vol. 33, Article 100877 | DOI: 10.1016/j.eij.2025.100877
P. Sajitha, A. Diana Andrushia, N. Anand, Eva Lubloy
Fruits are among the most vital items in global diets because of their rich nutritional value, which drives high demand and substantial agricultural revenue. Among fruit crops, pomegranate is particularly valuable because of its high antioxidant potential. However, most pomegranate crops suffer from diseases that greatly reduce agricultural yield and productivity. With demand for the fruit rising, early detection and classification of these diseases are crucial for boosting yield and taking appropriate preventive measures. This paper proposes a segmentation-based deep learning model for identifying pomegranate diseases. Images are first pre-processed, mainly by cropping and resizing, and then passed through an enhanced Wiener filter that removes noise and improves image clarity. The pre-processed images are then segmented with the CA_YV5GC algorithm (Channel-Attentive YOLOv5-based GrabCut), which isolates diseased regions. Next, an optimized ResNet-152 network extracts fundamental features that encode texture and shape characteristics capable of revealing disease symptoms. Coati Optimization selects the most discriminative features from the lower-dimensional representation of the extracted information for disease classification. Finally, a Deep Capsule Canonical Auto-encoder (DC_CAENet) classifies the disease type with higher accuracy, and Adaptive Osprey Optimization tunes the model's parameters. Comparison with existing methods shows that this technique is more accurate and efficient than traditional techniques.
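The abstract does not say what makes its Wiener filter "enhanced", so this is only a sketch of the classic local adaptive Wiener filter that such variants typically build on (the same per-pixel rule used by `scipy.signal.wiener`): each pixel is pulled toward its local mean unless the local variance clearly exceeds the noise variance. Assumptions: a grayscale image given as a list of lists, windows clipped at the image border (SciPy instead zero-pads), and the noise variance estimated as the mean local variance when not supplied.

```python
def local_stats(img, r, c, half):
    """Mean and variance of the window centred on (r, c), clipped at the borders."""
    vals = [img[i][j]
            for i in range(max(0, r - half), min(len(img), r + half + 1))
            for j in range(max(0, c - half), min(len(img[0]), c + half + 1))]
    m = sum(vals) / len(vals)
    v = sum((x - m) ** 2 for x in vals) / len(vals)
    return m, v


def wiener_filter(img, window=3, noise=None):
    """Classic adaptive Wiener denoising of a 2-D grayscale image (list of lists)."""
    half = window // 2
    rows, cols = len(img), len(img[0])
    stats = [[local_stats(img, r, c, half) for c in range(cols)] for r in range(rows)]
    if noise is None:
        # Estimate the noise variance as the average local variance.
        noise = sum(v for row in stats for _, v in row) / (rows * cols)
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            m, v = stats[r][c]
            if v <= noise:
                # Flat region relative to the noise: replace with the local mean.
                out_row.append(m)
            else:
                # Detail region: keep a fraction of the deviation from the local mean.
                out_row.append(m + (1.0 - noise / v) * (img[r][c] - m))
        out.append(out_row)
    return out
```

On a 5×5 zero image with a single 9.0 spike in the centre, the estimated noise variance is 2.88 and the spike is attenuated to 6.12, while flat regions are left unchanged.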
Citations: 0
A study on front vehicle collision warning method based on lightweight YOLOv8 and DeepSort
IF 4.3 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-19 | Vol. 33, Article 100861 | DOI: 10.1016/j.eij.2025.100861
Wenyu Zhang, Yajing Li, Jiaxuan Hu, Ning Wang
With the continuous increase in vehicle ownership, the frequency of traffic accidents has risen significantly, and higher demands have consequently been placed on active vehicle safety technologies. To address the challenges of insufficient real-time performance and high model complexity in traditional object detection methods under complex traffic conditions, an improved front-vehicle collision warning system has been proposed by integrating YOLOv8 and DeepSort. In this approach, the original YOLOv8 backbone network is replaced by the lightweight MobileNet V4, and the Convolutional Block Attention Module (CBAM) is incorporated to enhance feature extraction capabilities. A comprehensive algorithmic framework has been constructed, integrating multi-object recognition, front-vehicle distance estimation, ego-vehicle speed calculation, and hierarchical warning level output. Experimental results on the KITTI dataset have demonstrated a detection accuracy of 95.5 % and a total detection time of 2.6 ms per frame. Additionally, a 2.6 % improvement in mAP50–95 has been observed, accompanied by only a 0.1 % decrease in the recall rate. These findings suggest that the proposed method provides effective technical support for front-vehicle collision warning in intelligent transportation environments.
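The abstract lists front-vehicle distance estimation, ego-vehicle speed calculation, and hierarchical warning output without giving formulas. A common way to turn per-frame distance estimates into graded warnings is time-to-collision (TTC); the sketch below assumes a pinhole-camera distance estimate from bounding-box height, a closing speed computed from two successive distances of the same DeepSort track, and hypothetical 1.5 s / 3.0 s thresholds. None of these formulas or values come from the paper, and the sketch uses relative closing speed rather than a separate ego-speed estimate.

```python
from enum import Enum


class WarningLevel(Enum):
    SAFE = 0
    CAUTION = 1
    DANGER = 2


def pinhole_distance(focal_px: float, real_height_m: float, bbox_height_px: float) -> float:
    """Monocular range estimate d = f * H / h (assumed camera model)."""
    return focal_px * real_height_m / bbox_height_px


def time_to_collision(d_prev: float, d_curr: float, dt: float) -> float:
    """TTC in seconds from two distances (m) of the same tracked vehicle, dt seconds apart."""
    closing_speed = (d_prev - d_curr) / dt  # positive when the gap is shrinking
    if closing_speed <= 0:
        return float("inf")  # not approaching: no collision course
    return d_curr / closing_speed


def warning_level(ttc: float, danger_s: float = 1.5, caution_s: float = 3.0) -> WarningLevel:
    """Map TTC to a hierarchical warning level (thresholds are hypothetical)."""
    if ttc < danger_s:
        return WarningLevel.DANGER
    if ttc < caution_s:
        return WarningLevel.CAUTION
    return WarningLevel.SAFE
```

With an assumed 1000 px focal length and 1.5 m vehicle height, a tracked box growing from 75 px to 150 px in 1 s means the gap shrank from 20 m to 10 m, giving TTC = 1.0 s and a DANGER warning.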
Citations: 0
Fostering Creative sports talents with transformer models for inclusive financial
IF 4.3 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | Vol. 33, Article 100838 | DOI: 10.1016/j.eij.2025.100838
Yang Gao, Wenjie Wang, Yangyang Li
The study introduces a novel sports analytics approach and, for the first time, applies the TabTransformer model to predict attendance at fitness classes. The main objective is to uncover potential sports talent and support in-depth financial planning based on attendance patterns. Compared with a deep model and a traditional model, TabTransformer performs better, with an accuracy of 0.710, precision of 0.738, recall of 0.707, F1 score of 0.722, and AUC-ROC of 0.818. This is because the model uses contextual embeddings for categorical features and linear transformations for numerical features, which together capture complex interactions in the data. The model identifies committed members well in well-attended classes (e.g., Aqua and HIIT), but its moderate recall in poorly attended classes (e.g., Strength and Cycling) calls for further investigation of barriers to access. These insights pave the way for targeted interventions and inclusive financial strategies, including membership subsidies and flexible schedules. Despite limitations such as the moderate size of the dataset and the lack of financial features, this research lays a strong foundation for applying Transformer models in sports analytics. Ultimately, the study emphasizes the importance of Transformer-based analytics for generating creative and equitable outcomes in fitness programs and is a step forward in identifying talent and promoting inclusion in sports.
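As a consistency check on the reported figures (our arithmetic, not from the paper), the F1 score is the harmonic mean of precision $P$ and recall $R$, and the reported values agree:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.738 \times 0.707}{0.738 + 0.707}
    = \frac{1.043532}{1.445}
    \approx 0.722
```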
Citations: 0
Beyond 5G: PHWAN – A secure, low-latency, and cost-effective framework for Industry 4.0 smart manufacturing
IF 4.3 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-15 | Vol. 33, Article 100859 | DOI: 10.1016/j.eij.2025.100859
Nurzati Iwani Othman, Hassan Jamil Syed, Athirah Mohd Ramly, Nur Hanis Sabrina binti Suhaimi, Aitizaz Ali, Mohamed Abdulnabi, Ahmad Fadzil Ismail
The digital transformation of Industry 4.0 requires networking solutions that deliver ultra-low latency, energy efficiency, and robust security. Conventional 5G architectures face limitations such as high infrastructure costs, performance bottlenecks, and vulnerabilities in mission-critical environments. This study proposes the Private Hybrid Wireless Access Network (PHWAN) framework, a novel architecture that combines localized spectrum management, edge–cloud orchestration, and blockchain-based Zero Trust security. A comprehensive cost–benefit model and MATLAB-based simulation of an industrial IoT environment were used to evaluate PHWAN against traditional 5G deployments. Results show that PHWAN reduces latency by 50 % (0.5 ms to 0.25 ms), lowers energy consumption by 61 % (5.4 mJ to 2.1 mJ), and improves bandwidth utilization by 108 %. Security analysis further demonstrates improved access control and data integrity without incurring significant overhead. These findings establish PHWAN as a scalable and cost-effective alternative to 5G for delay-sensitive and resource-constrained industrial IoT applications. Future research will extend validation to standardized platforms such as NS-3 and 5G-LENA and explore integration with 6G spectrum slicing, quantum-secured communications, and industrial metaverse applications to enhance resilience and interoperability in next-generation smart factories.
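The relative reductions quoted in the abstract follow directly from the absolute values (our arithmetic, not from the paper). The 108% bandwidth figure is given only in relative terms, so without the baseline value it fixes just the ratio between the two deployments:

```latex
\Delta_{\text{latency}} = \frac{0.5 - 0.25}{0.5} = 0.50, \qquad
\Delta_{\text{energy}} = \frac{5.4 - 2.1}{5.4} \approx 0.611, \qquad
U_{\text{PHWAN}} = (1 + 1.08)\,U_{\text{5G}} = 2.08\,U_{\text{5G}}
```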
Citations: 0
Localized angle-based unsupervised outlier detection
IF 4.3 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-11 | Vol. 33, Article 100850 | DOI: 10.1016/j.eij.2025.100850
Wei Zheng, Lili Huang, Haiqiang Liu, Fa Zhu, Achyut Shankar, Imad Rida, Davide Moroni
Angle-based outlier detection (ABOD) was proposed to tackle the “curse of dimensionality” that affects distance-based and density-based outlier detectors. However, ABOD may fail on multimodal datasets because it considers only global information. Furthermore, for each instance ABOD must compute the angles between the difference vectors from that instance to every pair of other instances in the dataset, so its time complexity reaches O(n³). To address these two issues, this paper proposes localized angle-based outlier detection (LABOD). For each instance, LABOD first finds its influence set and then computes the variance of the angles between the difference vector from the instance to the mean of its influence-set neighbors and the difference vectors from the instance to each of those neighbors. The influence set consists of the nearest-neighbor set and the reverse-nearest-neighbor set. Because the variance is defined by angles within a local region, the proposed method overcomes both drawbacks of ABOD. Experiments on both synthetic and benchmark datasets demonstrate that LABOD is superior to ABOD.
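The LABOD score is defined above only in words, so the following is a minimal sketch of that description, not the authors' implementation. Assumptions: brute-force Euclidean k-nearest neighbors, an influence set equal to the union of the k-nearest and reverse k-nearest neighbors, the population variance of the angles, and (as in ABOD) a smaller variance meaning a more outlying point. The names `knn`, `influence_sets`, and `labod_scores` are hypothetical. Because angles are taken only against the influence set, the angle computation per point grows with the neighborhood size rather than with all pairs of points.

```python
import math
from statistics import pvariance


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (Euclidean, brute force)."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def influence_sets(points, k):
    """Influence set = k nearest neighbors plus reverse k nearest neighbors."""
    nn = [knn(points, i, k) for i in range(len(points))]
    rnn = [set() for _ in points]
    for i, neighbors in enumerate(nn):
        for j in neighbors:
            rnn[j].add(i)  # i lists j as a neighbor, so i is a reverse neighbor of j
    return [nn[i] | rnn[i] for i in range(len(points))]


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp floating-point overshoot


def labod_scores(points, k=3):
    """Variance of angles per point; smaller means more outlying (assumed, as in ABOD)."""
    scores = []
    for i, inf_set in enumerate(influence_sets(points, k)):
        x = points[i]
        dims = range(len(x))
        centroid = [sum(points[j][d] for j in inf_set) / len(inf_set) for d in dims]
        to_centroid = [centroid[d] - x[d] for d in dims]
        if math.hypot(*to_centroid) == 0:
            # x sits exactly at its neighbors' centroid: angle undefined,
            # treated here (by assumption) as least outlying.
            scores.append(math.inf)
            continue
        angles = []
        for j in inf_set:
            to_neighbor = [points[j][d] - x[d] for d in dims]
            if math.hypot(*to_neighbor) > 0:  # skip exact duplicates of x
                angles.append(angle(to_centroid, to_neighbor))
        scores.append(pvariance(angles) if len(angles) > 1 else 0.0)
    return scores
```

On a small unit-square cluster plus a far point at (10, 10), the far point's neighbors all lie in nearly the same direction, so its angle variance is far smaller than that of any cluster member and it is ranked as the most outlying.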
Citations: 0
Predictive analysis of drifting test cases and critical areas for enhancing embedded systems using a Gaussian distribution methodology for multi-output analysis
IF 4.3 | CAS Zone 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-11 | DOI: 10.1016/j.eij.2025.100857
M. Lakshmi Prasad, R. Obulakonda Reddy, Sandeep Kautish, G. Suresh Reddy, Abdulaziz S. Almazyad, Ali Wagdy Mohamed, Seyed Jalaleddin Mousavirad
Embedded computer systems, including their tools, underlying architecture, and a range of other features that contribute to their success, affect many sectors of the economy, so guaranteeing their functionality and dependability is vital. However, embedded systems can exhibit drifting behaviour over time as a result of software upgrades, hardware deterioration, and environmental changes. As a result, test cases may become outdated or less effective at identifying important areas of concern. This study offers a new technique for predictive analysis of drifting test cases and key regions in embedded systems using a Gaussian distribution, applied to the multi-output domain of Temperature Monitoring Nuclear Reactor Systems (TMCNRS). The technique uses artificial intelligence practices and statistical tools to detect and adapt to variations in the system's behaviour. The first step is gathering historical test case and system behaviour data. From this data, a baseline Gaussian distribution is established that captures the expected behaviour of the embedded system and its associated test cases. In the second phase, the performance of the embedded system is continuously monitored and new performance data are collected. Drift is defined as the nonconformity of the system's behaviour with the established baseline distribution. Using a multi-output Gaussian distribution model, the technique forecasts likely drift in every test case and critical region. The third phase incorporates advanced learning practices that modify the test cases and the critical-area recognition criteria based on the identified drift. By assessing deviations from the baseline distribution, the algorithm can adaptively change test cases to increase their efficiency and identify new key regions more accurately.
To validate the efficacy of the proposed methodology, extensive experiments were conducted on many real-world embedded systems across diverse fields of application. According to our results, predictive analysis using the multi-output Gaussian distribution greatly increases test-case accuracy and strengthens the system's capacity to detect important locations within it, even in the presence of drift. The contribution of this study is a reliable and flexible technique for identifying drifting test cases and critical regions in integrated systems. By applying an Optimal Gaussian distribution (OGD) in a multi-output context, the proposed methodology offers a novel way to preserve the dependability and efficiency of embedded systems, ensuring that they function efficiently even in constantly evolving, dynamic environments. This study should also help improve the quality and reliability of embedded systems, so that they can better meet the needs of an ever-changing society and of current technologies.
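The abstract describes the baseline, monitoring, and drift steps only qualitatively. Below is a minimal sketch of the first two steps under the simplifying assumption that each output is an independent Gaussian (the paper's multi-output model may also capture correlations between outputs): fit a per-output mean and standard deviation on historical data, then flag an output as drifting when the mean of a recent monitoring window lies more than `z_threshold` standard errors from its baseline mean. The function names, output names, and the threshold of 3 are hypothetical.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean, stdev


def fit_baseline(history: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Per-output Gaussian baseline (mean, sample standard deviation) from historical runs."""
    return {name: (mean(xs), stdev(xs)) for name, xs in history.items()}


def detect_drift(baseline: dict[str, tuple[float, float]],
                 window: dict[str, list[float]],
                 z_threshold: float = 3.0) -> dict[str, float]:
    """Return outputs whose recent window mean deviates from the baseline mean
    by more than z_threshold standard errors, mapped to their z-scores."""
    drifted = {}
    for name, xs in window.items():
        mu, sigma = baseline[name]
        # z-test on the window mean: standard error = sigma / sqrt(n).
        z = (mean(xs) - mu) / (sigma / sqrt(len(xs)))
        if abs(z) > z_threshold:
            drifted[name] = z
    return drifted
```

The sketch needs at least two historical samples per output and a non-zero baseline standard deviation. For example, a hypothetical `temp_C` output with a baseline of 300 ± √2 whose recent four-sample window averages 305 yields z ≈ 7.07 and is flagged, while an output whose window mean matches its baseline is not.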
M. Lakshmi Prasad, R. Obulakonda Reddy, Sandeep Kautish, G. Suresh Reddy, Abdulaziz S. Almazyad, Ali Wagdy Mohamed, Seyed Jalaleddin Mousavirad. Predictive analysis of drifting test cases and critical areas for enhancing embedded systems using a Gaussian distribution methodology for multi-output analysis. Egyptian Informatics Journal, vol. 33, Article 100857. Published 2025-12-11. DOI: 10.1016/j.eij.2025.100857
Citations: 0
EmbryoSwin++: Enhanced swin transformer with supervised contrastive learning for embryo multi-stage classification in assisted reproductive technology
IF 4.3 CAS Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-01 DOI: 10.1016/j.eij.2025.100852
Ratheeshkumar A.M, D. Surendran
Accurate embryo stage classification is crucial for improving IVF success rates; however, existing automated methods often struggle with inter-stage similarity and class imbalance, leading to misclassification across morphologically similar developmental phases. Furthermore, conventional CNN-based approaches tend to lose fine-grained spatial details that are critical for distinguishing closely related stages. The proposed EmbryoSwin++ model addresses these gaps by integrating a Swin Transformer backbone with a supervised contrastive learning head and a Balanced Batch Sampler, enabling the model to learn discriminative embeddings while ensuring equitable representation of all 15 developmental stages. This dual-loss framework, which combines label-smoothed cross-entropy with contrastive loss, enhances robustness, mitigates overfitting, and improves generalization across datasets. Evaluated on the Human Embryo Time-Lapse Video Dataset, the model achieved a validation accuracy of 92.12%, a macro F1-score of 0.9196, and AUC values approaching 1.0 for all classes, demonstrating strong discriminative capability. Grad-CAM analysis confirmed that the model focuses on biologically relevant embryo regions, supporting its interpretability.
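The abstract does not specify how the Balanced Batch Sampler works. One common design, assumed here, builds every batch from a fixed number of images per developmental stage and oversamples with replacement when a stage has too few images. The following Python sketch operates on dataset indices and integer stage labels. `balanced_batches` and its parameters are hypothetical names.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Sequence


def balanced_batches(labels: Sequence[int], samples_per_class: int,
                     num_batches: int, seed: int = 0) -> Iterator[List[int]]:
    """Yield batches of dataset indices holding exactly `samples_per_class` items per class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for _ in range(num_batches):
        batch: List[int] = []
        for label in sorted(by_class):
            pool = by_class[label]
            if len(pool) >= samples_per_class:
                batch.extend(rng.sample(pool, samples_per_class))  # without replacement
            else:
                batch.extend(rng.choices(pool, k=samples_per_class))  # rare stage: oversample
        rng.shuffle(batch)  # avoid a fixed class order inside each batch
        yield batch


if __name__ == "__main__":
    # Hypothetical stage labels: stage 1 is rare (only two images).
    labels = [0] * 10 + [1] * 2 + [2] * 5
    for batch in balanced_batches(labels, samples_per_class=3, num_batches=2):
        print(sorted(Counter(labels[i] for i in batch).items()))  # → [(0, 3), (1, 3), (2, 3)]
```

Drawing at least two images per stage also guarantees that every anchor in a supervised contrastive loss has at least one positive in its batch. With 15 stages and, say, 4 images per stage, each batch would hold 60 images.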
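The dual-loss objective can be illustrated with a small, dependency-free Python sketch. It combines label-smoothed cross-entropy on the classifier logits with the widely used supervised contrastive (SupCon) loss on the projection-head embeddings. The abstract does not report the weighting `weight=0.5`, the smoothing `epsilon=0.1`, or the temperature `temperature=0.1`, so these values are assumptions. A real implementation would run on tensors inside a deep-learning framework.

```python
import math
from typing import List, Sequence


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def label_smoothed_ce(logits: Sequence[float], target: int, epsilon: float = 0.1) -> float:
    """Cross-entropy against the smoothed target (1 - eps) * one_hot + eps / K."""
    k = len(logits)
    log_norm = _logsumexp(logits)
    loss = 0.0
    for cls, logit in enumerate(logits):
        q = (1.0 - epsilon) * (1.0 if cls == target else 0.0) + epsilon / k
        loss -= q * (logit - log_norm)  # logit - log_norm is the log-softmax
    return loss


def _unit(vector: Sequence[float]) -> List[float]:
    """L2-normalise an embedding (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss: pull same-stage embeddings together, push others apart."""
    z = [_unit(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        log_denominator = _logsumexp(list(sims.values()))
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def dual_loss(batch_logits: Sequence[Sequence[float]], embeddings: Sequence[Sequence[float]],
              labels: Sequence[int], weight: float = 0.5, epsilon: float = 0.1,
              temperature: float = 0.1) -> float:
    """Hypothetical combination: mean label-smoothed CE + weight * SupCon."""
    ce = sum(label_smoothed_ce(l, y, epsilon) for l, y in zip(batch_logits, labels)) / len(labels)
    return ce + weight * supcon_loss(embeddings, labels, temperature)
```

With uniform logits the smoothed cross-entropy equals ln K regardless of `epsilon`, because the smoothed targets still sum to one. This gives a quick sanity check.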
Egyptian Informatics Journal, vol. 32, Article 100852.
Citations: 0
Wordcloud: Detecting mobile malware with deep learning and word of cloud
IF 4.3 CAS Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-01 DOI: 10.1016/j.eij.2025.100834
Che Akmal Che Yahaya, Ahmad Firdaus, Azlee Zabidi, Mohd Faizal Ab Razak, Fairuz Amalina Narudin
Mobile malware poses a growing threat because it operates covertly and jeopardises user data and device functionality. Security professionals use two forms of analysis, static and dynamic, to detect malware. Dynamic analysis requires substantial computational power and monitoring time, whereas static analysis uses fewer resources and runs more efficiently. However, selecting relevant features in static analysis is challenging: the feature categories (permissions, system calls, and intents) constantly evolve, the number of candidate features is large, and obfuscation adds further difficulty. In this paper, we present Wordcloud, a framework that combines static analysis, deep learning, and visual semantic representation to overcome these challenges. Our approach reverse-engineers Android APK files into smali code and then converts it into visual word-cloud patterns that encode the token frequencies and structural patterns of the malware. A convolutional neural network (CNN) then classifies these images to determine whether a sample is benign or malicious. Wordcloud is resilient to obfuscation because it transforms the smali code into a visual "signature" that reflects the statistical and structural traits of malware. We evaluated the framework on a real-world dataset of 50,000 Android samples from AMD and AndroZoo, achieving 99.48% accuracy together with high precision, recall, specificity, and F1 score. Statistical tests were also performed to validate the performance improvement contributed by our filtering process. This work contributes a scalable, efficient malware prediction and detection solution for early threat identification in mobile security systems.
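The abstract says the smali code is turned into word-cloud images that encode token frequencies, but it does not detail the tokenization. A minimal Python sketch of the frequency-extraction step might look like the following. It assumes that the tokens are instruction opcodes plus fully qualified method references, which is a hypothetical choice. The resulting dictionary could then be rendered into an image before it reaches the CNN, for example with the `wordcloud` package's `WordCloud.generate_from_frequencies`. Rendering is omitted so that the block stays self-contained.

```python
import re
from collections import Counter
from typing import Dict

# First word of an instruction line, e.g. "invoke-virtual", "const/4", "return-void".
OPCODE_RE = re.compile(r"^([a-z][a-z0-9/-]*)")
# Fully qualified method references, e.g. "Landroid/telephony/SmsManager;->sendTextMessage".
METHOD_REF_RE = re.compile(r"L[\w/$]+;->[\w$<>]+")


def smali_token_frequencies(smali_text: str, top_k: int = 200) -> Dict[str, int]:
    """Count opcode and API-call tokens in disassembled smali code (hypothetical tokenizer)."""
    counts: Counter = Counter()
    for line in smali_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ".", ":")):
            continue  # skip blanks, comments, directives (.method, .line) and labels
        opcode = OPCODE_RE.match(stripped)
        if opcode:
            counts[opcode.group(1)] += 1
        counts.update(METHOD_REF_RE.findall(stripped))
    return dict(counts.most_common(top_k))  # keep only the most frequent tokens
```

Counting API references alongside opcodes is one way to let the image reflect both behaviour (which framework calls are made) and code structure. The paper may weight or filter tokens differently.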
Egyptian Informatics Journal, vol. 32, Article 100834.
Citations: 0
Detection cyberbullying using AI and sentiment analysis to examine psychological impacts on vulnerable groups
IF 4.3 CAS Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-01 DOI: 10.1016/j.eij.2025.100856
Abdulnaser M. Fashakh, Mesut Çevik, Şenay Kocakoyun Aydoğan, Abdullahi Abdu Ibrahim
This study assesses the effectiveness of machine learning and deep learning models in detecting cyberbullying, and it evaluates the psychological impact of cyberbullying on vulnerable groups using textual and emotional features. The models assessed include traditional classifiers (Logistic Regression, Decision Tree, and Random Forest) and deep learning models such as MLP, CNN, RNN, and Long Short-Term Memory (LSTM) networks. TF-IDF was used for text vectorization and TextBlob for sentiment analysis. Despite TF-IDF's shortcomings, its simplicity enabled quick prototyping and insightful results. The dataset contains 58,000 tweets: 46,000 obtained from Kaggle and 12,000 collected via the Twitter API. Tweets were labeled by cyberbullying_type (gender, age, religion, and ethnicity) and by subcategory: gender (male, female, LGBT, other), age (adult, teenager, other), religion (Muslim, Christian, Jewish, other), and ethnicity (ethical, unethical, other). Subcategories were assigned by keyword-based classification. The emotional score derived from each text served as a proxy for psychological impact. We emphasize that the study is observational and does not rely on clinical psychological evaluation. Results showed that, among the gender subcategories, female and LGBT users experienced the highest levels of cyberbullying. Teenagers were the most affected by age-based bullying. Unethical content dominated ethnicity-based attacks, and within the religion category Muslims faced the highest frequency of cyberbullying and negative sentiment. Sentiment analysis helped identify emotional patterns associated with online abuse. Among all models, RNN and LSTM achieved the highest accuracy (0.98). Among the traditional models, Random Forest performed comparatively well, while Logistic Regression performed worst. Including sentiment features significantly improved classification accuracy, particularly for LSTM.
A multi-output LSTM model was built to predict cyberbullying_type, sub_category, and sentiment simultaneously, providing an end-to-end detection system. This framework enables proactive monitoring of online harm and supports timely interventions.
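The feature pipeline named in the abstract, TF-IDF vectors plus a sentiment score, can be sketched in plain Python. This sketch assumes that the sentiment is appended as one extra feature and that the polarity value is computed elsewhere, for instance from TextBlob's `sentiment.polarity` (range -1 to 1). The tokenizer is deliberately simplified and differs from scikit-learn's default. The smoothed IDF formula and L2 normalisation follow scikit-learn's default TF-IDF settings.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple

TOKEN_RE = re.compile(r"[a-z0-9']+")  # simplified tokenizer (assumption)


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def fit_idf(corpus: Sequence[str]) -> Tuple[List[str], Dict[str, float]]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1 for every term seen in the corpus."""
    n = len(corpus)
    doc_freq: Counter = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    vocab = sorted(doc_freq)
    idf = {term: math.log((1 + n) / (1 + doc_freq[term])) + 1.0 for term in vocab}
    return vocab, idf


def tfidf_vector(text: str, vocab: List[str], idf: Dict[str, float]) -> List[float]:
    """Raw term count times IDF, then L2-normalised; words outside the vocabulary are ignored."""
    tf = Counter(tokenize(text))
    vec = [tf[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def with_sentiment(vec: List[float], polarity: float) -> List[float]:
    """Append one sentiment polarity feature to a TF-IDF vector."""
    return vec + [polarity]
```

Keeping the polarity as a separate input leaves the choice of sentiment tool open and keeps the block free of third-party dependencies.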
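Subcategories were assigned by keyword matching. The Python sketch below shows one simple way to do this. It counts keyword hits per subcategory and falls back to `other` when nothing matches. The keyword lists and the tie-breaking rule are hypothetical, because the abstract does not give the paper's lists.

```python
import re
from typing import Dict, Set

# Hypothetical keyword lists for illustration only; the paper's lists are not given.
SUBCATEGORY_KEYWORDS: Dict[str, Dict[str, Set[str]]] = {
    "gender": {
        "female": {"girl", "girls", "woman", "women"},
        "male": {"boy", "boys", "man", "men"},
        "LGBT": {"gay", "lesbian", "lgbt", "trans"},
    },
    "age": {
        "teenager": {"teen", "teens", "teenager", "school"},
        "adult": {"adult", "adults", "grownup", "grownups"},
    },
    "religion": {
        "Muslim": {"muslim", "muslims", "islam"},
        "Christian": {"christian", "christians", "church"},
        "Jewish": {"jewish", "judaism", "synagogue"},
    },
}


def assign_subcategory(text: str, bullying_type: str) -> str:
    """Pick the subcategory whose keywords occur most often; 'other' when none match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    best, best_hits = "other", 0
    for subcategory, keywords in SUBCATEGORY_KEYWORDS.get(bullying_type, {}).items():
        hits = sum(token in keywords for token in tokens)
        if hits > best_hits:  # ties keep the earlier-listed subcategory
            best, best_hits = subcategory, hits
    return best
```

Rule-based labels like these are cheap to produce but inherit every gap in the keyword lists. This is one reason the abstract's subcategory findings are best read as observational.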
Egyptian Informatics Journal, vol. 32, Article 100856.
Citations: 0
Journal: Egyptian Informatics Journal
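Book学术
Book学术 provides a free academic resource search service that helps scholars in China and abroad search Chinese- and English-language literature, and it aims to offer the most convenient, high-quality service experience. Contact: info@booksci.cn
Copyright © 2023 Book学术 All rights reserved.
Beijing Public Security Network Registration No. 11010802042870; Beijing ICP Registration No. 2023020795-1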