
Big Data Research: Latest Articles

Research on adaptive long-term time series carbon dioxide emission prediction model based on improved multilayer perceptron
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-11-11 DOI: 10.1016/j.bdr.2025.100572
Jiachen Xie, Jiwei Qin, Xizhong Qin, Daishun Cui, Qiang Li, Dezhi Sun
Carbon dioxide (CO2) emissions play a crucial role in driving global climate change. Precise and reliable predictions of CO2 emission trends are instrumental in fostering sustainable development and realizing dual-carbon goals. Because emissions are shaped by complex human activities, economic development, and meteorological factors, accurate long-term time series prediction of CO2 emissions faces several challenges, notably long-term temporal dependencies and complicated non-linear correlations. To address these challenges, we propose a long-term time series CO2 emissions prediction model called CarbonLinear. CarbonLinear is based on a multilayer perceptron, whose flexible network structure and deep connectivity help it capture non-linear relationships in long-term CO2 emission series. It employs an adaptive global-local multiscale integrated modeling architecture to mitigate the data distribution shift problem. In addition, it introduces a sequence segmentation module, which allows the model to capture local features in long-term CO2 emission series and improves computational efficiency. Experimental results show that CarbonLinear performs well on CO2 emissions datasets from multiple regions, significantly improving over other models. CarbonLinear thus provides scientists and policymakers with a more accurate and reliable tool for CO2 emissions prediction.
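The abstract does not describe how the sequence segmentation module works. As a rough illustration only, the sketch below assumes the simplest variant: cutting a series into consecutive, non-overlapping, fixed-length segments so that each segment can be modeled locally. The function name `segment_series`, the choice to drop an incomplete trailing segment, and the emission values are all hypothetical and not taken from the paper.

```python
def segment_series(series, segment_len):
    """Split a 1-D series into consecutive non-overlapping segments.

    Assumption for this sketch: trailing values that do not fill a
    whole segment are dropped.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    n_segments = len(series) // segment_len
    return [
        series[i * segment_len:(i + 1) * segment_len]
        for i in range(n_segments)
    ]


# Made-up emission values, used only to show the shape of the output.
emissions = [5.1, 5.3, 5.2, 5.6, 5.8, 5.7, 6.0, 6.1, 6.3, 6.2]
segments = segment_series(emissions, 3)
```

Each inner list can then be fed to a small per-segment model; working on short segments rather than the full sequence is one plausible source of the efficiency gain the abstract mentions.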
Big Data Research, Volume 42, Article 100572
Citations: 0
PATH: A discrete-sequence dataset for evaluating online unsupervised anomaly detection approaches for multivariate time series
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-11-07 DOI: 10.1016/j.bdr.2025.100573
Lucas Correia, Jan-Christoph Goos, Thomas Bäck, Anna V. Kononova
Benchmarking anomaly detection approaches for multivariate time series is a challenging task due to a lack of high-quality datasets. Current publicly available datasets are too small, not diverse and feature trivial anomalies, which hinders measurable progress in this research area. We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools that reflects realistic behaviour of an automotive powertrain, including its multivariate, dynamic and variable-state properties. Additionally, our dataset represents a discrete-sequence problem, which remains unaddressed by previously-proposed solutions in literature. To cater for both unsupervised and semi-supervised anomaly detection settings, as well as time series generation and forecasting, we make different versions of the dataset available, where training and test subsets are offered in contaminated and clean versions, depending on the task. We also provide baseline results from a selection of approaches based on deterministic and variational autoencoders, as well as a non-parametric approach. As expected, the baseline experimentation shows that the approaches trained on the semi-supervised version of the dataset outperform their unsupervised counterparts, highlighting a need for approaches more robust to contaminated training data. Furthermore, results show that the threshold used can have a large influence on detection performance, hence more work needs to be invested in methods to find a suitable threshold without the need for labelled data.
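The remark that the threshold strongly influences detection performance can be illustrated with a small sketch. It assumes each time step already has an anomaly score (for example a reconstruction error from an autoencoder) and a ground-truth label; the helper `f1_at_threshold` and the scores and labels are invented for illustration and are not drawn from the PATH dataset or its baselines.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 score when every score strictly above `threshold` is flagged."""
    preds = [s > threshold for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical anomaly scores and labels (1 = anomalous).
scores = [0.10, 0.20, 0.15, 0.90, 0.80, 0.30, 0.25, 0.70]
labels = [0, 0, 0, 1, 1, 0, 0, 1]

# The same scores evaluated at a too-low, a suitable, and a too-high threshold.
f1_by_threshold = {t: f1_at_threshold(scores, labels, t) for t in (0.05, 0.5, 0.85)}
```

With identical scores, the F1 value changes substantially across the three thresholds, which is why choosing a threshold without labelled data matters in the unsupervised setting.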
Big Data Research, Volume 42, Article 100573
Citations: 0
Effective adaptive res-BiGRU network for pest classification performance based on regionViT-yolov8-aided pest detection technique
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-10-31 DOI: 10.1016/j.bdr.2025.100571
S.M. Mehzabeen, R Gayathri
Effective pest management often involves the use of appropriate pesticides, and early identification of pests is essential for protecting crops. Deep learning-based approaches for timely and accurate pest identification have gained traction as effective solutions to agricultural challenges, including the detection of plant diseases and pests. Accordingly, a pest recognition model based on deep learning is developed to improve crop productivity. The required input images are collected and fed into the developed Region Vision Transformer-Yolov8 (RegViT-Yolov8) model for pest detection. The RegionViT features of all bounding boxes are then extracted from the detection output and concatenated. Next, Principal Component Analysis (PCA) based feature reduction is applied to the concatenated features. Following feature reduction, pest classification is performed using the developed Adaptive Residual Bidirectional Gated Recurrent Unit (AR-BiGRU). Classification accuracy is further enhanced by optimizing the system parameters with an Advanced Random Variable-based Preschool Education Optimization Algorithm (ARV-PEOA). The framework is validated with diverse measures, and its results are compared with existing techniques to demonstrate its efficiency. In terms of accuracy, the proposed model attains 96.7% in the analysis based on 500 hidden neurons.
Big Data Research, Volume 42, Article 100571
Citations: 0
Tangible progress: Employing visual metaphors and physical interfaces in AI-based English language learning
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-10-30 DOI: 10.1016/j.bdr.2025.100570
Mei Wang, Hai-Ning Liang, Yu Liu, Chengtao Ji, Lingyun Yu
In this study, we aim to explore an interactive system that integrates visual metaphors, AI-powered essay scoring techniques, and tangible feedback to enhance students' English language learning experience. Over the past decade, AI has made significant strides across various domains, including education. A prominent example of this is the integration of AI-driven language learning tools featuring Automated Essay Scoring (AES) systems. Traditionally, AES relied on predefined criteria and provided scores in simple text formats, which often lack depth and fail to engage students in understanding their progress or areas for improvement. To address these limitations and enhance learnability, we propose a system that harnesses AI-powered AES with a visualization approach. Our system includes three main components: an AI-driven scoring algorithm, a visualization interface translating scoring outcomes into visual metaphors, and tangible postcards for presenting scores. To evaluate the usage of our visualization system and tangible-formatted feedback in practice, we conducted domain expert interviews and a three-stage user study. The results indicate that the progressive visual feedback and tangible postcards increased practice frequency and significantly boosted study motivation. Tangible visual feedback showed positive effects on fostering progressive learning. Through this study, we recognized the potential of combining AI, visual metaphors, and tangible feedback in English education to encourage continuous and active learning.
Big Data Research, Volume 42, Article 100570
Citations: 0
Exogenous variable driven cotton prices prediction: comparison of statistical model with sequence based deep learning models
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-10-28 DOI: 10.1016/j.bdr.2025.100569
G.Y. Chandan, Prity Kumari
This study investigates price forecasting models for cotton in Gujarat, India, using daily modal prices and arrival data sourced from Agmarknet spanning April 2002 to April 2023. Given the volatile and nonlinear nature of agricultural prices, this research integrates exogenous variables into statistical and advanced deep learning models to enhance predictive accuracy. The models tested include the Autoregressive Integrated Moving Average with Exogenous variables (ARIMAX), Artificial Neural Networks (ANN), Recurrent Neural Networks (RNN), Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM) and Stacked LSTM. Results reveal that the Stacked LSTM model outperforms the traditional statistical and basic neural network models, achieving the lowest values of error metrics such as Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE). With a 365-day-ahead forecast horizon, the Stacked LSTM model yielded an error of 9.30% during the pre-sowing season (May-June 2023) and 13.75% in the harvesting season (October-November 2023). This precision in capturing seasonal price fluctuations can be attributed to the integration of relevant exogenous variables, which enhance the model's ability to account for external market influences affecting cotton prices in Gujarat.
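The three error metrics named in the abstract can be written out in a few lines. This is a sketch using textbook definitions; SMAPE in particular has several variants, and the one below (mean of 2|f - a| / (|a| + |f|), as a percentage) is an assumption, since the abstract does not state which form the authors used. The price values are invented.

```python
import math


def rmse(actual, forecast):
    """Root Mean Square Error."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100.0 / len(actual) * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    )


def smape(actual, forecast):
    """Symmetric MAPE, in percent, using the 2|f - a| / (|a| + |f|) variant."""
    return 100.0 / len(actual) * sum(
        2.0 * abs(f - a) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)
    )


# Hypothetical prices, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
errors = (rmse(actual, forecast), mape(actual, forecast), smape(actual, forecast))
```

RMSE is expressed in the units of the price itself, while MAPE and SMAPE are percentages, so the 9.30% and 13.75% figures in the abstract are presumably percentage errors of the latter kind.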
Big Data Research, Volume 42, Article 100569
Citations: 0
STED: An encoder-decoder architecture for long-term spatio-temporal weather forecasting
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-10-17 DOI: 10.1016/j.bdr.2025.100568
Haoran Gong, Lei Lei, Shan Ma, Chunyu Qiu
Meteorological data is closely related to everyone's daily life, and accurate weather forecasting is crucial for many socio-economic activities. However, as a typical spatio-temporal data type, the complex temporal nonlinearity and spatial dependencies in meteorological data greatly increase the difficulty of forecasting. This paper proposes a neural network model, STED (Spatio-Temporal Data Encoder-Decoder), based on an encoder-decoder architecture, which effectively handles the temporal dynamics of long time series and high-precision spatial dependencies. STED consists of three modules: a spatial encoder-decoder, a temporal encoder-decoder, and a predictor. The spatial encoder-decoder extracts spatial features, the temporal encoder-decoder extracts temporal features, and the predictor is used for forecasting. Experimental results show that STED performs similarly to current state-of-the-art (SoTA) spatio-temporal forecasting models in short-term temperature prediction tasks, but significantly outperforms other models in medium- and long-term temperature prediction tasks. Additionally, this paper compares different spatial encoder-decoders for forecasting tasks with varying node scales. The experimental results demonstrate that, for small-scale node tasks, the spatial encoder-decoder based on multilayer perceptrons achieves good accuracy and efficiency. In contrast, for large-scale node tasks, the spatial encoder-decoder based on convolutional neural networks exhibits superior performance.
Big Data Research, Volume 42, Article 100568
Citations: 0
Adaptive spectral GNN and frequency enhanced self-attention for traffic forecasting
IF 4.2 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-10-09 DOI: 10.1016/j.bdr.2025.100567
Yongpeng Yang, Zhenzhen Yang
In intelligent cities, traffic forecasting plays a significant role in intelligent transportation systems. Many recent methods combine spectral graph neural networks with self-attention. However, they still have limitations for traffic forecasting: 1) The polynomial basis of traditional spectral graph neural networks (GNNs) is fixed, which limits their ability to learn the spatial dependency of traffic data. 2) Some GNNs ignore the dynamic dependency of traffic data. 3) Traditional self-attention has limited perception of long-term information, time delay, and global information. These shortcomings pose a major challenge for traffic forecasting because they limit the ability to capture the spatial-temporal dependency and the dynamic and heterogeneous nature of traffic data. We therefore propose an adaptive spectral GNN and frequency enhanced self-attention (ASGFES) for traffic forecasting, which can effectively capture these properties. First, we introduce an adaptive spectral graph neural network (ASGNN) that captures spatial dependency by using an adaptive polynomial basis. In addition, two dynamic attentive graphs, one long-range and one short-range, are fed into the ASGNN to emphasize dynamics at both ranges. Second, we introduce a normalized self-attention with damped exponential moving average (NSADEMA). The normalized self-attention (NSA) has the expressivity needed to learn all-pair interactions without extra operations such as positional encodings or multi-head operations, and it captures the temporal dependency and heterogeneity of traffic data well. The DEMA built into the NSA enhances the perception of the inductive bias of traffic data in the time domain and makes the model aware of the time delay in traffic data. Third, a linear frequency learner with time-series decomposition (LFLTD) is developed to enhance the ability to capture temporal dependency and heterogeneity. Time-series decomposition (TSD) facilitates the analysis and forecasting of complex time series by capturing hidden components such as the trend and seasonal components. Meanwhile, the linear frequency learner (LFL) can learn global dependencies and concentrate on the important frequency components with compact signal energy. Finally, experiments on several public traffic datasets demonstrate that the proposed ASGFES achieves better performance than other traffic forecasting methods.
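The abstract names a damped exponential moving average (DEMA) but does not give its recurrence. Below is a minimal scalar sketch, assuming the commonly used form y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}. The function name `damped_ema`, the zero initial state, and the parameter values are illustrative assumptions; the paper's formulation may differ (for example, it may be multi-dimensional with learned parameters).

```python
def damped_ema(xs, alpha, delta):
    """Damped EMA: y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}.

    alpha weights the newest observation; delta damps how much of the
    previous state is carried over. delta = 1 gives the ordinary EMA.
    The initial state y_{-1} is taken to be 0 (an assumption).
    """
    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must lie in (0, 1)")
    if not (0.0 < delta <= 1.0):
        raise ValueError("delta must lie in (0, 1]")
    out = []
    prev = 0.0
    for x in xs:
        prev = alpha * x + (1.0 - alpha * delta) * prev
        out.append(prev)
    return out


# Constant input with arbitrary illustrative parameters.
smoothed = damped_ema([1.0, 1.0, 1.0], alpha=0.5, delta=0.5)
```

Because each output mixes the current value with a decayed summary of earlier values, a recurrence of this kind gives an attention layer a built-in preference for recent time steps, which matches the time-domain inductive bias the abstract describes.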
在智慧城市中,交通预测在智能交通系统中起着重要的作用。目前,人们提出了许多将谱图神经网络与自关注相结合的方法。传统谱图神经网络(GNN)的多项式基是固定的,这限制了其学习交通数据空间依赖性的能力。2)部分gnn忽略了交通数据的动态依赖性。3)传统的自我注意存在对长时信息、时滞信息和全局信息感知有限的问题。这些默认值限制了它们捕捉交通数据的时空依赖性、动态性和异质性的能力,给交通预测带来了很大的挑战。为此,本文提出了一种基于自适应频谱GNN和频率增强自关注(ASGFES)的交通预测方法,该方法能够有效地捕捉交通数据的时空依赖性、动态性和异质性。具体来说,我们首先引入了一种自适应谱图神经网络(ASGNN),通过自适应多项式基有效地捕获空间依赖性。此外,在ASGNN中输入了两个动态的长程和短程关注图,以强调长程和短程的动态性。其次,我们引入了带阻尼指数移动平均的归一化自注意。具体来说,规范化自注意(NSA)可以捕获学习全对交互所需的表达能力,而不需要一些额外的操作,如位置编码、多头操作等。它可以很好地获得交通数据的时间依赖性和异质性。此外,将DEMA集成到NSA中,可以增强对交通数据在时域上的感应偏置的感知。它可以感知交通数据的时间延迟。第三,提出了基于时间序列分解的线性频率学习器(LFLTD),增强了捕获时间依赖性和异质性的能力。具体而言,时间序列分解(TSD)通过捕获各种隐藏成分,如趋势和季节成分,促进了复杂时间的分析和预测。同时,线性频率学习器(LFL)可以学习全局依赖关系,并以紧凑的信号能量集中在频率成分的重要部分。最后,在多个公共交通数据集上进行了大量实验,验证了该算法的性能优于其他交通预测方法。
Big Data Research, vol. 42, Article 100567.
Citations: 0
A decentralized metaheuristic approach to feature selection inspired by social interactions within a societal framework, for handling datasets of diverse sizes
IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-08-28 · DOI: 10.1016/j.bdr.2025.100556
Sobia Tariq Javed, Kashif Zafar, Irfan Younas
The rapid advancement of technology has led to the generation of big data. When effectively mined, processed, and analyzed, this vast and diverse data can reveal valuable patterns and yield promising results. However, it also introduces the “curse of dimensionality,” which can degrade the performance of machine learning models. Feature selection (FS) is a data preprocessing technique that aims to identify the optimal feature set in order to improve model efficiency and reduce processing time. Numerous metaheuristic wrapper-based FS techniques have been explored in the literature. A significant drawback of many of these algorithms is their dependence on centralized learning, in which the global best solution drives the search direction. This centralized approach is risky: any error in the global best solution can hinder the exploration and exploitation of other promising regions and prevent the search from finding the true global optimum. This paper proposes the Binary Kids Learning Optimization Algorithm (BKLO), a binary variant of the novel decentralized metaheuristic Kids Learning Optimization Algorithm (KLO), for optimal feature selection for classification in wrapper mode. The continuous solutions of KLO are mapped to binary space with a transfer function, and two transfer functions are compared: the hyperbolic tangent (V-shaped) and the sigmoid (S-shaped). BKLO is compared with seven state-of-the-art algorithms. Performance is evaluated with several assessment indicators on fifteen benchmark datasets of small, medium, and large dimensionality from the University of California Irvine (UCI) repository and Arizona State University. Experiments and Friedman's Mean Rank (FMR) statistical tests show that BKLO selects fewer features while achieving higher classification accuracy than the competing algorithms.
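The two transfer functions mentioned in the abstract can be sketched as follows. This is a minimal illustration, not BKLO itself: the abstract does not give the binarization rule, so the code assumes the conventions commonly used with such functions (an S-shaped value is the probability of setting a bit to 1, a V-shaped value is the probability of flipping the previous bit). All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def s_shaped(x: float) -> float:
    """Sigmoid transfer function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def v_shaped(x: float) -> float:
    """V-shaped transfer function based on the hyperbolic tangent."""
    return abs(math.tanh(x))


def binarize_s(position: Sequence[float], rng: random.Random) -> List[int]:
    """Assumed S-shaped rule: bit i becomes 1 with probability s_shaped(x_i)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position: Sequence[float], previous_bits: Sequence[int],
               rng: random.Random) -> List[int]:
    """Assumed V-shaped rule: bit i is flipped with probability v_shaped(x_i)."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, previous_bits)]


if __name__ == "__main__":
    rng = random.Random(42)
    continuous = [-2.0, -0.1, 0.0, 0.7, 3.0]  # hypothetical continuous solution
    print(binarize_s(continuous, rng))
    print(binarize_v(continuous, [1, 1, 0, 0, 1], rng))
```

In a wrapper setting, each resulting bit vector would mark which features are passed to the classifier. Under the assumed V-shaped rule, a value near zero leaves the current selection unchanged, whereas under the S-shaped rule the same value gives a 50% chance of selecting the feature.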
Big Data Research, vol. 41, Article 100556.
Citations: 0
Compression of big data collected in wind farm based on tensor train decomposition
IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-08-20 · DOI: 10.1016/j.bdr.2025.100554
Keren Li, Wenqiang Zhang, Dandan Xiao, Peng Hou, Shuai Yan, Yang Wang, Xuerui Mao
To address the storage challenges posed by large volumes of heterogeneous data in wind farms, we propose a data compression technique based on tensor train decomposition (TTD). First, we establish a tensor-based processing model to standardize the heterogeneous wind farm data, which includes both structured SCADA (supervisory control and data acquisition) data and unstructured video and image data. Next, we introduce a TTD-based method that compresses this heterogeneous data while preserving its inherent spatial eigenstructure. Finally, we validate the method on real wind farm datasets and show that it alleviates the storage burden. Comparative analysis shows that the TTD-based method outperforms previously proposed compression techniques, specifically the canonical polyadic (CP) and Tucker methods.
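To make the storage argument concrete, the sketch below counts how many numbers each tensor format has to store. It uses the standard parameter counts for tensor train (TT), CP, and Tucker formats; the tensor shape and all ranks are hypothetical and are not taken from the paper. The ranks needed to reach a given reconstruction accuracy differ between formats, so these counts do not by themselves compare the methods.

```python
from math import prod
from typing import Sequence


def full_size(shape: Sequence[int]) -> int:
    """Number of entries in the uncompressed tensor."""
    return prod(shape)


def tt_size(shape: Sequence[int], ranks: Sequence[int]) -> int:
    """Entries stored by a tensor train with internal ranks r_1..r_{d-1}.

    Core k has shape (r_{k-1}, n_k, r_k), with boundary ranks r_0 = r_d = 1.
    """
    if len(ranks) != len(shape) - 1:
        raise ValueError("need exactly len(shape) - 1 internal TT ranks")
    r = [1, *ranks, 1]
    return sum(r[k] * n * r[k + 1] for k, n in enumerate(shape))


def cp_size(shape: Sequence[int], rank: int) -> int:
    """Entries stored by a rank-R CP format: one (n_k x R) factor per mode."""
    return rank * sum(shape)


def tucker_size(shape: Sequence[int], ranks: Sequence[int]) -> int:
    """Entries stored by a Tucker format: core tensor plus one factor per mode."""
    if len(ranks) != len(shape):
        raise ValueError("need one Tucker rank per mode")
    return prod(ranks) + sum(n * r for n, r in zip(shape, ranks))


if __name__ == "__main__":
    # Hypothetical 4-way tensor, e.g. turbines x channels x days x samples.
    shape = (60, 24, 30, 50)
    print("full  :", full_size(shape))
    print("TT    :", tt_size(shape, (8, 8, 8)))
    print("CP    :", cp_size(shape, 8))
    print("Tucker:", tucker_size(shape, (8, 8, 8, 8)))
```

The TT count grows linearly with the number of modes for fixed ranks, while the Tucker core grows exponentially, which is one reason TT formats are often considered for higher-order data.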
Big Data Research, vol. 41, Article 100554.
Citations: 0
Explainable malware detection through integrated graph reduction and learning techniques
IF 4.2 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-08-19 · DOI: 10.1016/j.bdr.2025.100555
Hesamodin Mohammadian, Griffin Higgins, Samuel Ansong, Roozbeh Razavi-Far, Ali A. Ghorbani
Recently, Control Flow Graphs and Function Call Graphs have gained attention in malware detection because they can represent the complex structural and functional behavior of programs. To make better use of these representations and improve detection performance, they have been paired with Graph Neural Networks (GNNs). However, the sheer size and complexity of these graph representations pose a significant challenge for researchers. At the same time, the simple binary classification provided by GNN models is insufficient for malware analysts. To address these challenges, this paper integrates novel graph reduction techniques and GNN explainability into a malware detection framework to enhance both efficiency and interpretability. Through extensive evaluation, we demonstrate that the proposed graph reduction technique significantly reduces the size and complexity of the input graphs while maintaining detection performance. Furthermore, the important subgraphs extracted with GNNExplainer provide better insight into the model's decisions and support security experts in their further analysis.
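As a rough illustration of what reducing a control flow graph can mean, the sketch below collapses straight-line chains of nodes into a single node. This is a generic, well-known simplification and is not claimed to be the reduction technique proposed in the paper, which the abstract does not describe. The graph encoding (a dict from node name to successors) and the function name are hypothetical.

```python
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]


def merge_linear_chains(successors: Dict[str, Iterable[str]]) -> Graph:
    """Merge v into u whenever u -> v is u's only out-edge and v's only in-edge."""
    # Copy the input and make sure every edge target is also a key.
    g: Graph = {node: set(targets) for node, targets in successors.items()}
    for targets in list(g.values()):
        for t in targets:
            g.setdefault(t, set())

    merged = True
    while merged:
        merged = False
        # Recompute predecessor sets for the current graph.
        preds: Graph = {node: set() for node in g}
        for u, outs in g.items():
            for v in outs:
                preds[v].add(u)
        for u in list(g):
            if len(g[u]) != 1:
                continue
            (v,) = g[u]
            if v == u or len(preds[v]) != 1:
                continue
            # u absorbs v: u inherits v's successors and v disappears.
            g[u] = g.pop(v)
            merged = True
            break
    return g


if __name__ == "__main__":
    # Hypothetical CFG: a straight-line prologue, a branch, and a join.
    cfg = {
        "entry": ["a"], "a": ["b"], "b": ["c", "d"],
        "c": ["e"], "d": ["e"], "e": ["exit"], "exit": [],
    }
    reduced = merge_linear_chains(cfg)
    print(len(cfg), "->", len(reduced), "nodes")
    print(reduced)
```

Branch and join points are preserved, so the reduced graph keeps the branching structure while dropping nodes that add no structural information. A real pipeline would also need to merge the node features of collapsed nodes, which this sketch omits.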
Big Data Research, vol. 41, Article 100555.
Citations: 0