
Applied Intelligence: Latest Articles

A cutting-edge framework for industrial intrusion detection: Privacy-preserving, cost-friendly, and powered by federated learning
IF 3.4; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-02; DOI: 10.1007/s10489-025-06404-6
Lingzi Zhu, Bo Zhao, Jiabao Guo, Minzhi Ji, Junru Peng

With the networking of industrial facilities deployed in distributed environments, industrial control systems (ICS) face an escalating number of attacks, which makes intrusion detection systems critical. Machine learning-based intrusion detection systems have been extensively researched. However, the sensitivity of ICS data leaves these systems with scarce labeled data. Additionally, distributed ICS necessitate privacy-preserving collaborative detection. To address these challenges, some solutions combining federated learning (FL) and transfer learning have been proposed. Nonetheless, these solutions often overlook the clustering characteristics of factory equipment and the constraints posed by limited computational and communication resources. Therefore, we propose GC-FADA, a chained cross-domain collaborative intrusion detection framework, to address the interplay between labeled data scarcity, privacy protection, and resource constraints in ICS intrusion detection. First, GC-FADA uses an adversarial domain adaptation scheme to train the local model, alleviating the performance limitation that labeled data scarcity imposes on the intrusion detection model. Then, to reduce the communication overhead between nodes in the factory communication network and protect client privacy, GC-FADA exploits the geographical clustering characteristics of factory devices and proposes an FL-based grouped chain learning structure for collaborative training. Finally, GC-FADA achieves privacy protection with low computational overhead by utilizing patterns from lightweight pseudo-random generators instead of complex cryptographic primitives. Extensive experiments on real industrial SCADA datasets validate the effectiveness and rationality of the proposed approach, showing that GC-FADA outperforms major domain adaptation methods in accuracy while reducing computation and communication costs. In the cross-domain learning task on the two datasets, the detection accuracy of GC-FADA reaches 88.7% and 98.29%, respectively, and the detection accuracy for most types of network attack exceeds 90%.
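The abstract does not spell out how the pseudo-random patterns protect client updates, so the following is only a sketch of one well-known idea in that spirit: pairwise masks expanded from shared seeds that cancel when a running sum is passed along a chain. It is not GC-FADA's protocol. The names `prg_mask`, `mask_update`, `chain_aggregate`, the seeds, and the toy updates are all hypothetical, and `random.Random` stands in for a real pseudo-random generator.

```python
import random

MOD = 2**32  # masks and quantized updates live in the integers modulo 2^32


def prg_mask(seed, length):
    """Expand a shared seed into a mask vector (random.Random is a stand-in PRG)."""
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(length)]


def mask_update(client_id, update, pair_seeds):
    """Add the mask shared with each higher-numbered peer, subtract it for lower ones."""
    masked = list(update)
    for (i, j), seed in pair_seeds.items():
        if client_id not in (i, j):
            continue
        sign = 1 if client_id == i else -1  # keys are stored with i < j
        mask = prg_mask(seed, len(update))
        masked = [(m + sign * r) % MOD for m, r in zip(masked, mask)]
    return masked


def chain_aggregate(masked_updates):
    """Pass a running sum along the chain; each node only adds its masked vector."""
    total = [0] * len(masked_updates[0])
    for vec in masked_updates:
        total = [(t + v) % MOD for t, v in zip(total, vec)]
    return total


updates = {0: [3, 1, 4], 1: [1, 5, 9], 2: [2, 6, 5]}  # hypothetical quantized updates
pair_seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}     # hypothetical pre-shared seeds
masked = [mask_update(cid, upd, pair_seeds) for cid, upd in updates.items()]
aggregate = chain_aggregate(masked)
print(aggregate)  # [6, 12, 18]
```

Each masked vector looks random on its own, yet every mask is added once and subtracted once, so the chain recovers the exact sum using only additions and a seed expansion per peer.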

Applied Intelligence, vol. 55, no. 7.
Citations: 0
Multimodal feature adaptive fusion for anchor-free 3D object detection
IF 3.4; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-02; DOI: 10.1007/s10489-025-06454-w
Yanli Wu, Junyin Wang, Hui Li, Xiaoxue Ai, Xiao Li

LiDAR and camera are two key sensors that provide mutually complementary information for 3D detection in autonomous driving. Existing multimodal detection methods often decorate the original point cloud data with camera features to complete the detection, ignoring the mutual fusion between camera features and point cloud features. In addition, ground points scanned by LiDAR in natural scenes usually interfere significantly with the detection results, and existing methods fail to address this problem effectively. We present a simple yet efficient anchor-free 3D object detection method that adapts better to complex scenes through the adaptive fusion of multimodal features. First, we propose a fully convolutional bird’s-eye view reconstruction module that senses changes in ground map geometry, reducing the interference of ground points with detection results. Second, a multimodal feature adaptive fusion module with local awareness is designed to improve the mutual fusion of camera and point cloud features. Finally, we introduce a scale-aware mini feature pyramid network (Mini-FPN) that directly regresses 3D bounding boxes from the augmented dense feature maps, boosting the network’s ability to detect scale-varying objects, and we additionally construct a scene-adaptive single-stage 3D detector in an anchor-free manner. Extensive experiments on the KITTI and nuScenes datasets validate our method’s competitive performance.
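To make "adaptive fusion" concrete, here is a minimal sketch of a generic gated blend at a single bird's-eye-view cell: a gate computed from both feature vectors decides how much each modality contributes. This is an assumption for illustration, not the paper's fusion module, and `adaptive_fuse`, its weights, and the toy features are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_fuse(lidar_feat, camera_feat, w_lidar, w_camera, bias):
    """Blend two feature vectors with a scalar gate computed from both of them."""
    score = bias
    score += sum(w * f for w, f in zip(w_lidar, lidar_feat))
    score += sum(w * f for w, f in zip(w_camera, camera_feat))
    gate = sigmoid(score)  # in (0, 1): weight given to the LiDAR branch
    fused = [gate * l + (1.0 - gate) * c for l, c in zip(lidar_feat, camera_feat)]
    return gate, fused


# With all-zero gate weights the gate is 0.5 and the result is a plain average.
gate, fused = adaptive_fuse([1.0, 3.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0], 0.0)
print(gate, fused)  # 0.5 [2.0, 2.0]
```

In a trained network the gate weights would be learned, so the blend can lean on the camera where the point cloud is sparse and on LiDAR where image cues are weak.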

Applied Intelligence, vol. 55, no. 7.
Citations: 0
SiamYOLOv8: a rapid conditional detection framework for one-shot object detection
IF 3.4; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-01; DOI: 10.1007/s10489-025-06513-2
Matthieu Desmarescaux, Wissam Kaddah, Ayman Alfalou, Isabelle Badoc

Deep learning networks typically require vast amounts of labeled data for effective training. However, recent research has introduced a challenging task called One-Shot Object Detection, which addresses scenarios where certain classes are novel, unseen during training, and represented by only a single labeled example. In this paper, we propose a novel One-Shot Object Detection model applicable to Conditional Detection without over-training on novel classes. Our approach leverages the strengths of YOLOv8 (You Only Look Once v8), a popular real-time object detector. Specifically, we incorporate a Siamese network and a matching module to enhance One-Shot Object Detection capabilities. Our proposed model, SiamYOLOv8, enables exploration of new applications without being limited by its training data. To evaluate performance, we introduce a novel methodology for using the Retail Product Checkout (RPC) dataset (https://github.com/MatD3mons/Conditional-Detection-datasets/tree/main/RPC) and extend our evaluation using the Grozi-3.2k dataset (https://github.com/MatD3mons/Conditional-Detection-datasets/tree/main/GROZI-3.2k). In such contexts, new products often lack sufficient data for continuous Deep Learning methods, making individual case identification difficult. Our model outperforms SOTA models, achieving a 20.33% increase in Average Precision (+12.41 AP) on the Grozi-3.2k dataset and a 25.68% increase (+17.37 AP) on the RPC dataset.
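The matching step in a Siamese setup can be illustrated with a small sketch: the single query example and each candidate region are embedded by the same network, and candidates are kept when their embedding is close to the query's. The abstract does not describe SiamYOLOv8's matching module in detail, so this is a generic cosine-similarity matcher. The hand-written vectors stand in for learned embeddings, and `match_candidates` and its `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_candidates(query_emb, candidate_embs, threshold=0.8):
    """Return indices of candidates whose embedding is close to the one-shot query."""
    return [i for i, emb in enumerate(candidate_embs)
            if cosine(query_emb, emb) >= threshold]


query = [1.0, 0.0, 1.0]            # embedding of the single labeled example
candidates = [[2.0, 0.0, 2.0],     # same direction as the query
              [0.0, 1.0, 0.0],     # orthogonal to the query
              [1.0, 0.2, 0.9]]     # close to the query
matches = match_candidates(query, candidates)
print(matches)  # [0, 2]
```

Because the decision depends only on similarity to the query, a new product class can be handled by supplying a new query image rather than retraining.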

Applied Intelligence, vol. 55, no. 7.
Citations: 0
Dual attention-guided distillation for class incremental semantic segmentation
IF 3.4; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-01; DOI: 10.1007/s10489-025-06436-y
Pengju Xu, Yan Wang, Bingye Wang, Haiying Zhao

Class Incremental Semantic Segmentation (CISS) aims to segment incrementally added classes without losing the ability to segment old ones. Currently, some CISS methods based on feature knowledge distillation suffer from the stability-plasticity dilemma, i.e., excessive knowledge distillation may impede models from learning new classes. Besides, distilling without emphasis fails to preserve old knowledge effectively. To address these issues, a more fine-grained and focused approach to knowledge transfer, named dual attention-guided distillation (DAGD), is proposed for the CISS task. This approach not only ensures that the inherited knowledge is distilled in a targeted manner but also allows the model to adapt and learn new knowledge more efficiently. The DAGD model contains a channel attention-guided distillation module and a spatial attention-guided distillation module. The former distills channel-wise attention maps to improve the knowledge transfer of essential channels while accommodating new knowledge learning. The latter encodes a weight coefficient map to highlight important regions in the spatial dimension, which further decouples old knowledge retention from new knowledge entry. Furthermore, a dynamic temperature strategy is introduced to facilitate logit knowledge distillation, specifically sharpening the predictive distribution produced by the old model's output and thus achieving more accurate knowledge transfer. Extensive experimental results on the Pascal VOC 2012 and ADE20K datasets demonstrate that our method achieves competitive results.
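The effect of temperature on logit distillation can be seen in a few lines. The sketch below shows a standard temperature-scaled softmax, where a temperature below 1 sharpens the old model's distribution before it is used as a distillation target. How DAGD chooses the temperature dynamically is not stated in the abstract, so the fixed values here are purely illustrative and the function names are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / T; T < 1 sharpens the distribution, T > 1 flattens it."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits, temperature):
    """Cross-entropy between the sharpened old-model targets and the new model's output."""
    targets = softmax_with_temperature(old_logits, temperature)
    preds = softmax_with_temperature(new_logits, 1.0)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))


old_logits = [2.0, 1.0, 0.0]
plain = softmax_with_temperature(old_logits, 1.0)
sharp = softmax_with_temperature(old_logits, 0.5)
print(round(plain[0], 3), round(sharp[0], 3))  # 0.665 0.867
```

With the sharper target, more of the loss is concentrated on the class the old model was most confident about, which is the sense in which sharpening makes the transferred knowledge more precise.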

Applied Intelligence, vol. 55, no. 7.
Citations: 0
Adaptive patch selection to improve Vision Transformers through Reinforcement Learning
IF 3.4; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-01; DOI: 10.1007/s10489-025-06516-z
Francesco Cauteruccio, Michele Marchetti, Davide Traini, Domenico Ursino, Luca Virgili

In recent years, Transformers have revolutionized the management of Natural Language Processing tasks, and Vision Transformers (ViTs) promise to do the same for Computer Vision ones. However, the adoption of ViTs is hampered by their computational cost. Indeed, given an image divided into patches, it is necessary to compute for each layer the attention of each patch with respect to all the others. Researchers have proposed many solutions to reduce the computational cost of attention layers by adopting techniques such as quantization, knowledge distillation and manipulation of input images. In this paper, we aim to contribute to the solution of this problem. In particular, we propose a new framework, called AgentViT, which uses Reinforcement Learning to train an agent that selects the most important patches to improve the learning of a ViT. The goal of AgentViT is to reduce the number of patches processed by a ViT, and thus its computational load, while still maintaining competitive performance. We tested AgentViT on CIFAR10, FashionMNIST, and Imagenette+ (which is a subset of ImageNet) in the image classification task and obtained promising performance when compared to baseline ViTs and other related approaches available in the literature.
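Since every patch attends to every other patch in each layer, the number of attention scores grows with the square of the patch count, which is why dropping patches pays off quickly. The sketch below counts those scores before and after selection. The figures (196 patches, 12 layers, half the patches kept) are illustrative assumptions and are not taken from the paper, and the class token is ignored.

```python
def attention_pairs(num_patches, num_layers):
    """Patch-to-patch attention scores per head, summed over all layers."""
    return num_layers * num_patches * num_patches


def kept_after_selection(num_patches, keep_ratio):
    """Number of patches that survive selection (at least one)."""
    return max(1, round(num_patches * keep_ratio))


full = attention_pairs(196, 12)        # e.g. a 224x224 image cut into 16x16 patches
kept = kept_after_selection(196, 0.5)  # 98 patches survive
reduced = attention_pairs(kept, 12)
print(full, reduced, reduced / full)   # 460992 115248 0.25
```

Halving the patches cuts the pairwise attention work to a quarter, assuming the selection happens before the first layer and the selector itself is cheap.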

Applied Intelligence, vol. 55, no. 7. PDF: https://link.springer.com/content/pdf/10.1007/s10489-025-06516-z.pdf
Citations: 0
Advanced IDS: a comparative study of datasets and machine learning algorithms for network flow-based intrusion detection systems
IF 3.4; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-04-01; DOI: 10.1007/s10489-025-06422-4
Jose Carlos Mondragon, Paula Branco, Guy-Vincent Jourdan, Andres Eduardo Gutierrez-Rodriguez, Rajesh Roshan Biswal

Globally, cyberattacks are growing and mutating each month. Intelligent Network Intrusion Detection Systems are developed to analyze and detect anomalous traffic in the face of these threats. One way to do this is with network flows, an aggregated view of communications between devices. Network flow datasets are used to train Artificial Intelligence (AI) models to classify specific attacks. Training these models requires threat samples, which are usually generated synthetically in labs because capturing them on operational networks is challenging. As threats evolve quickly, new network flow datasets are continuously developed and shared. However, testing models on old datasets is still common practice, which hinders a more comprehensive characterization of the advantages and opportunities of recent solutions against new attacks. Moreover, no standardized benchmark exists, which makes it hard to compare the models produced by different algorithms. To address these gaps, we present a benchmark with fourteen recent, preprocessed datasets and study seven categories of algorithms for Network Intrusion Detection based on network flows. We provide researchers with a centralized source of preprocessed datasets for easy download. All datasets also come with a train, validation and test split to allow a straightforward and fair comparison between existing and new solutions. We selected open, publicly available state-of-the-art algorithms that represent diverse approaches. We carried out an experimental comparison of these algorithms using the Macro F1 score. Our results highlight how each model behaves across dataset scenarios and provide guidance on competitive solutions. Finally, we discuss the main characteristics of the models and benchmarks, focusing on practical implications and recommendations for practitioners and researchers.
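Macro F1 averages the per-class F1 scores with equal weight, so a rare attack class counts as much as abundant benign traffic. The sketch below computes it from scratch on a toy, made-up label set in which a model predicts "benign" for everything. The labels and counts are hypothetical and are not drawn from the benchmark.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)


y_true = ["benign"] * 8 + ["ddos"] * 2   # hypothetical imbalanced flows
y_pred = ["benign"] * 10                 # a model that never flags an attack
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(accuracy, round(score, 3))  # 0.8 0.444
```

Accuracy looks acceptable at 0.8, while Macro F1 drops to about 0.444 because the missed attack class contributes an F1 of zero, which is the behaviour that makes it a reasonable metric for imbalanced intrusion data.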

Applied Intelligence, vol. 55, no. 7. PDF: https://link.springer.com/content/pdf/10.1007/s10489-025-06422-4.pdf
Citations: 0
DAGAF: A directed acyclic generative adversarial framework for joint structure learning and tabular data synthesis
IF 3.4; Zone 2 (Computer Science); Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-03-31; DOI: 10.1007/s10489-025-06410-8
Hristo Petkov, Calum MacLellan, Feng Dong

Understanding the causal relationships between data variables can provide crucial insights into the construction of tabular datasets. Most existing causality learning methods focus on applying a single identifiable causal model, such as the Additive Noise Model (ANM) or the Linear non-Gaussian Acyclic Model (LiNGAM), to discover the dependencies exhibited in observational data. We improve on this approach by introducing a novel dual-step framework capable of performing both causal structure learning and tabular data synthesis under multiple causal model assumptions. Our approach uses Directed Acyclic Graphs (DAGs) to represent causal relationships among data variables. By applying various functional causal models, including ANM, LiNGAM and the Post-Nonlinear model (PNL), we implicitly learn the contents of the DAG to simulate the generative process of observational data, effectively replicating the real data distribution. This is supported by a theoretical analysis that explains the multiple loss terms comprising the objective function of the framework. Experimental results demonstrate that DAGAF outperforms many existing methods in structure learning, achieving significantly lower Structural Hamming Distance (SHD) scores across both real-world and benchmark datasets (Sachs: 47%, Child: 11%, Hailfinder: 5%, Pathfinder: 7% improvement compared to state-of-the-art), while being able to produce diverse, high-quality samples.
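Structural Hamming Distance counts how many edge edits separate a learned graph from the reference graph, so lower is better. Conventions differ on whether a reversed edge costs one or two. The sketch below assumes the common variant that charges one per node pair whose edge is missing, extra, or reversed. The abstract does not say which variant the paper reports, and the two graphs are toy examples.

```python
def shd(true_adj, pred_adj):
    """Count node pairs whose edge status (none, i->j, j->i) differs between graphs."""
    n = len(true_adj)
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            true_pair = (true_adj[i][j], true_adj[j][i])
            pred_pair = (pred_adj[i][j], pred_adj[j][i])
            if true_pair != pred_pair:
                distance += 1  # missing, extra, or reversed edge
    return distance


# adj[i][j] == 1 means an edge i -> j.
true_adj = [[0, 1, 0],   # 0 -> 1
            [0, 0, 1],   # 1 -> 2
            [0, 0, 0]]
pred_adj = [[0, 0, 1],   # 0 -> 2 (extra edge)
            [1, 0, 1],   # 1 -> 0 (reversed), 1 -> 2 (correct)
            [0, 0, 0]]
print(shd(true_adj, pred_adj))  # 2
```

Here the learned graph has one reversed edge and one extra edge, giving an SHD of 2, and a perfect reconstruction would score 0.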

Applied Intelligence, vol. 55, no. 7. PDF: https://link.springer.com/content/pdf/10.1007/s10489-025-06410-8.pdf
Citations: 0
SentimentMapper: A framework for mapping of sentiments towards disaster response using social media data
IF 3.4 Zone 2 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-31 DOI: 10.1007/s10489-025-06442-0
Tanu Gupta, Aman Rai, Sudip Roy

Social networking platforms generate massive amounts of data in real time that can be analysed to help government and relief organizations prepare quick and effective action plans for disaster response. Effective disaster response requires a broad understanding of the disaster situation, such as people's emergency necessities, their sentiments towards emergency needs, and the geographical distribution of their requirements and opinions. Although many studies in the literature estimate people's emotions and sentiments during a disaster, they fall short in identifying and mapping public sentiment toward emergency needs. This paper proposes a framework called SentimentMapper, which quickly maps people's sentiments toward emergency needs using social media data so that an effective disaster response can be planned. To analyse sentiments automatically using Twitter (rebranded as X in July 2023) data, we introduce a BERT Convolutional Neural Network (BCNN). BCNN performs sentiment analysis on data collected from disaster-affected people regarding essential needs such as food, shelter, medical emergency, and rescue during different disasters. Next, we present a tweet-text-independent approach to detect the location of tweets posted on Twitter and discover the impact of a disaster event in different areas. Furthermore, we study the variations in public attitudes about essential needs during identical or different disasters. As a case study, the proposed framework has been applied to a dataset collected from Twitter during the 2021 Assam flood in India and validated against the corresponding survey reports published by the government agency. The detailed results of the analytics in the proposed framework and its validation with the case study data confirm that it can quickly provide the credible situational information required for disaster response.

Citations: 0
Towards accurate post-training quantization for reparameterized models
IF 3.4 Zone 2 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-31 DOI: 10.1007/s10489-025-06418-0
Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, Yangwei Ying, Hong Zhou

Model reparameterization is a widely accepted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation when applied to reparameterized models. This is primarily caused by channel-specific and sample-specific outliers, which appear only at specific samples and channels and affect the selection of quantization parameters. To address this issue, we propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterized models. Unlike previous frameworks that use Mean Squared Error (MSE) as a measurement, we utilize Mean Absolute Error (MAE) to mitigate the influence of outliers on quantization parameters. Our framework consists of two core components: Quantization Protecting Reparameterization and Across-block Calibration. For effective calibration, Quantization Protecting Reparameterization combines multiple branches into a single convolution with an affine layer. During training, the affine layer accelerates convergence and amplifies the output of the convolution to better accommodate samples with outliers. Additionally, Across-block Calibration leverages the measurement of stage output as supervision to address the gradient problem introduced by MAE and enhance the interlayer correlation with quantization parameters. Comprehensive experiments demonstrate the effectiveness of RepAPQ across various models and tasks. Our framework outperforms previous methods by approximately 1% for 8-bit PTQ and 2% for 6-bit PTQ. The code is available at https://github.com/ilur98/DLMC-QUANT.
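
The abstract's point about MSE versus MAE can be illustrated with a toy calibration. The sketch below is not RepAPQ's procedure; it is a minimal grid search over a symmetric quantization scale, with hypothetical helper names (`quantize`, `calibration_error`, `best_scale`), a made-up activation sample containing one large outlier, and a 4-bit grid chosen only to make the effect visible.

```python
def quantize(x, scale, bits):
    """Symmetric uniform quantization: round to the grid, clip, dequantize."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale


def calibration_error(values, scale, bits, metric):
    """Mean squared ("mse") or mean absolute ("mae") quantization error."""
    errs = [abs(v - quantize(v, scale, bits)) for v in values]
    if metric == "mse":
        return sum(e * e for e in errs) / len(errs)
    return sum(errs) / len(errs)


def best_scale(values, bits, metric, candidates=200):
    """Grid-search the scale that minimizes the chosen error metric."""
    max_abs = max(abs(v) for v in values)
    qmax = 2 ** (bits - 1) - 1
    best_err, best = None, None
    for i in range(1, candidates + 1):
        scale = (max_abs * i / candidates) / qmax
        err = calibration_error(values, scale, bits, metric)
        if best_err is None or err < best_err:
            best_err, best = err, scale
    return best


# Hypothetical sample: 201 values in [-1, 1] plus a single outlier at 50.
values = [i / 100 for i in range(-100, 101)] + [50.0]

mse_scale = best_scale(values, bits=4, metric="mse")
mae_scale = best_scale(values, bits=4, metric="mae")
print("MSE-selected scale:", mse_scale)
print("MAE-selected scale:", mae_scale)
```

In this toy setting the squared error is dominated by the clipped outlier, so the MSE search stretches the grid to cover it and loses resolution on the bulk of the values, whereas the MAE search settles on a much smaller scale.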

Citations: 0
Leveraging BiLSTM-GAT for enhanced stock market prediction: a dual-graph approach to portfolio optimization
IF 3.4 Zone 2 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-03-31 DOI: 10.1007/s10489-025-06462-w
Xiaobin Lu, Josiah Poon, Matloob Khushi

Stock price prediction remains a critical challenge in financial research due to its potential to inform strategic decision-making. Existing approaches predominantly focus on two key tasks: (1) regression, which forecasts future stock prices, and (2) classification, which identifies trading signals such as buy, sell, or hold. However, the inherent limitations of financial data hinder effective model training, often leading to suboptimal performance. To mitigate this issue, prior studies have expanded datasets by aggregating historical data from multiple companies. This strategy, however, fails to account for the unique characteristics and interdependencies among individual stocks, thereby reducing predictive accuracy. To address these limitations, we propose a novel BiLSTM-GAT-AM model that integrates bidirectional long short-term memory (BiLSTM) networks with graph attention networks (GAT) and an attention mechanism (AM). Unlike conventional graph-based models that define edges based solely on technical or fundamental relationships, our approach employs a dual-graph structure: one graph captures technical similarities, while the other encodes fundamental industry relationships. These two representations are aligned through an attention mechanism, enabling the model to exploit both technical and fundamental insights for enhanced stock market predictions. We conduct extensive experiments, including ablation studies and comparative evaluations against baseline models. The results demonstrate that our model achieves superior predictive performance. Furthermore, leveraging the model’s forecasts, we construct an optimized portfolio and conduct backtesting on the test dataset. Empirical results indicate that our portfolio consistently outperforms both baseline models and the S&P 500 index, highlighting the effectiveness of our approach in stock market prediction and portfolio optimization.
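
The dual-graph idea can be sketched in a few lines. The abstract does not say how either graph is built, so the rules below are assumptions for illustration only: a "technical" edge when two return series have a Pearson correlation above a threshold, and an "industry" edge when two stocks share a sector label. The tickers, returns, and sector labels are made up, and the BiLSTM, GAT, and attention components are not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def technical_graph(returns, threshold=0.8):
    """Assumed rule: connect tickers whose returns correlate strongly."""
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(returns), 2)
        if pearson(returns[u], returns[v]) >= threshold
    }


def industry_graph(sectors):
    """Assumed rule: connect tickers that share a sector label."""
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(sectors), 2)
        if sectors[u] == sectors[v]
    }


# Hypothetical data: BBB moves with AAA, CCC moves against it.
returns = {
    "AAA": [0.01, -0.02, 0.03, 0.00, 0.01],
    "BBB": [0.02, -0.04, 0.06, 0.00, 0.02],
    "CCC": [-0.01, 0.02, -0.03, 0.00, -0.01],
}
sectors = {"AAA": "tech", "BBB": "energy", "CCC": "tech"}

tech_edges = technical_graph(returns)
ind_edges = industry_graph(sectors)
print("technical edges:", tech_edges)
print("industry edges:", ind_edges)
```

Here the two graphs disagree: AAA and BBB are linked only by price behaviour, while AAA and CCC are linked only by sector, which is the kind of complementary signal a dual-graph model would need to reconcile.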

Citations: 0