
Latest Publications in Applied Intelligence

STVAD: A Spatio-temporal Coupled Based Transformer for Unsupervised Video Anomaly Detection
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-16 · DOI: 10.1007/s10489-025-07065-1
Huiyu Mu, Luhui Wang, Hongjian Yin, Yonggan Li, Lanxue Dang, Yang Liu, Xianyu Zuo

Video anomaly detection aims to detect incidental and sudden event patterns in complex scenes, and plays an essential role in maintaining public security and averting potential risks. Unsupervised learning is particularly effective for the problem of insufficient annotation of abnormal video behaviour in low-resource scenarios. However, because the appearance/motion features of videos contain complex entangled factors, overcoming the barrier of spatio-temporal coupling remains a formidable task for unsupervised video anomaly detection. Most related methods focus on constructing a compact normal manifold from a training set of normal data or on tracking objects; they often neglect the distinct visual characteristics of different anomalies, leading to a static coupling bias. To unravel the spatio-temporal coupling mechanism, we present STVAD, an unsupervised approach that focuses on efficient context modelling and on eliminating the spatio-temporal factor gap within a novel encoder-decoder network. Additionally, we propose a new paradigm that integrates spatio-temporal dependencies via self-attention to boost the discriminative capacity for identifying human-related irregular events in surveillance video sequences. Extensive experiments validate the accuracy and computational efficiency of our method on three challenging benchmarks: the UCSD Pedestrian (Ped1 and Ped2), CUHK Avenue, and ShanghaiTech datasets. Notably, our method exploits spatio-temporal clues to estimate associations and improves the accuracy of detecting anomalous behaviours.
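The abstract gives no implementation details, but the paradigm it describes (integrating spatio-temporal dependencies via self-attention inside an encoder-decoder) can be illustrated with a minimal PyTorch sketch. The module name, the factorized time-then-space attention order, and all dimensions below are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of factorized spatio-temporal self-attention over video
# clip features. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                     # x: (batch, time, patches, dim)
        b, t, p, d = x.shape
        # Attend across time for each spatial patch.
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt = self.norm1(xt + self.temporal_attn(xt, xt, xt)[0])
        x = xt.reshape(b, p, t, d).permute(0, 2, 1, 3)
        # Attend across spatial patches within each frame.
        xs = x.reshape(b * t, p, d)
        xs = self.norm2(xs + self.spatial_attn(xs, xs, xs)[0])
        return xs.reshape(b, t, p, d)

clip = torch.randn(2, 8, 49, 256)             # 8 frames, 7x7 patches
print(SpatioTemporalBlock()(clip).shape)      # torch.Size([2, 8, 49, 256])
```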

Citations: 0
Detection of transcoding from HEVC to VVC based on CU types and Motion Vector Map
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-16 · DOI: 10.1007/s10489-026-07134-z
Jiajian Lin, Jianlin Zhang, Shan Bian, Chuntao Wang

The adoption of new coding standards brings a wider range of applications as well as new video-forensics challenges. To solve the transcoding detection problem under the new H.266/Versatile Video Coding (VVC) standard, we propose an algorithm based on Coding Unit (CU) block-type classification and Motion Vector Map (MVM) statistics to detect VVC-transcoded videos. We first analyze how HEVC-to-VVC transcoding leaves traces in the video and define MVM graphs for constructing features. We then calculate the average frequency of each CU partition type separately for intra-predicted and inter-predicted frames, concatenate these frequencies, and feed them as distinguishing features to a support vector machine for classification. Experimental results show that the method achieves high recognition accuracy under the new coding standard. Specifically, the algorithm constructs a 51-dimensional feature vector by combining intra-frame and inter-frame features, capturing transcoding traces such as changes in CU partitioning and motion-vector distributions. Extensive experiments on 1080p and 4K video datasets, across various bitrate combinations, quantization parameters, and Group of Pictures (GOP) structures, demonstrate an average detection accuracy of 96.81% on the 1080p datasets, outperforming existing methods. The robustness of the proposed approach across diverse encoding configurations highlights its effectiveness for forensic analysis under the VVC standard.
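As a rough illustration of the described pipeline, the sketch below computes average CU-partition-type frequencies separately for intra- and inter-predicted frames, concatenates them, and trains a support vector machine. The four-type layout and the synthetic data are assumptions; the paper's actual feature vector is 51-dimensional.

```python
# Toy version of the CU-frequency + SVM pipeline. Feature layout and
# synthetic data are assumptions, not the paper's exact design.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def cu_type_frequencies(frames, n_types=4):
    """Average frequency of each CU partition type over a list of frames."""
    counts = np.zeros(n_types)
    for frame in frames:                      # frame: array of CU type ids
        counts += np.bincount(frame, minlength=n_types)[:n_types]
    return counts / max(counts.sum(), 1)

rng = np.random.default_rng(0)

def fake_video_features(transcoded):
    # Transcoded videos are nudged toward finer partitions in this toy data.
    p = [0.15, 0.2, 0.3, 0.35] if transcoded else [0.35, 0.3, 0.2, 0.15]
    intra = [rng.choice(4, size=200, p=p) for _ in range(3)]
    inter = [rng.choice(4, size=200, p=p) for _ in range(12)]
    return np.concatenate([cu_type_frequencies(intra),
                           cu_type_frequencies(inter)])

X = np.array([fake_video_features(t) for t in [False] * 40 + [True] * 40])
y = np.array([0] * 40 + [1] * 40)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(clf.score(X, y))                        # ~1.0 on this toy data
```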

Citations: 0
Blunder prediction in chess
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-16 · DOI: 10.1007/s10489-026-07131-2
Yarden Rokach, Bracha Shapira

The ability to predict blunders in chess plays a crucial role in improving players' performance and enabling strategic decision-making. We introduce a novel, scalable, and personalized blunder prediction model for chess. Unlike prior work requiring a separate model per player, our unified architecture learns a collaborative user embedding space, allowing it to generalize weaknesses across players and to new users. Our hybrid model, inspired by Deep Factorization Machines (DeepFM), fuses a frozen pre-trained CNN (for board embeddings) with dynamically learned user embeddings to model player-board interactions while still utilizing metadata about the state of the board and the user. We demonstrate that this latent 'blunder profile' is a significantly more powerful predictor of error than a player's explicit Elo rating. The system achieves state-of-the-art performance (0.801 AUC) on both immediate and non-immediate blunders, offering an efficient and data-sparse-friendly solution for personalized chess analysis. Ultimately, this approach demonstrates the practical viability of deep personalization in complex strategy games, facilitating highly efficient, user-centric learning environments.
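A minimal PyTorch sketch of the described hybrid: a frozen board encoder standing in for the pre-trained CNN, a learned per-user embedding, and a DeepFM-style combination of a factorization (dot-product) term with a deep MLP term. All names and dimensions are hypothetical.

```python
# DeepFM-style blunder predictor sketch; names/dimensions are assumptions.
import torch
import torch.nn as nn

class BlunderPredictor(nn.Module):
    def __init__(self, n_users, board_dim=128, user_dim=32):
        super().__init__()
        # Stand-in for a pre-trained CNN over 12 board planes (8x8).
        self.board_enc = nn.Sequential(
            nn.Conv2d(12, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, board_dim),
        )
        for p in self.board_enc.parameters():   # frozen, as the abstract states
            p.requires_grad = False
        self.user_emb = nn.Embedding(n_users, user_dim)
        self.proj = nn.Linear(board_dim, user_dim)
        self.mlp = nn.Sequential(
            nn.Linear(board_dim + user_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, board, user_id):
        b = self.board_enc(board)
        u = self.user_emb(user_id)
        fm = (self.proj(b) * u).sum(-1, keepdim=True)   # FM-style interaction
        deep = self.mlp(torch.cat([b, u], dim=-1))      # deep component
        return torch.sigmoid(fm + deep)                 # blunder probability

model = BlunderPredictor(n_users=1000)
p = model(torch.randn(4, 12, 8, 8), torch.tensor([3, 7, 7, 42]))
print(p.shape)   # torch.Size([4, 1])
```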

Citations: 0
Multimodal hand feature recognition based on modality-related feature interaction
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-14 · DOI: 10.1007/s10489-026-07128-x
Huabin Wang, Li Zhang, Shicheng Wei, Fei Liu, Liang Tao

Existing approaches to multimodal hand feature fusion typically extract features from each modality independently and then merge them directly. However, these methods overlook potential correlations between modalities, limiting their ability to leverage complementary cross-modal information. This paper therefore proposes a multimodal hand feature fusion method based on modality-related feature interaction (MR-FIFM). First, a specific-shared feature extraction network (SSFEN) is constructed, preserving the integrity of each modality's features while enhancing inter-modality correlations. Second, a class-constrained multimodal metric loss is proposed to avoid the vanishing-gradient problem and enhance the category-representation capability of the features. Finally, a feature interaction fusion module (FIFM) is proposed, allowing the model to attend simultaneously to features at different positions within each modality to enhance intra-modality features. At the same time, the features of the two modalities interact, enabling the model to dynamically adjust its own feature representation based on the features of the other modality. Experimental results on public datasets, including palmprint + palm-vein (PolyU_Palmprint+PolyU_NIR, PolyU_Blue+PolyU_NIR) and fingerprint + finger-vein (NUPT_FP+NUPT_FV), demonstrate that the proposed method outperforms existing multimodal fusion recognition methods for hand features.
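The following PyTorch sketch illustrates the cross-modal interaction idea, with each modality attending to the other so its representation adapts to its partner. It is a loose illustration of the FIFM concept under assumed shapes, not the paper's module.

```python
# Cross-modal feature interaction sketch; names and shapes are assumptions.
import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.a2b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)

    def forward(self, feat_a, feat_b):        # (batch, tokens, dim) each
        # Modality A queries modality B, and vice versa.
        a = self.norm_a(feat_a + self.b2a(feat_a, feat_b, feat_b)[0])
        b = self.norm_b(feat_b + self.a2b(feat_b, feat_a, feat_a)[0])
        return torch.cat([a.mean(1), b.mean(1)], dim=-1)  # fused descriptor

palm = torch.randn(2, 49, 256)   # e.g. palmprint patch features
vein = torch.randn(2, 49, 256)   # e.g. palm-vein patch features
print(CrossModalInteraction()(palm, vein).shape)  # torch.Size([2, 512])
```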

Citations: 0
Stream-DINO: exploring DETR-based online object detection with streaming perception
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-14 · DOI: 10.1007/s10489-026-07139-8
Man Zhang, Yongqiang Zhang, Rui Tian, Yin Zhang, Zian Zhang, Jinwei Sun

Online object detection is a crucial task for the safety of autonomous driving during dynamic movement. While previous research has achieved significant breakthroughs in low-latency detection by pursuing higher FPS, there has been no clear analysis of the relationship between detection speed and online object detection performance, particularly for non-real-time detectors (e.g. DETR-based detectors). To address this issue, we propose a metric named fAP that selects a future frame as the matching target for the current frame according to the detector's FPS. Through this new metric, we reduce the evaluation bias caused by input-frame inconsistency and verify the potential of transformer-based object detection models for the online object detection task. To this end, we develop a novel end-to-end object detector with streaming perception, named Stream-DINO, which has better online object detection capability than other transformer-based models. Specifically, a Feature Encoder Acceleration Module is proposed to reduce computation cost, letting the detector focus on small target objects in each frame. We also introduce an Associated Frame Loss that uses object information captured in the associated frame to supervise the detection results and obtain accurate target-object locations. Notably, Stream-DINO is the first transformer-based online object detection method, and it achieves SOTA performance (19.9% fAP and 18.6% sAP) on the Argoverse-HD dataset, outperforming the DINO baseline by 2.2% fAP and 3.0% sAP.
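The fAP idea of matching detections against a future frame can be sketched as a small matching rule: given the stream frame rate and the detector's throughput, score each frame's output against the ground truth of the frame that will be current when the detector finishes. The exact rule below is an assumption; the paper defines the metric precisely.

```python
# Illustrative matching rule for an fAP-style metric; the precise
# definition is the paper's, not this sketch's.
def matched_gt_frame(frame_idx, detector_fps, stream_fps, n_frames):
    """Index of the future frame whose ground truth scores this detection."""
    latency_frames = max(1, round(stream_fps / detector_fps))
    return min(frame_idx + latency_frames, n_frames - 1)

# A 30 FPS stream with a 10 FPS detector: outputs lag by ~3 frames.
print([matched_gt_frame(i, 10, 30, 100) for i in range(5)])  # [3, 4, 5, 6, 7]
```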

Citations: 0
FinBERT-BiLSTM: a deep learning model for predicting volatile cryptocurrency market prices using market sentiment dynamics
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-14 · DOI: 10.1007/s10489-026-07086-4
Mabsur Fatin Bin Hossain, Lubna Zahan Lamia, Md Mahmudur Rahman, Md Mosaddek Khan

Time series forecasting is a key tool in financial markets, helping to predict asset prices and guide investment decisions. In highly volatile markets like Bitcoin (BTC) and Ethereum (ETH), forecasting is challenged by sharp price swings driven by sentiment, technological shifts, and regulatory changes. Traditionally, forecasting relied on statistical methods, but as markets became more complex, deep learning models like LSTM, BiLSTM, and the newer FinBERT-LSTM emerged to capture intricate patterns. Building upon these advancements and addressing the volatility inherent in cryptocurrency markets, we propose a novel hybrid model that combines Bidirectional Long Short-Term Memory (BiLSTM) networks with FinBERT. In volatile markets like cryptocurrencies, where price movements reflect both immediate reactions and delayed responses to sentiment, the FinBERT-BiLSTM model offers a clear advantage over unidirectional approaches. We evaluate the model against traditional baselines (LSTM, BiLSTM), sentiment-aware variants (FinBERT-LSTM, FinBERT-TCN), and state-of-the-art models, including Transformer-based (Informer, TCN, TFT) and GPT-based (TimeGPT) architectures. These evaluations span short-term (intra-day and one-day-ahead) and long-term (30-day-ahead) forecasting horizons, augmented with a realistic trading simulation. Experimental results show that FinBERT-BiLSTM achieves low Mean Absolute Percentage Error values across all horizons—2.03% (BTC) and 2.52% (ETH) for intra-day, 2.20% (BTC) and 2.77% (ETH) for one-day-ahead, and 1.62% (BTC) and 5.10% (ETH) for 30-day-ahead prediction. Moreover, it outperforms all competing models in simulated trading profitability, demonstrating both statistical and practical value for sentiment-informed cryptocurrency forecasting.
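A minimal PyTorch sketch of the fusion idea: per-day sentiment scores (e.g., produced by a FinBERT sentiment classifier over news headlines) are appended to the price window and fed to a BiLSTM forecaster. The input layout and dimensions are assumptions, not the paper's exact design.

```python
# Sentiment-augmented BiLSTM forecaster sketch; layout is an assumption.
import torch
import torch.nn as nn

class SentimentBiLSTM(nn.Module):
    def __init__(self, n_features=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # next-day price

    def forward(self, x):             # x: (batch, days, [price, sentiment])
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # read out the last time step

window = torch.randn(8, 30, 2)        # 30-day windows for 8 samples
print(SentimentBiLSTM()(window).shape)  # torch.Size([8, 1])
```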

Citations: 0
Visual design for empathy in virtual reality: a systematic review of emotional engagement (2018–2023)
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-13 · DOI: 10.1007/s10489-025-06970-9
Wenjing Liu, Amirrudin Kamsin, Hanafi Hussin

This study investigates how visual design elements—specifically color, lighting, and spatial composition—affect emotional and empathetic engagement in Virtual Reality (VR) environments. It focuses on adult users across diverse domains such as education, psychotherapy, and humanitarian advocacy. A systematic review of literature from 2018 to 2023 was conducted using Scopus, Web of Science, and JSTOR, adhering to PRISMA guidelines. A total of 21 peer-reviewed studies were selected based on their focus on visual modalities in VR and empathy-related outcomes for adult users. The review identifies a consistent impact of visual elements on users’ empathetic and emotional responses in immersive VR contexts. However, it also uncovers key challenges: (1) a lack of standardized, validated metrics for measuring empathy in VR; (2) a fragmented approach to visual emotion analysis that isolates design elements rather than considering their interplay; and (3) limited investigation into cultural variability in interpreting visual stimuli. This study offers a novel contribution by synthesizing visual design practices in VR to foster empathy, a focus not previously addressed in our research. Unlike existing studies, including any prior work by our team, this review integrates cross-cultural perspectives, proposes standardized empathy metrics, and examines the interplay of visual elements (color, lighting, spatial composition) across diverse domains, providing a cohesive framework for empathy-driven VR design.

Citations: 0
Entity injection with contrastive learning encoder for Chinese few-shot natural language inference
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-12 · DOI: 10.1007/s10489-026-07124-1
Peichao Lai, Feiyang Ye, Yanggeng Fu, Ruiqing Wang, Yingjie Wu, Yilei Wang

Recognizing Textual Entailment (RTE) is a fundamental task in natural language processing with broad downstream applications. In low-resource scenarios, existing methods struggle due to their reliance on external knowledge, sensitivity to semantic distribution bias, and limitations in capturing fine-grained entity distinctions. To address these challenges, we propose a novel framework that enhances semantic awareness of sequence entities through two key strategies: Sample-based Entity Replacement Augmentation and Integration of Entity Features. First, we extract and categorize entities from the dataset, then augment the data by replacing entities with others of the same type, leveraging semi-supervised learning to enhance model training. Second, we introduce the Advanced Contrastive Learning Encoder (ACE) model, which integrates filtered entity features into sequence representations and employs sequence-entity contrastive learning to improve entity distinction. Our approach significantly reduces dependence on external knowledge, mitigates semantic distribution bias, and enhances entity-awareness in RTE. Experimental results demonstrate that our methods consistently achieve state-of-the-art performance across all benchmark datasets for low-resource Chinese RTE tasks.
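Sample-based entity replacement augmentation can be sketched in a few lines: swap each recognized entity for another entity of the same type drawn from a pool built over the dataset. The entity pool, spans, and types below are hypothetical and assumed to come from an upstream NER step.

```python
# Toy entity-replacement augmentation; pool and spans are hypothetical.
import random

entity_pool = {"LOC": ["北京", "上海", "广州"], "ORG": ["清华大学", "复旦大学"]}

def augment(text, entities, rng=random.Random(0)):
    """entities: list of (surface_string, type) pairs found in `text`."""
    for surface, etype in entities:
        candidates = [e for e in entity_pool.get(etype, []) if e != surface]
        if candidates:
            # Replace the first occurrence with a same-type entity.
            text = text.replace(surface, rng.choice(candidates), 1)
    return text

print(augment("北京的清华大学很有名", [("北京", "LOC"), ("清华大学", "ORG")]))
```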

Citations: 0
Forged anomaly detection using advanced deep learning
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-12 · DOI: 10.1007/s10489-026-07114-3
Nomica Choudhry, Jemal Abawajy, Shamsul Huda, Imran Rao

We propose a comprehensive system architecture designed to mitigate the growing risks associated with modern DeepFake technologies. The system comprises three primary components, each playing a vital role in the detection process. In the first component, we propose a GAN framework with a dual discriminator, developed specifically for initial image-authenticity verification; this is essential for differentiating genuine images from artificially manipulated ones. Next, we employ a specialized Residual Block-Based CNN to further analyze the images verified by the GAN. This deep network, consisting of 33 layers, excels at extracting and processing intricate image features, a critical factor in identifying complex patterns indicative of anomalies. Lastly, the system incorporates a Multiple Instance Learning (MIL) framework, which significantly enhances image anomaly detection at the event level rather than only the instance level. The system was rigorously evaluated on two benchmark datasets: FaceForensics++ (FF++) and CelebDF. It achieved a classification accuracy of 96.01% with a False Negative Rate (FNR) of 18.86% and an average computation time of 1801.10 seconds on the FF++ dataset. On the CelebDF dataset, it attained an accuracy of 94.3%, an FNR of 5.9%, and an average computation time of 110.04 seconds. These results demonstrate the framework's effectiveness and efficiency in detecting complex image-forgery cases without extensive manual annotation, making it suitable for real-time surveillance and media-integrity verification systems.
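The abstract names a Residual Block-Based CNN; a representative residual block of the kind such a 33-layer network would stack is sketched below. Channel counts and layout are assumptions, since the exact architecture is not given here.

```python
# Generic residual block; channel counts and layout are assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # The skip connection keeps gradients healthy in deep stacks.
        return self.act(x + self.body(x))

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)   # torch.Size([1, 64, 56, 56])
```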

Citations: 0
Mamba-integrated spatio-temporal attention graph convolutional network for session-based recommendation
IF 3.5 · CAS Tier 2 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-11 · DOI: 10.1007/s10489-026-07097-1
Yafang Li, Miao Wang, Baokai Zu, Caiyan Jia, Hongyuan Wang

Session-based recommendation (SBR) faces critical limitations in existing approaches that hinder recommendation accuracy. Current methods suffer from three fundamental deficiencies: insufficient modeling of temporal dependencies, where standard attention mechanisms treat all sequence positions equally and fail to capture the relative importance of temporal proximity; noisy aggregation in graph neural networks (GNNs), where multi-layer GNN architectures cause over-smoothing and feature information loss as node representations converge to indistinguishable states; and computational inefficiency, where quadratic-complexity self-attention mechanisms introduce noise through indiscriminate all-to-all item connections. To address these challenges, we propose the Mamba-Integrated Spatio-Temporal Attention Graph Convolutional Network (MSTA-GNN). MSTA-GNN tackles temporal dependency neglect through a convolution-based temporal attention mechanism that explicitly encodes temporal ordering and assigns higher weights to temporally closer items. To overcome GNN noise aggregation, we design a single-layer Spatio-Temporal Attention Graph Convolutional Layer (STA-GCL) that achieves higher-order neighbor aggregation while avoiding over-smoothing. For computational efficiency, we replace quadratic self-attention with a linear-complexity Mamba mechanism that selectively filters noise while efficiently capturing long-term dependencies. Additionally, we introduce virtual nodes and a global-level sparse attention encoder to further mitigate noise and capture inter-session dependencies. Extensive experiments on two public datasets demonstrate that MSTA-GNN significantly outperforms state-of-the-art methods, achieving improvements of 10.22%-14.89% in precision metrics and 1.1%-7.66% in MRR metrics.
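A minimal PyTorch sketch of the convolution-based temporal attention idea: a causal 1-D convolution scores each position, and a learnable recency prior adds weight to temporally closer items before the softmax. This illustrates the stated mechanism under assumed shapes, not the paper's exact formulation.

```python
# Convolutional temporal attention with a recency prior; an illustration
# of the stated idea, with hypothetical names and shapes.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim=64, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, 1, kernel, padding=kernel - 1)  # causal pad
        self.decay = nn.Parameter(torch.tensor(0.1))

    def forward(self, x):                     # x: (batch, seq_len, dim)
        b, n, d = x.shape
        scores = self.conv(x.transpose(1, 2))[:, :, :n].squeeze(1)  # (b, n)
        # Older positions receive a larger penalty, boosting recent items.
        recency = -self.decay * torch.arange(n - 1, -1, -1, device=x.device)
        attn = torch.softmax(scores + recency, dim=-1)
        return (attn.unsqueeze(-1) * x).sum(1)    # session representation

session = torch.randn(4, 10, 64)
print(TemporalAttention()(session).shape)     # torch.Size([4, 64])
```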

Citations: 0