
Latest publications from Applied Intelligence

Visible and thermal image fusion network with diffusion models for high-level visual tasks
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-09 | DOI: 10.1007/s10489-024-06210-6
Jin Meng, Jiahui Zou, Zhuoheng Xiang, Cui Wang, Shifeng Wang, Yan Li, Jonghyuk Kim

Fusion technology enhances the performance of applications such as security, autonomous driving, military surveillance, medical imaging, and environmental monitoring by combining complementary information. The fusion of visible and thermal (RGB-T) images is critical for improving human observation and visual tasks. However, most semantics-driven fusion algorithms couple segmentation and fusion during training, which increases the computational cost and underutilizes semantic information. Designing a cleaner fusion architecture that mines rich deep semantic features is the key to addressing this issue. A two-stage RGB-T image fusion network with diffusion models is proposed in this paper. In the first stage, the diffusion model is employed to extract multiscale features, providing rich semantic features and texture edges for the fusion network. In the second stage, a semantic feature enhancement module (SFEM) and a detail feature enhancement module (DFEM) are proposed to improve the network’s ability to describe small details. An adaptive global-local attention mechanism (AGAM) is used to enhance the weights of key features related to visual tasks. Specifically, we benchmarked the proposed algorithm by creating a new tri-modal sensor driving scene dataset (TSDS), which includes 15,234 sets of labeled images (visible, thermal, and polarization degree images). The semantic segmentation model trained on our fusion images achieved 78.41% accuracy, and the object detection model achieved 87.21% mAP. The experimental results indicate that our algorithm outperforms state-of-the-art image fusion algorithms.
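The abstract does not spell out the AGAM internals. The following is a minimal sketch, assuming a common global-local design in which a channel (global) branch and a spatial (local) branch re-weight the features and are fused by learnable scalars; all module and parameter names are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """Toy global-local attention: channel (global) and spatial (local)
    attention maps combined by learnable scalar weights."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Global branch: squeeze-and-excitation style channel attention.
        self.global_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Local branch: spatial attention from pooled channel statistics.
        self.local_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Learnable fusion weights (the "adaptive" part in this sketch).
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.beta = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = x * self.global_fc(x)                       # channel-reweighted features
        s = torch.cat([x.mean(1, keepdim=True),
                       x.max(1, keepdim=True).values], dim=1)
        l = x * self.local_conv(s)                      # spatially reweighted features
        return self.alpha * g + self.beta * l + x       # residual fusion

# Example: a fusion-stage feature map with 64 channels.
feat = torch.randn(2, 64, 120, 160)
print(GlobalLocalAttention(64)(feat).shape)  # torch.Size([2, 64, 120, 160])
```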

Citations: 0
Adaptive spiking neuron with population coding for a residual spiking neural network
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-09 | DOI: 10.1007/s10489-024-06128-z
Yongping Dan, Changhao Sun, Hengyi Li, Lin Meng

Spiking neural networks (SNNs) have attracted significant research attention due to their inherent sparsity and event-driven processing capabilities. Recent studies indicate that incorporating convolutional and residual structures into SNNs can substantially enhance performance. However, these converted spiking residual structures come with increased complexity and stacks of parameterized spiking neurons. To address this challenge, this paper proposes a meticulously refined two-layer decision structure for residual-based SNNs, consisting solely of fully connected and spiking neuron layers. Specifically, the spiking neuron layers incorporate an innovative dynamic leaky integrate-and-fire (DLIF) neuron model with a nonlinear self-feedback mechanism, characterized by dynamic threshold adjustment and a self-regulating firing rate. Furthermore, diverging from traditional direct encoding, which focuses solely on individual neuronal frequency, we introduce a novel mixed coding mechanism that combines direct encoding with multineuronal population decoding. The proposed architecture improves the adaptability and responsiveness of spiking neurons in various computational contexts. Experimental results demonstrate the superior efficacy of our approach. Although it uses a highly simplified structure with only 6 timesteps, our proposal outperforms multiple state-of-the-art methods in the experimental trials. Specifically, it achieves accuracy improvements of 0.01-1.99% on three static datasets and of 0.14-7.50% on three N-datasets. The DLIF model excels in information processing, exhibiting twice the mutual information of other neuron models. On the sequential MNIST dataset, it balances biological realism and practicality, enhancing memory and the dynamic range. Our proposed method not only offers improved computational efficacy and a simplified network structure but also enhances the biological plausibility of SNN models and can be easily adapted to other deep SNNs.
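As an illustration of the dynamic-threshold idea behind the DLIF neuron, here is a toy sketch of a leaky integrate-and-fire neuron whose threshold rises after each spike and decays back, which self-regulates the firing rate. The dynamics and constants are assumptions for illustration, not the paper's model.

```python
import numpy as np

def dlif_simulate(inputs, tau=20.0, v_th0=1.0, th_adapt=0.2, th_decay=0.95):
    """Toy dynamic leaky integrate-and-fire neuron (illustrative only).
    The threshold rises after each spike and decays back, one plausible
    reading of the "dynamic threshold" described in the abstract."""
    v, v_th = 0.0, v_th0
    spikes = []
    for i_t in inputs:
        v += (-v + i_t) / tau          # leaky integration
        if v >= v_th:
            spikes.append(1)
            v = 0.0                    # reset membrane potential
            v_th += th_adapt           # raise threshold after a spike
        else:
            spikes.append(0)
            v_th = v_th0 + (v_th - v_th0) * th_decay  # threshold decays back
    return np.array(spikes)

# Example: constant drive; the firing rate settles as the threshold adapts.
rate = dlif_simulate(np.full(200, 25.0)).mean()
print(f"mean firing rate: {rate:.2f}")
```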

Citations: 0
EMD empowered neural network for predicting spatio-temporal non-stationary channel in UAV communications
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-09 | DOI: 10.1007/s10489-024-06165-8
Qiuyun Zhang, Qiumei Guo, Hong Jiang, Xinfan Yin, Muhammad Umer Mushtaq, Ying Luo, Chun Wu

This paper introduces a novel prediction method for spatio-temporal non-stationary channels between unmanned aerial vehicles (UAVs) and ground control vehicles, essential for the fast and accurate acquisition of channel state information (CSI) to support UAV applications in ultra-reliable and low-latency communication (URLLC). Specifically, an empirical mode decomposition (EMD)-empowered spatio-temporal attention neural network is proposed, referred to as EMD-STANN. The STANN sub-module within EMD-STANN is designed to capture the spatial correlation and temporal dependence of CSI. Furthermore, the EMD component is employed to handle the non-stationary and nonlinear dynamic characteristics of the UAV-to-ground control vehicle (U2V) channel, thereby enhancing the feature extraction and refinement capabilities of the STANN and improving the accuracy of CSI prediction. Additionally, we conducted a validation of the proposed EMD-STANN model across multiple datasets. The results indicated that EMD-STANN is capable of effectively adapting to diverse channel conditions and accurately predicting channel states. Compared to existing methods, EMD-STANN exhibited superior predictive performance, as indicated by its reduced root mean square error (RMSE) and mean absolute error (MAE) metrics. Specifically, EMD-STANN achieved a reduction of 24.66% in RMSE and 25.46% in MAE compared to the reference method under our simulation conditions. This improvement in prediction accuracy provides a solid foundation for the implementation of URLLC applications.
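To make the EMD preprocessing step concrete, the sketch below decomposes a synthetic CSI series into intrinsic mode functions and extrapolates each IMF separately, assuming the PyEMD package is available; the per-IMF linear extrapolation is a placeholder for the STANN predictor, not the paper's network.

```python
import numpy as np
from PyEMD import EMD  # assumption: PyEMD is installed (pip install EMD-signal)

def emd_decompose(csi_series: np.ndarray) -> np.ndarray:
    """Decompose a 1-D CSI magnitude series into intrinsic mode functions."""
    return EMD().emd(csi_series)  # shape: (n_imfs, series_length)

def predict_next(csi_series: np.ndarray) -> float:
    """Toy stand-in for EMD-STANN: predict each IMF separately
    (here with a trivial linear extrapolation) and sum the parts."""
    imfs = emd_decompose(csi_series)
    preds = [imf[-1] + (imf[-1] - imf[-2]) for imf in imfs]  # per-IMF extrapolation
    return float(np.sum(preds))

# Example: a synthetic non-stationary channel gain series.
t = np.linspace(0, 4, 400)
series = np.sin(2 * np.pi * 3 * t) * (1 + 0.5 * t) + 0.1 * np.random.randn(t.size)
print("next-step prediction:", predict_next(series))
```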

Citations: 0
F2RAIL: panoptic segmentation integrating Fpn and transFormer towards RAILway
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-09 | DOI: 10.1007/s10489-024-06158-7
Bai Dingyuan, Guo Baoqing, Ruan Tao, Zhou Xingfang, Sun Tao, Wang Yu, Liu Tao

Panoptic segmentation enables precise identification and localization of the various elements in railway scenes by assigning a unique mask to each object in the image, thereby providing crucial data support for autonomous perception tasks in railway environments. However, existing segmentation methods fail to effectively leverage the prominent boundary and linear features of objects such as railway tracks and guardrails, resulting in unsatisfactory segmentation performance in railway scenes. Moreover, the inherent structural limitations of generic segmentation methods lead to weak feature extraction capabilities. Accordingly, this paper proposes the F2RAIL panoptic segmentation network, which achieves a unified approach to multi-scale detection and high-precision recognition through an innovative fusion of Feature Pyramid Networks (FPN) and transformer networks. By introducing an edge feature enhancement module, we address the insufficient utilization of linear features in railway scenes by segmentation models; by introducing a multi-dimensional enhancement module, we resolve the weakening or even loss of deep feature information in segmentation models. Based on the aforementioned structural innovations and methodological improvements, F2RAIL achieved a panoptic quality (PQ) of 43.74% on our custom railway dataset, representing a 2.2% improvement over existing state-of-the-art (SOTA) methods. Additionally, it demonstrated performance comparable to SOTA methods on public benchmark datasets.
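The edge feature enhancement module is described only at a high level. One plausible reading, sketched below, gates backbone features with fixed Sobel responses to emphasize rail-like linear structures; the module name and design are assumptions, not the F2RAIL implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeFeatureEnhancement(nn.Module):
    """Toy edge enhancement: Sobel responses gate the input feature map,
    emphasising rail-like linear structures (illustrative sketch only)."""
    def __init__(self, channels: int):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel_y = sobel_x.t()
        kernel = torch.stack([sobel_x, sobel_y]).unsqueeze(1)   # (2, 1, 3, 3)
        self.register_buffer("kernel", kernel.repeat(channels, 1, 1, 1))
        self.channels = channels
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Depthwise Sobel filtering: two directional responses per channel.
        edges = F.conv2d(x, self.kernel, padding=1, groups=self.channels)
        edge_mag = edges.view(x.size(0), self.channels, 2, *x.shape[-2:]).norm(dim=2)
        gate = torch.sigmoid(self.proj(edge_mag))
        return x * (1 + gate)   # boost responses along strong edges

feat = torch.randn(1, 32, 64, 64)
print(EdgeFeatureEnhancement(32)(feat).shape)  # torch.Size([1, 32, 64, 64])
```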

Citations: 0
DeepSCNN: a simplicial convolutional neural network for deep learning
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-08 | DOI: 10.1007/s10489-024-06121-6
Chunyang Tang, Zhonglin Ye, Haixing Zhao, Libing Bai, Jingjing Lin

Graph convolutional neural networks (GCNs) are deep learning methods for processing graph-structured data. GCNs usually consider only pairwise connections and ignore higher-order interactions between nodes. Recently, simplices have been shown to encode not only pairwise relations between nodes but also higher-order interactions among them. Researchers have therefore been concerned with how to design simplicial convolutional neural networks. Existing simplicial neural networks achieve good performance in tasks such as missing value imputation, graph classification, and node classification. However, due to gradient vanishing, over-smoothing, and over-fitting, they are typically limited to very shallow models. Therefore, we propose a simplicial convolutional neural network for deep learning (DeepSCNN). Firstly, a simplicial edge sampling technique (SES) is introduced to prevent the over-fitting caused by deepening the network layers. Subsequently, an initial residual connection is added to the simplicial convolutional layers. Finally, to verify the validity of DeepSCNN, we conduct missing data imputation and node classification experiments on citation networks. Additionally, we compare the experimental performance of DeepSCNN with that of simplicial neural networks (SNN) and simplicial convolutional networks (SCNN). The results show that our proposed DeepSCNN method outperforms SNN and SCNN.
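For readers unfamiliar with simplicial convolutions, the sketch below applies a single edge-level (1-simplex) convolution built from the lower and upper Hodge Laplacians, together with an initial-residual term as mentioned in the abstract; the toy complex and the exact update rule are assumptions, not the DeepSCNN layer.

```python
import torch
import torch.nn as nn

class SimplicialConv(nn.Module):
    """Toy simplicial convolution on edge features:
    H = relu((1 - alpha) * (L_low + L_up) @ X @ W + alpha * X0),
    with an initial-residual term X0 (in_dim must equal out_dim here)."""
    def __init__(self, in_dim: int, out_dim: int, alpha: float = 0.1):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)
        self.alpha = alpha

    def forward(self, x, x0, l_low, l_up):
        prop = (l_low + l_up) @ self.weight(x)
        return torch.relu((1 - self.alpha) * prop + self.alpha * x0)

# Toy complex: 3 nodes, 3 edges, 1 triangle.
B1 = torch.tensor([[-1., -1., 0.],   # node-to-edge incidence
                   [ 1., 0., -1.],
                   [ 0., 1., 1.]])
B2 = torch.tensor([[1.], [-1.], [1.]])     # edge-to-triangle incidence
L_low, L_up = B1.t() @ B1, B2 @ B2.t()     # lower/upper Hodge Laplacians on edges

x0 = torch.randn(3, 8)                     # initial edge features (3 edges, 8 dims)
layer = SimplicialConv(8, 8)
print(layer(x0, x0, L_low, L_up).shape)    # torch.Size([3, 8])
```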

Citations: 0
Utility-based agent model for intermodal behaviors: a case study for urban toll in Lille
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-08 | DOI: 10.1007/s10489-024-05869-1
Azise Oumar Diallo, Guillaume Lozenguez, Arnaud Doniec, René Mandiau

To reduce congestion and pollution in cities, political authorities encourage a modal shift away from private cars in favor of sustainable trip behaviors such as intermodality (combinations of private cars and public transport). Coercive measures such as urban tolls are also an increasingly investigated solution. To avoid paying the toll, agents may select intermodal transportation (private car plus public transport) by parking their vehicles in park-and-ride (PR) facilities at the entrance to the toll area. This paper proposes a methodology for an agent-based model (ABM), in particular a model called the utility-based agent, to reproduce intermodal trip behaviors in a city and to assess the impact of an urban toll. In this context, a multinomial logit model, coupled with the agent- and activity-based simulation tool MATSim, is used to determine the modal choice of each agent. Based on open data (for the European Metropolis of Lille, MEL), the simulation shows that a toll of 20 € (21.75 $) is sufficient to reduce the use of private vehicles by 20%.
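A minimal sketch of the multinomial-logit mode choice that drives such a utility-based agent is given below; the utility coefficients, travel times, and costs are illustrative assumptions, not the calibrated values used with MATSim.

```python
import math

def mode_choice_probabilities(toll_eur: float):
    """Multinomial logit over three modes; utilities are illustrative.
    V = constant + beta_time * time_min + beta_cost * cost_eur."""
    beta_time, beta_cost = -0.05, -0.30
    modes = {
        # (constant, in-vehicle time [min], monetary cost [EUR])
        "car":        (0.0,  25.0, 4.0 + toll_eur),   # toll applies to the car only
        "public":     (-0.5, 45.0, 1.8),
        "intermodal": (-0.3, 35.0, 2.5),              # park-and-ride plus transit
    }
    v = {m: c + beta_time * t + beta_cost * cost for m, (c, t, cost) in modes.items()}
    denom = sum(math.exp(u) for u in v.values())
    return {m: math.exp(u) / denom for m, u in v.items()}

# Example: mode shares shrink for the car as the toll grows.
for toll in (0.0, 10.0, 20.0):
    probs = mode_choice_probabilities(toll)
    shares = ", ".join(f"{m}: {p:.2f}" for m, p in probs.items())
    print(f"toll = {toll:>4.1f} EUR -> {shares}")
```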

Citations: 0
Enhancing few-shot learning using targeted mixup
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-08 | DOI: 10.1007/s10489-024-06157-8
Yaw Darkwah Jnr., Dae-Ki Kang

Despite the attention that long-tailed classification has received in recent years, the performance on the tail classes, as expected, suffers more than on the remaining classes. We address this problem by means of a novel data augmentation technique called Targeted Mixup, which mixes class samples based on the model's performance on each class. Instances of classes that are difficult to distinguish are randomly chosen and linearly interpolated to produce a new sample, so that the model can pay attention to those two classes. The expectation is that the model learns the distinguishing features and thereby improves the classification of instances belonging to the respective classes. To demonstrate the effectiveness of the proposed method empirically, we performed experiments using the CIFAR-100-LT, Places-LT, and Speech Commands-LT datasets. The results show an improvement on the few-shot classes without sacrificing much of the model's performance on the many-shot and medium-shot classes; in fact, the overall accuracy increases as well.
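A hedged sketch of the targeted-mixup idea follows: pick the class pair the current model confuses most, then linearly interpolate one sample from each. The pair-selection heuristic and hyper-parameters are assumptions, not the authors' exact procedure.

```python
import torch
import numpy as np

def most_confused_pair(confusion: torch.Tensor):
    """Return the (true, predicted) class pair with the highest
    off-diagonal confusion count."""
    off_diag = confusion.clone()
    off_diag.fill_diagonal_(0)
    idx = torch.argmax(off_diag)
    return divmod(idx.item(), confusion.size(1))

def targeted_mixup(x, y, confusion, alpha: float = 0.4):
    """Mix one sample from each of the two most-confused classes.
    Returns the mixed batch plus both label sets and the mixing weight,
    to be used with lam * CE(out, y_a) + (1 - lam) * CE(out, y_b)."""
    cls_a, cls_b = most_confused_pair(confusion)
    idx_a = torch.nonzero(y == cls_a).flatten()
    idx_b = torch.nonzero(y == cls_b).flatten()
    if len(idx_a) == 0 or len(idx_b) == 0:
        return x, y, y, 1.0                       # nothing to mix in this batch
    i = idx_a[torch.randint(len(idx_a), (1,))].item()
    j = idx_b[torch.randint(len(idx_b), (1,))].item()
    lam = float(np.random.beta(alpha, alpha))
    x = x.clone()
    x[i] = lam * x[i] + (1.0 - lam) * x[j]        # replace one sample with the mix
    y_b = y.clone()
    y_b[i] = cls_b
    return x, y, y_b, lam

# Example: a batch of 8 images, 4 classes, and a dummy confusion matrix.
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 4, (8,))
confusion = torch.randint(0, 20, (4, 4))
x_mix, y_a, y_b, lam = targeted_mixup(x, y, confusion)
print(x_mix.shape, lam)
```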

Citations: 0
A comparative study of handling imbalanced data using generative adversarial networks for machine learning based software fault prediction
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-08 | DOI: 10.1007/s10489-024-05930-z
Ha Thi Minh Phuong, Pham Vu Thu Nguyet, Nguyen Huu Nhat Minh, Le Thi My Hanh, Nguyen Thanh Binh

Software fault prediction (SFP) is the process of identifying potentially defect-prone modules before the testing stage of a software development process. By identifying faults early in the development process, software engineers can focus their efforts on the components most likely to contain defects, thereby improving the overall quality and reliability of the software. However, data imbalance and feature redundancy are challenging issues in SFP that can negatively impact the performance of fault prediction models. Imbalanced software fault datasets, in which the number of normal modules (majority class) is significantly higher than that of faulty modules (minority class), may lead to many false negative results. In this work, we perform an empirical assessment of variants of Generative Adversarial Networks (GANs), an emerging synthetic data generation method, for resolving the data imbalance issue in common software fault prediction datasets. Five GAN variants (CopulaGAN, VanillaGAN, CTGAN, TGAN, and WGANGP) are utilized to generate synthetic faulty samples to balance the proportion of the majority and minority classes in the datasets. Thereafter, we present an extensive evaluation of prediction pipelines that combine Recursive Feature Elimination (RFE) for feature selection with GAN-based oversampling, as well as pipelines that pair Autoencoders for feature extraction with the GAN models. In experiments on five fault datasets extracted from the PROMISE repository, we evaluate six different machine learning approaches using precision, recall, F1-score, Area Under the Curve (AUC), and Matthews Correlation Coefficient (MCC) as performance metrics. The experimental results demonstrate that the combination of CTGAN with RFE and the pairing of CTGAN with Autoencoders outperform the other baselines on all datasets, followed by WGANGP and VanillaGAN. According to the comparative analysis, GAN-based oversampling methods exhibit a significant improvement in dealing with data imbalance for software fault prediction.
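A sketch of one such pipeline (minority-class oversampling with CTGAN followed by RFE) is shown below, assuming the `ctgan` and `scikit-learn` packages; column names, epochs, and the downstream estimator are placeholders, not the paper's configuration.

```python
import pandas as pd
from ctgan import CTGAN                      # assumption: pip install ctgan
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier

def balance_with_ctgan(df: pd.DataFrame, label_col: str = "defective",
                       epochs: int = 300) -> pd.DataFrame:
    """Train CTGAN on the minority (faulty) modules only and append
    enough synthetic rows to balance the two classes."""
    minority = df[df[label_col] == 1]
    majority = df[df[label_col] == 0]
    gan = CTGAN(epochs=epochs)
    gan.fit(minority, discrete_columns=[label_col])
    synthetic = gan.sample(len(majority) - len(minority))
    return pd.concat([df, synthetic], ignore_index=True)

def select_features(df: pd.DataFrame, label_col: str = "defective",
                    n_features: int = 10):
    """Recursive feature elimination on the balanced dataset."""
    X, y = df.drop(columns=[label_col]), df[label_col]
    rfe = RFE(RandomForestClassifier(n_estimators=100),
              n_features_to_select=n_features)
    rfe.fit(X, y)
    return X.columns[rfe.support_].tolist()

# Usage (df is a PROMISE-style fault dataset with a binary "defective" column):
# balanced = balance_with_ctgan(df)
# kept_metrics = select_features(balanced)
```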

Citations: 0
Enhanced causal effects estimation based on offline reinforcement learning
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-07 | DOI: 10.1007/s10489-024-06009-5
Huan Xia, Chaozhe Jiang, Chenyang Zhang

Causal effect estimation is essential for analyzing the causal effect of a treatment (intervention) on an outcome, but traditional methods often rely on the strong assumption of no unobserved confounding factors. We propose ECEE-RL (Enhanced Causal Effects Estimation based on Reinforcement Learning), a novel architecture that leverages offline reinforcement learning to relax this assumption. ECEE-RL innovatively models causal effect estimation as a stateless Markov Decision Process, allowing for adaptive policy optimization through action-reward combinations. By framing estimation as "actions" and sensitivity analysis results as "rewards", ECEE-RL minimizes sensitivity to confounders, including unobserved ones. Theoretical analysis confirms the convergence and robustness of ECEE-RL. Experiments on two simulated datasets demonstrate significant improvements, with CATE MSE reductions ranging from 5.45% to 66.55% and sensitivity significance reductions of up to 98.29% compared to baseline methods. These results corroborate our theoretical findings on ECEE-RL's improved accuracy and robustness. Application to real-world pilot-aircraft interaction data reveals significant causal effects of control behaviors on bioelectrical signals and emotions, demonstrating ECEE-RL's practical utility. While computationally intensive, ECEE-RL offers a promising approach to causal effect estimation, particularly in scenarios where unobserved confounding may be present, representing an important step towards more reliable causal inference in complex real-world settings.
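To illustrate the stateless-MDP framing, the toy sketch below runs a REINFORCE-style softmax policy over candidate effect estimates, with the reward standing in for a sensitivity analysis; the reward function and candidate values are placeholders, not the ECEE-RL formulation.

```python
import numpy as np

def sensitivity_score(estimate: float) -> float:
    """Placeholder for a sensitivity analysis: in the paper the reward reflects
    how stable the estimate is under confounding perturbations. Here we just
    penalise distance from a fixed reference to keep the sketch runnable."""
    return abs(estimate - 2.0)

def stateless_policy_search(candidates, episodes=500, lr=0.1, seed=0):
    """Toy stateless-MDP view: each episode is one action (pick a candidate
    CATE estimate), reward = -sensitivity, softmax policy updated by REINFORCE."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(candidates))
    for _ in range(episodes):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(len(candidates), p=probs)
        reward = -sensitivity_score(candidates[a])
        grad = -probs
        grad[a] += 1.0                      # d log pi(a) / d logits
        logits += lr * reward * grad        # REINFORCE update
    return candidates[int(np.argmax(logits))]

# Example: candidate estimates from different adjustment strategies.
print(stateless_policy_search([1.2, 1.8, 2.1, 2.9]))   # typically converges to 2.1
```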

Citations: 0
A novel embedded cross framework for high-resolution salient object detection
IF 3.4 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-07 | DOI: 10.1007/s10489-024-06073-x
Baoyu Wang, Mao Yang, Pingping Cao, Yan Liu

Salient object detection (SOD) is a fundamental research topic in computer vision that has attracted significant interest from various fields. Its rapid development has revealed two issues. (1) The salient regions in high-resolution images exhibit significant differences in location, structure, and edge details, which makes them difficult to recognize and depict. (2) The traditional salient detection architecture is insensitive to targets in high-resolution feature spaces, which leads to incomplete saliency predictions. To address these limitations, this paper proposes a novel embedded cross framework with a dual-path transformer (ECF-DT) for high-resolution SOD. The framework consists of a dual-path transformer and a unit fusion module for partitioning the salient targets. Specifically, we first design a cross network as a baseline model for salient object detection. Then, the dual-path transformer is embedded into the cross network with the objective of integrating fine-grained visual contextual information and target details while suppressing the disparity of the feature space. To generate more robust feature representations, we also introduce a unit fusion module, which highlights the positive information in the feature channels and encourages saliency prediction. Extensive experiments are conducted on nine benchmark databases, and the performance of ECF-DT is compared with that of other state-of-the-art methods. The results indicate that our method outperforms its competitors and accurately detects targets in high-resolution images with large objects, cluttered backgrounds, and complex scenes. It achieves MAEs of 0.017, 0.026, and 0.031 on three high-resolution public databases. Moreover, it reaches S-measure rates of 0.909, 0.876, 0.936, 0.854, 0.929, and 0.826 on six low-resolution public databases.
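The unit fusion module is described only functionally. One plausible reading, sketched below, computes per-channel gates from two branch features and fuses them accordingly; the design and names are assumptions, not the ECF-DT implementation.

```python
import torch
import torch.nn as nn

class UnitFusion(nn.Module):
    """Toy unit fusion: per-channel gates decide how much of each branch
    to keep, emphasising channels that respond positively (sketch only)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, branch_a: torch.Tensor, branch_b: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([branch_a, branch_b], dim=1))   # per-channel weights
        return g * branch_a + (1 - g) * branch_b                # weighted channel fusion

a, b = torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128)
print(UnitFusion(64)(a, b).shape)   # torch.Size([1, 64, 128, 128])
```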

Citations: 0