Unsupervised anomaly detection and imputation in noisy time series data for enhancing load forecasting
Pub Date: 2024-11-22 | DOI: 10.1007/s10489-024-05856-6
Maher Dissem, Manar Amayri
Efficient energy management relies heavily on accurate load forecasting, particularly in the face of increasing energy demands and the imperative for sustainable operations. However, anomalies in historical data pose a significant challenge to forecasting models, potentially leading to suboptimal resource allocation and decision-making. This paper presents an innovative unsupervised, feature-bank-based framework for detecting anomalies in time series data; the identified anomalies are then replaced with plausible patterns by an RNN-based recurrent denoising autoencoder. We evaluate the effectiveness of our methodology through a comprehensive study, comparing the performance of different forecasting models before and after the anomaly detection and imputation steps. Our results demonstrate the versatility and effectiveness of the approach across various energy applications for smart grids and smart buildings, highlighting its potential for widespread adoption in energy management systems.
{"title":"Unsupervised anomaly detection and imputation in noisy time series data for enhancing load forecasting","authors":"Maher Dissem, Manar Amayri","doi":"10.1007/s10489-024-05856-6","DOIUrl":"10.1007/s10489-024-05856-6","url":null,"abstract":"<div><p>Efficient energy management relies heavily on accurate load forecasting, particularly in the face of increasing energy demands and the imperative for sustainable operations. However, the presence of anomalies in historical data poses a significant challenge to the effectiveness of forecasting models, potentially leading to suboptimal resource allocation and decision-making. This paper presents an innovative unsupervised feature bank based framework for anomaly detection in time series data affected by anomalies. Leveraging an RNN-based recurrent denoising autoencoder, identified anomalies are replaced with plausible patterns. We evaluate the effectiveness of our methodology through a comprehensive study, comparing the performance of different forecasting models before and after the anomaly detection and imputation processes. Our results demonstrate the versatility and effectiveness of our approach across various energy applications for smart grids and smart buildings, highlighting its potential for widespread adoption in energy management systems.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ZPDSN: spatio-temporal meteorological forecasting with topological data analysis
Pub Date: 2024-11-22 | DOI: 10.1007/s10489-024-06053-1
Tinghuai Ma, Yuming Su, Mohamed Magdy Abdel Wahab, Alaa Abd ELraouf Khalil
Meteorological forecasting is of paramount importance for safeguarding human life, mitigating natural disasters, and promoting economic development. However, achieving precise forecasts poses significant challenges owing to the complexity of feature representation in observed meteorological data and the dynamic spatio-temporal dependencies therein. Graph Neural Networks (GNNs) have gained prominence in spatio-temporal forecasting owing to their ability to model non-Euclidean data structures and capture spatio-temporal dependencies. However, existing GNN-based methods obscure the spatio-temporal patterns between nodes because of the over-smoothing problem, and important high-order structural information is lost during GNN propagation. Topological Data Analysis (TDA), a synthesis of mathematical analysis and machine learning that can mine the higher-order features present in the data itself, offers a novel perspective on cross-domain spatio-temporal meteorological forecasting. To address these problems and endow GNNs with time-aware ability, we propose a new spatio-temporal meteorological forecasting model based on topological data analysis, called the Zigzag Persistence with subgraph Decomposition and Supra-graph construction Network (ZPDSN), which can dynamically simulate meteorological data across the spatio-temporal domain. The adjacency matrix for the final spatial dimension is derived by treating the topological features captured via zigzag persistence as a high-order representation of the data, and subgraph decomposition and supra-graph construction mechanisms are introduced to better capture spatio-temporal correlations. ZPDSN outperforms other GNN-based models on four meteorological datasets: temperature, cloud cover, humidity, and surface wind component.
{"title":"ZPDSN: spatio-temporal meteorological forecasting with topological data analysis","authors":"Tinghuai Ma, Yuming Su, Mohamed Magdy Abdel Wahab, Alaa Abd ELraouf Khalil","doi":"10.1007/s10489-024-06053-1","DOIUrl":"10.1007/s10489-024-06053-1","url":null,"abstract":"<div><p>Meteorological forecasting is of paramount importance for safeguarding human life, mitigating natural disasters, and promoting economic development. However, achieving precise forecasts poses significant challenges owing to the complexities associated with feature representation in observed meteorological data and the dynamic spatio-temporal dependencies therein. Graph Neural Networks (GNNs) have gained prominence in addressing spatio-temporal forecasting challenges, owing to their ability to model non-Euclidean data structures and capture spatio-temporal dependencies. However, existing GNN-based methods lead to obscure of spatio-temporal patterns between nodes due to the over-smoothing problem. Worse still, important high-order structural information is lost during GNN propagation. Topological Data Analysis (TDA), a synthesis of mathematical analysis and machine learning methodologies that can mine the higher-order features present in the data itself, offers a novel perspective for addressing cross-domain spatio-temporal meteorological forecasting tasks. To leverage above problems more effectively and empower GNN with time-aware ability, a new spatio-temporal meteorological forecasting model with topological data analysis is proposed, called Zigzag Persistence with subgraph Decomposition and Supra-graph construction Network (ZPDSN), which can dynamically simulate meteorological data across the spatio-temporal domain. The adjacency matrix for the final spatial dimension is derived by treating the topological features captured via zigzag persistence as a high-order representation of the data, and by introducing subgraph decomposition and supra-graph construction mechanisms to better capture spatial-temporal correlations. ZPDSN outperforms other GNN-based models on four meteorological datasets, namely, temperature, cloud cover, humidity and surface wind component.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DTR4Rec: direct transition relationship for sequential recommendation
Pub Date: 2024-11-22 | DOI: 10.1007/s10489-024-05875-3
Ming He, Han Zhang, Zihao Zhang, Chang Liu
Sequential recommendation aims to mine user interests by modeling sequential behaviors. Most existing sequential recommendation methods overlook the direct transition relationships among items and only encode a user sequence as a whole, capturing the intention behind the sequence and predicting the next item with which the user might interact. In real-world scenarios, however, a small subset of items within a sequence may directly impact future interactions because of these direct transition relationships. To solve this problem, we propose a novel framework called Direct Transition Relationship for Recommendation (DTR4Rec). Specifically, we first construct a long-term direct transition matrix and a short-term co-occurrence matrix among items based on their occurrence patterns in the interaction data. The long-term direct transition matrix is built by counting the frequency of transitions from one item to another within a relatively long window; the short-term co-occurrence matrix is built by counting the frequency of co-occurrences of two items within a short window. We further utilize a learnable fusion approach that blends traditional sequence transition patterns with the direct transition relationships among items to predict the next item; this integration is accomplished through a learnable fusion matrix. Additionally, to mitigate data sparsity and enhance the generalization of the model, we propose a new paradigm for computing item similarity that considers both collaborative-filtering similarity and sequential similarity among items; this similarity is then used to substitute some items in the sequence, thereby creating augmented data. Extensive experiments on three real-world datasets demonstrate that DTR4Rec outperforms state-of-the-art baselines for sequential recommendation.
{"title":"DTR4Rec: direct transition relationship for sequential recommendation","authors":"Ming He, Han Zhang, Zihao Zhang, Chang Liu","doi":"10.1007/s10489-024-05875-3","DOIUrl":"10.1007/s10489-024-05875-3","url":null,"abstract":"<div><p>Sequential recommendation aims at mining user interests through modeling sequential behaviors. Most existing sequential recommendation methods overlook the direct transition relationship among items, and only encode a user sequence as a whole, capturing the intention behind the sequence and predicting the next item with which the user might interact. However, in real-world scenarios, a small subset of items within a sequence may directly impact future interactions due to the direct transition relationship among items. To solve the above problem, in this paper, we propose a novel framework called Direct Transition Relationship for Recommendation (DTR4Rec). Specifically, we first construct a long-term direct transition matrix and a short-term co-occurrence matrix among items based on their occurrence patterns in the interaction data. The long-term direct transition matrix is constructed by counting the frequency of transitions from one item to another within a relatively long window. The short-term co-occurrence matrix is built by counting the frequency of co-occurrences of two items within a short window. We further utilize a learnable fusion approach to blend traditional sequence transition patterns with the direct transition relationship among items for predicting the next item. This integration is accomplished through a learnable fusion matrix. Additionally, in order to mitigate the data sparsity problem and enhance the generalization of the model, we propose a new paradigm for computing item similarity, which considers both collaborative filtering similarity and sequential similarity among items, then such similarity is utilized to substitute part of items in the sequence, thereby creating augmented data. We conduct extensive experiments on three real-world datasets, demonstrating that DTR4Rec outperforms state-of-the-art baselines for sequential recommendation.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A prototype evolution network for relation extraction
Pub Date: 2024-11-20 | DOI: 10.1007/s10489-024-05864-6
Kai Wang, Yanping Chen, Ruizhang Huang, Yongbin Qin
Prototypical networks transform relation instances and relation types into the same semantic space, where a relation instance is assigned the type of its nearest prototype. Traditional prototypical network methods generate relation prototypes by averaging the sentence representations in a predefined support set, which suffers from two key limitations: sensitivity to outliers in the support set, which can skew the relation prototypes, and a lack of the representational capacity needed to capture the full complexity of the relation extraction task. To address these limitations, we propose the Prototype Evolution Network (PEN) for relation extraction. First, we assign a type cue to each relation instance to mine the semantics of the relation type. Based on the type cues and relation instances, we then present a prototype refiner, comprising a multichannel convolutional neural network and a scaling module, to learn and refine the relation prototypes. Finally, we introduce the historical prototypes from each episode into the current prototype learning process to enable continuous prototype evolution. We evaluate PEN on the ACE 2005, SemEval 2010, and CoNLL 2004 datasets, and the results demonstrate impressive improvements, with PEN outperforming existing state-of-the-art methods.
{"title":"A prototype evolution network for relation extraction","authors":"Kai Wang, Yanping Chen, Ruizhang Huang, Yongbin Qin","doi":"10.1007/s10489-024-05864-6","DOIUrl":"10.1007/s10489-024-05864-6","url":null,"abstract":"<div><p>Prototypical networks transform relation instances and relation types into the same semantic space, where a relation instance is assigned a type based on the nearest prototype. Traditional prototypical network methods generate relation prototypes by averaging the sentence representations from a predefined support set, which suffers from two key limitations. One limitation is sensitive to the outliers in the support set that can skew the relation prototypes. Another limitation is the lack of the necessary representational capacity to capture the full complexity of the relation extraction task. To address these limitations, we propose the Prototype Evolution Network (PEN) for relation extraction. First, we assign a type cue for each relation instance to mine the semantics of the relation type. Based on the type cues and relation instances, we then present a prototype refiner comprising a multichannel convolutional neural network and a scaling module to learn and refine the relation prototypes. Finally, we introduce historical prototypes during each episode into the current prototype learning process to enable continuous prototype evolution. We evaluate the PEN on the ACE 2005, SemEval 2010, and CoNLL2004 datasets, and the results demonstrate impressive improvements, with the PEN outperforming existing state-of-the-art methods.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Highway spillage detection using an improved STPM anomaly detection network from a surveillance perspective
Pub Date: 2024-11-20 | DOI: 10.1007/s10489-024-06066-w
Haoxiang Liang, Huansheng Song, Shaoyang Zhang, Yongfeng Bu
Spillages may cause traffic congestion and incidents and seriously affect the efficiency of traffic operations. Because a spill on a highway varies in shape and scale and appears at random locations, current background extraction and object detection methods cannot achieve good detection results. This paper proposes a highway spill detection method using an improved STPM anomaly detection network. The method builds on the STPM network and achieves detection through FFDNet image filtering, calculation of the global correlation features of the student and teacher networks, contour positioning of spillages in the feature map, and automatic collection of positive samples to train and update the model, achieving high-precision identification and localization of spillages. Experimental results on a custom-built top-view road surface spillage dataset and the MVTec anomaly detection dataset show that the proposed method obtains an AUC-ROC value of 0.978 and a PRO score of 0.965 and can distinguish between spillages and reflective cones, avoiding false detections when spills are similar in appearance. The proposed method therefore has value for research and engineering applications of spill detection in special highway scenes.
{"title":"Highway spillage detection using an improved STPM anomaly detection network from a surveillance perspective","authors":"Haoxiang Liang, Huansheng Song, Shaoyang Zhang, Yongfeng Bu","doi":"10.1007/s10489-024-06066-w","DOIUrl":"10.1007/s10489-024-06066-w","url":null,"abstract":"<p>Spillages may cause traffic congestion and incidents and seriously affect the efficiency of traffic operation. Due to the changeable shape and scale of a spill on a highway, the location of the spill is random, so the current background extraction and object detection methods cannot achieve good detection results for the spill. This paper proposes a highway spill detection method using an improved STPM anomaly detection network. The method is based on the STPM network and achieves detection through FFDNet image filtering, calculation of the global correlation features of the student and teacher networks, contour positioning of spillages in the feature map, and automatic collection of positive samples to train and update the model, achieving high-precision identification and positioning of the spillages. The experimental results of the custom-built top-view road surface spillage dataset and the MVTec anomaly detection dataset show that the method proposed in this paper can obtain an AOC-ROC value of 0.978 and a PRO score of 0.965 and can distinguish between spillages and reflective cones, avoiding the problem of false detection when spills are similar in appearance. Therefore, the proposed method has value in the research and engineering application of spill detection in special highway scenes.</p>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Channel enhanced cross-modality relation network for visible-infrared person re-identification
Pub Date: 2024-11-19 | DOI: 10.1007/s10489-024-06057-x
Wanru Song, Xinyi Wang, Weimin Wu, Yuan Zhang, Feng Liu
Visible-infrared person re-identification (VI Re-ID) aims to perform pedestrian retrieval across non-overlapping visible and infrared cameras and is widely employed in intelligent surveillance. One of the main challenges of VI Re-ID is the huge modality discrepancy between visible and infrared images, so mining more shared features across modalities becomes an important issue. To address this problem, this paper proposes a novel framework for feature learning and feature embedding in VI Re-ID, named the Channel Enhanced Cross-modality Relation Network (CECR-Net). The network contains three key modules. In the first module, to shorten the distance between the original modalities, a channel selection operation is applied to the visible images; robustness against color variations is improved by randomly generating three-channel R/G/B images. The module also exploits the low- and mid-level information of the visible and auxiliary modal images through a feature parameter-sharing strategy. Considering that the body structure of a pedestrian does not readily change with modality, CECR-Net designs two relation-network-based modules, the intra-relation learning and cross-relation learning modules. These modules help capture the structural relationships between body parts, which are modality-invariant, breaking the isolation between local features. Extensive experiments on two public benchmarks indicate that CECR-Net is superior to state-of-the-art methods. In particular, on the SYSU-MM01 dataset, Rank-1 and mAP reach 76.83% and 71.56% in the "All Search" mode, respectively.
{"title":"Channel enhanced cross-modality relation network for visible-infrared person re-identification","authors":"Wanru Song, Xinyi Wang, Weimin Wu, Yuan Zhang, Feng Liu","doi":"10.1007/s10489-024-06057-x","DOIUrl":"10.1007/s10489-024-06057-x","url":null,"abstract":"<div><p>Visible-infrared person re-identification (VI Re-ID) is designed to perform pedestrian retrieval on non-overlapping visible-infrared cameras, and it is widely employed in intelligent surveillance. For the VI Re-ID task, one of the main challenges is the huge modality discrepancy between the visible and infrared images. Therefore, mining more shared features in the cross-modality task turns into an important issue. To address this problem, this paper proposes a novel framework for feature learning and feature embedding in VI Re-ID, namely Channel Enhanced Cross-modality Relation Network (CECR-Net). More specifically, the network contains three key modules. In the first module, to shorten the distance between the original modalities, a channel selection operation is applied to the visible images, the robustness against color variations is improved by randomly generating three-channel R/G/B images. The module also exploits the low- and mid-level information of the visible and auxiliary modal images through a feature parameter-sharing strategy. Considering that the body sequences of pedestrians are not easy to change with modality, CECR-Net designs two modules based on relation network for VI Re-ID, namely the intra-relation learning and the cross-relation learning modules. These two modules help to capture the structural relationship between body parts, which is a modality-invariant information, disrupting the isolation between local features. Extensive experiments on the two public benchmarks indicate that CECR-Net is superior compared to the state-of-the-art methods. In particular, for the SYSU-MM01 dataset, the Rank1 and mAP reach 76.83% and 71.56% in the \"All Search\" mode, respectively.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval
Pub Date: 2024-11-19 | DOI: 10.1007/s10489-024-06060-2
Dongxue Shi, Zheng Liu, Shanshan Gao, Ang Li
Cross-modal retrieval aims to retrieve related items in one modality using a query from another modality. As its foundational and key challenge, image-text retrieval has garnered significant research interest. In recent years, hashing techniques have gained widespread attention for large-scale retrieval due to their minimal storage requirements and rapid query processing. However, existing hashing approaches either learn unified representations for both modalities or specific representations within each modality: the former lack modality-specific information, while the latter do not consider the relationships between image-text pairs across modalities. We therefore propose an innovative supervised hashing method that leverages intra-modality and inter-modality matrix factorization. The method integrates semantic labels into the hash-code learning process, modeling both inter-modality and intra-modality relationships within a unified framework so as to preserve inter-modal complementarity and intra-modal consistency in multimodal data. Our approach involves: (1) mapping data from the various modalities into a shared latent semantic space through inter-modality matrix factorization to derive unified hash codes, and (2) mapping data from each modality into modality-specific latent semantic spaces via intra-modality matrix factorization to obtain modality-specific hash codes. These are subsequently merged to construct the final hash codes. Experimental results demonstrate that our approach surpasses several state-of-the-art cross-modal image-text retrieval hashing methods, and ablation studies further validate the effectiveness of each component of the model.
{"title":"Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval","authors":"Dongxue Shi, Zheng Liu, Shanshan Gao, Ang Li","doi":"10.1007/s10489-024-06060-2","DOIUrl":"10.1007/s10489-024-06060-2","url":null,"abstract":"<div><p>Cross-modal retrieval aims to retrieve related items in one modality using a query from another modality. As the foundational and key challenge of it, image-text retrieval has garnered significant research interest from scholars. In recent years, hashing techniques have gained widespread interest for large-scale dataset retrieval due to their minimal storage requirements and rapid query processing capabilities. However, existing hashing approaches either learn unified representations for both modalities or specific representations within each modality. The former approach lacks modality-specific information, while the latter does not consider the relationships between image-text pairs across various modalities. Therefore, we propose an innovative supervised hashing method that leverages intra-modality and inter-modality matrix factorization. This method integrates semantic labels into the hash code learning process, aiming to understand both inter-modality and intra-modality relationships within a unified framework for diverse data types. The objective is to preserve inter-modal complementarity and intra-modal consistency in multimodal data. Our approach involves: (1) mapping data from various modalities into a shared latent semantic space through inter-modality matrix factorization to derive unified hash codes, and (2) mapping data from each modality into modality-specific latent semantic spaces via intra-modality matrix factorization to obtain modality-specific hash codes. These are subsequently merged to construct the final hash codes. Experimental results demonstrate that our approach surpasses several state-of-the-art cross-modal image-text retrieval hashing methods. Additionally, ablation studies further validate the effectiveness of each component within our model.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HG-search: multi-stage search for heterogeneous graph neural networks
Pub Date: 2024-11-19 | DOI: 10.1007/s10489-024-06058-w
Hongmin Sun, Ao Kan, Jianhao Liu, Wei Du
In recent years, heterogeneous graphs, complex graph structures that can express multiple types of nodes and edges, have been widely used to model various real-world scenarios. As a powerful analysis tool, heterogeneous graph neural networks (HGNNs) can effectively mine the information and knowledge in heterogeneous graphs. However, designing an excellent HGNN architecture requires substantial domain knowledge and is a time-consuming and laborious task. Inspired by neural architecture search (NAS), some work on homogeneous graph NAS has emerged, but there is little work on heterogeneous graph NAS. In addition, the hyperparameters associated with an HGNN architecture are important factors affecting its performance in downstream tasks, and manually tuning them is likewise tedious and inefficient. To solve these problems, we propose a novel search algorithm (HG-Search for short) specifically for HGNNs, which achieves fully automatic architecture design and hyperparameter tuning. Specifically, we first design a search space for HG-Search composed of two parts: an HGNN architecture search space and a hyperparameter search space. Furthermore, we propose a multi-stage search (MS-Search for short) module and combine it with policy gradient search (PG-Search for short). Experiments on real-world datasets show that this method can design HGNN architectures comparable to those designed manually by humans and achieves automatic hyperparameter tuning, significantly improving performance in downstream tasks. The code and related datasets can be found at https://github.com/dawn-creator/HG-Search.
{"title":"HG-search: multi-stage search for heterogeneous graph neural networks","authors":"Hongmin Sun, Ao Kan, Jianhao Liu, Wei Du","doi":"10.1007/s10489-024-06058-w","DOIUrl":"10.1007/s10489-024-06058-w","url":null,"abstract":"<div><p>In recent years, heterogeneous graphs, a complex graph structure that can express multiple types of nodes and edges, have been widely used for modeling various real-world scenarios. As a powerful analysis tool, heterogeneous graph neural networks (HGNNs) can effectively mine the information and knowledge in heterogeneous graphs. However, designing an excellent HGNN architecture requires a lot of domain knowledge and is a time-consuming and laborious task. Inspired by neural architecture search (NAS), some works on homogeneous graph NAS have emerged. However, there are few works on heterogeneous graph NAS. In addition, the hyperparameters related to the HGNN architecture are also important factors affecting its performance in downstream tasks. Manually tuning hyperparameters is also a tedious and inefficient process. To solve the above problems, we propose a novel search (HG-Search for short) algorithm specifically for HGNNs, which achieves fully automatic architecture design and hyperparameter tuning. Specifically, we first design a search space for HG-Search, composed of two parts: HGNN architecture search space and hyperparameter search space. Furthermore, we propose a multi-stage search (MS-Search for short) module and combine it with the policy gradient search (PG-Search for short). Experiments on real-world datasets show that this method can design HGNN architectures comparable to those manually designed by humans and achieve automatic hyperparameter tuning, significantly improving the performance in downstream tasks. The code and related datasets can be found at https://github.com/dawn-creator/HG-Search.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards user-specific multimodal recommendation via cross-modal attention-enhanced graph convolution network
Pub Date: 2024-11-18 | DOI: 10.1007/s10489-024-06061-1
Ruidong Wang, Chao Li, Zhongying Zhao
Multimodal Recommendation (MR) exploits multimodal features of items (e.g., visual or textual features) to provide personalized recommendations for users. Recently, scholars have integrated Graph Convolutional Networks (GCNs) into MR to model complicated multimodal relationships, but two significant challenges remain: (1) most MR methods fail to consider the correlations between different modalities, which significantly affects modal alignment and results in poor performance on MR tasks; and (2) most MR methods leverage multimodal features to enhance item representation learning, while the connection between multimodal features and user representations remains largely unexplored. To this end, we propose a novel yet effective Cross-modal Attention-enhanced graph convolution network for user-specific Multimodal Recommendation, named CAMR. Specifically, we design a cross-modal attention mechanism to mine cross-modal correlations. In addition, we devise a modality-aware user feature learning method that uses rich item information to learn user feature representations. Experimental results on four real-world datasets demonstrate the superiority of CAMR over several state-of-the-art methods. The code is available at https://github.com/ZZY-GraphMiningLab/CAMR
{"title":"Towards user-specific multimodal recommendation via cross-modal attention-enhanced graph convolution network","authors":"Ruidong Wang, Chao Li, Zhongying Zhao","doi":"10.1007/s10489-024-06061-1","DOIUrl":"10.1007/s10489-024-06061-1","url":null,"abstract":"<div><p>Multimodal Recommendation (MR) exploits multimodal features of items (e.g., visual or textual features) to provide personalized recommendations for users. Recently, scholars have integrated Graph Convolutional Networks (GCN) into MR to model complicated multimodal relationships, but still with two significant challenges: (1) Most MR methods fail to consider the correlations between different modalities, which significantly affects the modal alignment, resulting in poor performance on MR tasks. (2) Most MR methods leverage multimodal features to enhance item representation learning. However, the connection between multimodal features and user representations remains largely unexplored. To this end, we propose a novel yet effective Cross-modal Attention-enhanced graph convolution network for user-specific Multimodal Recommendation, named CAMR. Specifically, we design a cross-modal attention mechanism to mine the cross-modal correlations. In addition, we devise a modality-aware user feature learning method that uses rich item information to learn user feature representations. Experimental results on four real-world datasets demonstrate the superiority of CAMR compared with several state-of-the-art methods. The codes of this work are available at https://github.com/ZZY-GraphMiningLab/CAMR</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142664487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Calibrating TabTransformer for financial misstatement detection
Pub Date: 2024-11-18 | DOI: 10.1007/s10489-024-05861-9
Elias Zavitsanos, Dimitrios Kelesis, Georgios Paliouras
In this paper, we address the task of estimating the probability of misstatements in the annual financial reports of public companies. In particular, we improve the state of the art in financial misstatement detection by training a TabTransformer model with a gated multi-layer perceptron, which encodes and exploits relationships between financial features. We further calibrate a sample-dependent focal loss function to deal with the severe class imbalance in the data and to focus on positive examples that are hard to distinguish. We evaluate the proposed methodology in a realistic setting that preserves the essential characteristics of the task: (a) the imbalanced class distribution, (b) the chronological order of the data, and (c) the systematic label noise caused by the delay in manually identifying misstatements. The proposed method achieves state-of-the-art results in this setting compared with recent approaches in the literature. As an additional contribution, we release the dataset to facilitate further research in the field.
{"title":"Calibrating TabTransformer for financial misstatement detection","authors":"Elias Zavitsanos, Dimitrios Kelesis, Georgios Paliouras","doi":"10.1007/s10489-024-05861-9","DOIUrl":"10.1007/s10489-024-05861-9","url":null,"abstract":"<div><p>In this paper, we deal with the task of identifying the probability of misstatements in the annual financial reports of public companies. In particular, we improve the state-of-the-art for financial misstatement detection by training a TabTransformer model with a gated multi-layer perceptron, which encodes and exploits relationships between financial features. We further calibrate a sample-dependent focal loss function to deal with the severe class imbalance in the data and to focus on positive examples that are hard to distinguish. We evaluate the proposed methodology in a realistic setting that preserves the essential characteristics of the task: (a) the imbalanced distribution of classes in the data, (b) the chronological order of data, and (c) the systematic noise in the labels, due to the delay in manually identifying misstatements. The proposed method achieves state-of-the-art results in this setting, compared to recent approaches in the literature. As an additional contribution, we release the dataset to facilitate further research in the field.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142664419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}