Correlation-Guided Information Deep Fusion for Multimodal Recommendation
Gang-Feng Ma; Xu-Hua Yang; Peng Jiang
Pub Date: 2025-08-27 | DOI: 10.1109/TAI.2025.3602935 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1584-1595
Multimodal recommendation plays a crucial role on online platforms by integrating modality information such as visual, textual, and audio content, which significantly mitigates the sparsity of user–item interaction networks. However, current multimodal recommendation methods primarily enrich item-side representations while neglecting user-side learning, and their fusion of structure and information is insufficient. To address these issues, we propose correlation-guided information deep fusion for multimodal recommendation (CIDF). First, we employ graph neural networks to capture collaborative signals based on ID embeddings and multimodal features separately, thereby capturing the independent information in each node's different representations. Next, we construct a user–user similarity ID graph and an item–item correlation modality graph to capture connection information on the user and item sides, respectively. Finally, we propose an information deep fusion method that integrates these two graphs with the user–item interaction graph, obtaining fused representations for both users and items through information propagation and aggregation on the graphs. The fused representations are further updated on the user–item interaction graph to obtain node representations that better align with user interaction behaviors. We conducted experiments on real-world datasets, and the results demonstrate that CIDF outperforms state-of-the-art methods in multimodal recommendation.
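As a rough illustration of CIDF's graph-construction step, the sketch below builds a user–user graph from ID embeddings and an item–item graph from modality features using cosine similarity with top-k sparsification. The k value, the cosine metric, and the toy dimensions are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def topk_similarity_graph(features, k=2):
    """Build a sparse adjacency by keeping each node's top-k cosine neighbors."""
    norm = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = norm @ norm.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-loops
    adj = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar nodes
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, idx] = sim[rows, idx]
    return adj

rng = np.random.default_rng(0)
user_id_emb = rng.normal(size=(6, 8))                # toy user ID embeddings
item_modal = rng.normal(size=(5, 8))                 # toy fused item modality features
uu_graph = topk_similarity_graph(user_id_emb, k=2)   # user-user similarity ID graph
ii_graph = topk_similarity_graph(item_modal, k=2)    # item-item correlation modality graph
```

Each resulting graph can then feed a propagation/aggregation pass alongside the user–item interaction graph, as the abstract describes.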
Certified Local Transferability for Evaluating Adversarial Attacks
Minyu Chen; Jingyang Li; Ling-I Wu; Guoqiang Li
Pub Date: 2025-08-27 | DOI: 10.1109/TAI.2025.3602931 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1574-1583
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples. Though adversarial attacks are effective at misleading models, most attack methods craft a perturbation for one specific image. To investigate the actual effect on the feature space, we introduce the concept of the certified local transferable region: a connected area of inputs where we can mathematically guarantee that a single adversarial perturbation will successfully fool the model. The size of this region is a metric for evaluating the local transferability of perturbations. We present a novel method, reverse attack oracle-based search (RAOS), to estimate the maximum size of this region. Our approach efficiently searches for the largest possible vulnerable area around an original input by iteratively refining its boundaries; each step is guided by a minimal-distance attack algorithm and proven with state-of-the-art verifiers. We conduct empirical experiments to evaluate various attacks across different model structures and adversarial training scenarios, show the advantage of our proposed metric over existing ones, and demonstrate its utility in exploring the robustness of neural networks.
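The region-size estimation in RAOS can be pictured as a search over candidate radii, each certified or rejected by a verifier. The toy sketch below uses a plain binary search with a mock oracle; `verified_transferable` is a stand-in, and the paper's actual verifiers and boundary-refinement strategy are considerably more involved.

```python
def estimate_region_radius(verified_transferable, lo=0.0, hi=1.0, tol=1e-3):
    """Binary-search the largest radius r such that a single perturbation
    provably fools the model everywhere within distance r of the input."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if verified_transferable(mid):   # verifier certifies a region of radius mid
            lo = mid                     # the certified region can grow
        else:
            hi = mid                     # shrink the candidate boundary
    return lo

# Toy oracle: pretend the perturbation provably transfers within radius 0.37.
radius = estimate_region_radius(lambda r: r <= 0.37)
```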
Robust Unknown Object Detection in Dynamic Environments Through Dual-Granularity Reconstruction Error Modeling
Linhua Ye; Yangyang Huang; Ronghua Luo
Pub Date: 2025-08-27 | DOI: 10.1109/TAI.2025.3602943 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1596-1609
Open-world object detection (OWOD) aims to detect both known and unknown objects in dynamic environments, where unknown instances lack ground-truth supervision during training. Existing methods typically rely on supervision from known categories, leading models to overconfidently classify visually similar unknowns as known classes and dissimilar ones as background. This known-class prior bias severely hinders the detection of truly novel objects. To address this challenge, we propose a robust unknown object detection method based on dual-granularity reconstruction error modeling. At the fine-grained level, we propose fine-grained masked reconstruction (FMR), which randomly masks feature regions to guide reconstruction toward semantic structures, thereby improving foreground–background discrimination. At the coarse-grained level, we propose adaptive region-based error aggregation (AREA), which aggregates reconstruction errors over object proposals to enhance the model's sensitivity to ambiguous semantic boundaries while suppressing local outliers. Furthermore, we perform decoupled probabilistic modeling of foreground and background reconstruction errors, enabling soft estimation of unknown object likelihoods without supervision. Extensive experiments on standard OWOD benchmarks demonstrate that our method consistently outperforms state-of-the-art (SOTA) approaches, achieving a +20.6 improvement in unknown object recall (U-Recall) while maintaining strong performance on known classes.
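The two granularities can be sketched as: randomly mask feature entries, reconstruct, score each cell by its error (the FMR idea), then average the error map over object proposals (the AREA idea). The identity "reconstructor", the mask ratio, and the toy feature map are placeholders; the paper's actual reconstruction network and probabilistic modeling are not shown.

```python
import numpy as np

def masked_reconstruction_error(features, reconstruct, mask_ratio=0.3, seed=0):
    """Fine-grained step: randomly mask feature entries, reconstruct, and
    score each spatial cell by its squared reconstruction error."""
    rng = np.random.default_rng(seed)
    mask = rng.random(features.shape) > mask_ratio   # keep ~70% of entries
    recon = reconstruct(features * mask)
    return (features - recon) ** 2

def aggregate_over_proposals(error_map, proposals):
    """Coarse-grained step: average the error map over each object proposal
    (x0, y0, x1, y1), which suppresses single-cell outliers."""
    return [error_map[y0:y1, x0:x1].mean() for (x0, y0, x1, y1) in proposals]

feat = np.random.default_rng(1).random((8, 8))       # toy 8x8 feature map
errors = masked_reconstruction_error(feat, reconstruct=lambda x: x)  # identity stand-in
scores = aggregate_over_proposals(errors, [(0, 0, 4, 4), (4, 4, 8, 8)])
```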
Annealing Genetic Slicing Adversarial Networks Based Feedback for Imbalanced Visual Classification
Yongting Zhao; Zhifan Gao; Jingyu Hao; Yiwen Wang; Heye Zhang
Pub Date: 2025-08-26 | DOI: 10.1109/TAI.2025.3602750 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1546-1561
Data imbalance is a common challenge in real-world applications, often addressed by augmenting minority-class data to obtain a balanced dataset. Generative adversarial networks (GANs) can generate realistic minority-class samples through dynamic adversarial learning but face limitations when the optimization falls into local minima. Incorporating simulated annealing into GANs offers a potential remedy by accepting worse solutions to expand the exploration of the solution space. However, it remains underexplored whether every accepted worse solution in the learning dynamics of a GAN benefits the learning of minority or majority classes. Therefore, we propose an annealing genetic slicing adversarial network (AGSAN) learning method for imbalanced visual classification. It treats adversarial learning as an evolutionary process in which the generator undergoes multiple-offspring generation, best-offspring selection, and individual updating. AGSAN builds a gradient-informed selection mechanism to facilitate individual updating and best-offspring selection, leveraging gradient consistency (the similarity between minority-class gradients and overall gradients) to guide optimization. Furthermore, AGSAN expands the optimization range to facilitate multiple-offspring generation through a mixture of multiple adversarial objectives. Additionally, AGSAN ensures that the minimization objective of the GAN equals the distance between the generated and target distributions while relaxing the assumption of an optimal discriminator. Compared with 21 existing methods, AGSAN achieves state-of-the-art performance on imbalanced classification.
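The gradient-consistency idea can be sketched as a cosine similarity between a minority-class gradient and the overall gradient, used to rank candidate offspring updates. The flat toy gradients and the argmax selection rule are illustrative assumptions; AGSAN's full selection mechanism is more elaborate.

```python
import numpy as np

def gradient_consistency(minority_grad, overall_grad):
    """Cosine similarity between the minority-class gradient and the overall
    gradient; higher values mean the update also helps minority classes."""
    num = float(np.dot(minority_grad, overall_grad))
    den = np.linalg.norm(minority_grad) * np.linalg.norm(overall_grad) + 1e-12
    return num / den

def select_best_offspring(offspring_grads, overall_grad):
    """Pick the candidate update whose gradient agrees most with the overall
    gradient (a toy version of a gradient-informed selection)."""
    scores = [gradient_consistency(g, overall_grad) for g in offspring_grads]
    return int(np.argmax(scores)), scores

overall = np.array([1.0, 0.0])
candidates = [np.array([0.9, 0.1]), np.array([-1.0, 0.0]), np.array([0.0, 1.0])]
best, scores = select_best_offspring(candidates, overall)
```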
Steganography in Large Language Models
Xinxin Li; Zichi Wang; Xinpeng Zhang
Pub Date: 2025-08-26 | DOI: 10.1109/TAI.2025.3602763 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1562-1573
The development of deep learning has provided new momentum for steganography. However, existing model steganography methods are generally designed for convolutional neural network models and suffer from low embedding capacity and poor robustness. To this end, we propose a stego scheme designed for large language models based on the transformer architecture. Using the powerful feature representation ability and multilayer self-attention mechanism of the transformer, a large amount of secret data can be embedded without significantly affecting the performance of the model. In our scheme, the sender uses matrix multiplication to encode the cover parameters of a specific transformer block, embedding secret data during the training process of the large language model. Ordinary users can use the stego model for text classification, text generation, and other routine tasks, while receivers can use a secret key to extract the secret data from the stego model, enabling covert communication. Experimental results affirm the efficacy of our scheme in terms of embedding capacity, undetectability, and robustness.
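To make the cover-parameter idea concrete, here is a deliberately simplified toy scheme: hide bits in key-selected weights by nudging each weight so the parity of its quantized value matches the bit, and recover them with the same key. This is not the paper's matrix-multiplication encoding; the quantization step, position selection, and key usage are all assumptions for illustration.

```python
import numpy as np

def embed_bits(weights, bits, key, step=1e-3):
    """Hide bits in key-selected weights by forcing the parity of
    round(w / step) to match each bit (toy scheme, not the paper's method)."""
    rng = np.random.default_rng(key)
    flat = weights.flatten()                 # flatten() returns a copy
    pos = rng.choice(flat.size, size=len(bits), replace=False)
    for p, b in zip(pos, bits):
        q = int(round(flat[p] / step))
        if q % 2 != b:                       # adjust parity by one quantum
            q += 1
        flat[p] = q * step
    return flat.reshape(weights.shape)

def extract_bits(weights, n_bits, key, step=1e-3):
    """Receiver side: regenerate the same positions from the key and read parities."""
    rng = np.random.default_rng(key)
    flat = weights.flatten()
    pos = rng.choice(flat.size, size=n_bits, replace=False)
    return [int(round(flat[p] / step)) % 2 for p in pos]

w = np.random.default_rng(3).normal(size=(16, 16))   # stand-in for a weight matrix
secret = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_bits(w, secret, key=42)
```

Each weight moves by at most 1.5 quantization steps, so the model's behavior is barely disturbed; only a holder of the key knows which positions carry data.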
Adaptive Fuzzy Distributed Optimal Fault Tolerant Control of Nonlinear Multi-Agent Systems Under Weight-Unbalanced Directed Graphs
Mengyuan Cui; Yi Zuo; Shaocheng Tong
Pub Date: 2025-08-25 | DOI: 10.1109/TAI.2025.3602015 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1512-1521
The adaptive fuzzy distributed optimal fault tolerant control (FTC) problem is investigated for high-order nonlinear multi-agent (NMA) systems under weight-unbalanced directed graphs. Since the optimization point of the high-order NMA systems considered in this study is unknown, an optimal signal generator is formulated to obtain it. Then, a fuzzy state observer is established to estimate unmeasurable states. Based on the designed optimal signal generator and fuzzy state observer, an adaptive fuzzy distributed optimal output-feedback FTC scheme is proposed using backstepping control technology. It is proved that the NMA system is asymptotically stable and that the global cost function is minimized. Finally, we apply the proposed approach to nonholonomic mobile robots with two actuated wheels; the simulation and comparison results verify its effectiveness.
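The role of an optimal signal generator is to drive the agents toward the minimizer of the sum of local costs that no single agent knows. The sketch below shows the basic consensus-plus-gradient idea on a balanced undirected ring; handling the paper's weight-unbalanced directed graphs requires extra correction terms (e.g., left-eigenvector rescaling) that are omitted here, and the quadratic local costs are a toy assumption.

```python
import numpy as np

def distributed_minimize(grads, adjacency, x0, alpha=0.05, steps=2000):
    """Each agent descends its local cost gradient while averaging with its
    neighbors; the mean of the states converges to the global minimizer of
    sum_i f_i (individual states only approximately agree with a constant step)."""
    x = np.array(x0, dtype=float)
    n = len(x)
    for _ in range(steps):
        consensus = np.array([sum(adjacency[i][j] * (x[j] - x[i]) for j in range(n))
                              for i in range(n)])
        x = x + alpha * consensus - alpha * np.array([g(xi) for g, xi in zip(grads, x)])
    return x

# Local costs f_i(x) = (x - c_i)^2; the global optimum is the mean of the c_i.
centers = [1.0, 2.0, 3.0, 6.0]
grads = [lambda x, c=c: 2 * (x - c) for c in centers]
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]  # 4-agent ring
states = distributed_minimize(grads, ring, x0=[0, 0, 0, 0])
```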
MSAF: Multimodal Sentiment Detection via Multiscale Adaptive Fusion
Jihong Guan; Yulou Shu; Wuchao Liu; Wengen Li; Shuigeng Zhou; Yichao Zhang
Pub Date: 2025-08-25 | DOI: 10.1109/TAI.2025.3602409 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1533-1545
With the rapid increase of multimodal comments on social media, multimodal sentiment detection has become increasingly important. However, most existing methods overlook the difference in information density between text and images, and fall short in fully utilizing multiscale information in images. To address this issue, we propose a multiscale adaptive fusion model termed MSAF for multimodal sentiment detection. MSAF first extracts fine- and coarse-scale features of images through a multiscale visual encoder and uses a multiscale adaptive pooling module to adaptively adjust the weights of different regional features. Then, MSAF incorporates multiscale contrastive learning and multiscale rivalry tasks to ensure that the model retains associations between features at different scales while maintaining their diversity. These features are sequentially fused with text through a hierarchical fusion encoder guided by textual information, enabling MSAF to focus on sentiment-salient regions in the image. Finally, the multimodal fusion embeddings are fed into a classifier to predict the sentiment. Extensive experiments on multiple public datasets demonstrate the effectiveness and superiority of MSAF.
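Adaptive pooling over regional features can be sketched as softmax-weighted averaging: each region gets a learned scalar score, and softmax normalization lets salient regions dominate the pooled vector. The linear scoring function and toy dimensions below are assumptions, not the module's actual design.

```python
import numpy as np

def adaptive_pool(region_feats, score_w):
    """Weight regional features by softmax-normalized scores so that salient
    regions dominate the pooled representation."""
    scores = region_feats @ score_w                 # one scalar score per region
    weights = np.exp(scores - scores.max())         # numerically stable softmax
    weights /= weights.sum()
    return weights @ region_feats, weights

rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 16))                  # e.g., 4 image regions, 16-dim features
pooled, region_weights = adaptive_pool(regions, rng.normal(size=16))
```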
P-Mix: A Data Augmentation Method for Contrastive Learning Based Human Activity Recognition
Yingjie Chen; Qi Xie; Wenxuan Cui; Liming Chen; Houbing Herbert Song; Tao Zhu
Pub Date: 2025-08-25 | DOI: 10.1109/TAI.2025.3601599 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1500-1511
Supervised human activity recognition (HAR) with sensor data typically demands substantial labeled datasets to train robust models. Contrastive learning offers a self-supervised alternative by leveraging data augmentation to improve representation learning. However, most existing augmentation methods operate independently on either the time or channel dimension and often introduce unstructured noise, which can distort meaningful temporal and spectral patterns. To address these limitations, we present P-Mix, a novel data augmentation method for contrastive learning in HAR tasks, designed to be compatible with the SimCLR framework. Tailored to sensor data, P-Mix slices and recombines data along both the time and channel dimensions, merging multiple temporal segments to encourage the model to explore underlying relationships and variations in the data in an unsupervised setting. To capture motion cycles and long-term dependencies, we employ shorter temporal segments as fundamental processing units along the time dimension; by incorporating structured noise patterns based on motion-cycle characteristics within these segments, we effectively enhance the model's robustness and generalization capabilities. Extensive evaluations across five HAR benchmarks demonstrate that P-Mix achieves consistent improvements over the strongest baseline (resample), delivering relative F1-score gains ranging from 1.87% (USC-HAD: 85.63% versus 83.93%) to 6.53% (DSADS: 97.24% versus 91.28%) through controlled multidimensional fusion. These results demonstrate the effectiveness of our approach in optimizing data generation and augmentation strategies for HAR tasks.
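The slice-and-recombine idea can be sketched as mixing two sensor windows segment by segment, choosing each time segment's source and a channel split point at random. The segment count, the 50/50 source choice, and the uniform channel split are illustrative assumptions; the paper's structured, motion-cycle-aware noise patterns are not modeled here.

```python
import numpy as np

def p_mix(x1, x2, n_segments=4, seed=0):
    """Slice two sensor windows of shape (time, channels) into equal time
    segments and recombine them, mixing along both time and channel axes."""
    rng = np.random.default_rng(seed)
    t, c = x1.shape
    seg = t // n_segments
    out = x1.copy()
    for i in range(n_segments):
        sl = slice(i * seg, (i + 1) * seg)
        if rng.random() < 0.5:                # take this time segment from x2
            out[sl] = x2[sl]
        split = rng.integers(0, c + 1)        # then mix along the channel axis too
        out[sl, :split] = x1[sl, :split] if rng.random() < 0.5 else x2[sl, :split]
    return out

rng = np.random.default_rng(1)
a, b = rng.normal(size=(128, 6)), rng.normal(size=(128, 6))  # 128 steps, 6 channels
mixed = p_mix(a, b)
```

Every entry of the output comes from one of the two source windows, so the augmentation recombines real signal rather than injecting unstructured noise.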
A Novel Ensemble Method Based on Support Vector Dynamic Learning Neural Network for Breast Cancer Diagnosis
Zhijun Zhang; Yong Ding; Jian Zhang
Pub Date: 2025-08-25 | DOI: 10.1109/TAI.2025.3602017 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1522-1532
To improve the accuracy of breast cancer diagnosis and reduce examination costs, a novel ensemble learning method called support vector dynamic learning neural network (SVDL) is proposed in this article. The method first formulates the breast cancer classification problem as a standard quadratic programming (QP) problem based on the support vector machine (SVM). Then, a novel dynamic learning neural network (DLNN) solver is designed to solve this problem and obtain the optimal diagnosis model. Experimental results on the Wisconsin diagnostic breast cancer dataset show that the proposed method is superior to traditional and state-of-the-art machine learning methods, achieving the best accuracy (98.59%) and area under the curve (0.9956), as well as high specificity (98.85%) and sensitivity (98.18%), demonstrating good classification performance. Furthermore, the method's performance may be further enhanced by introducing a swarm intelligence algorithm to search for optimal model parameter values, which would contribute to the diagnosis of breast cancer and other diseases as well.
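The SVM-as-QP step can be made concrete with a simplified dual problem solved by projected gradient ascent, a discrete-time stand-in for a dynamic neural network solver. This sketch drops the bias term and the equality constraint of the full SVM dual, uses a linear kernel, and runs on a tiny toy dataset; it is not the paper's DLNN solver.

```python
import numpy as np

def svm_dual_projected_gradient(X, y, C=1.0, lr=0.01, steps=5000):
    """Solve the simplified (bias-free) SVM dual QP
        max  sum(a) - 0.5 * sum_ij a_i a_j y_i y_j <x_i, x_j>,  0 <= a <= C
    by projected gradient ascent on the box constraint."""
    K = (X @ X.T) * np.outer(y, y)           # label-signed Gram matrix
    a = np.zeros(len(y))
    for _ in range(steps):
        grad = 1.0 - K @ a                   # gradient of the dual objective
        a = np.clip(a + lr * grad, 0.0, C)   # project back into the box [0, C]
    w = (a * y) @ X                          # recover the primal weight vector
    return a, w

# Toy linearly separable data, two points per class.
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -2.0], [-2.5, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, w = svm_dual_projected_gradient(X, y)
pred = np.sign(X @ w)
```

A continuous-time dynamic neural network solver would replace the discrete update with an ODE whose equilibrium is the same QP optimum.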
Vehicle reidentification (Re-ID) targets cross-camera image retrieval and is a widely used technology in intelligent transportation systems. Current Re-ID methods primarily enhance feature extraction by focusing on either global or local features, but they often fail to effectively leverage diverse information. To address these limitations, we propose a multiinformation feature enhancement network (MFENet) that integrates diverse information types to enhance feature representation and boost model accuracy. Specifically, 1) a coarse-grained feature enhancement (CFE) module is employed to remove background influence on image features. This module filters the background, enabling the network model to extract more accurate vehicle features, such as color and model. 2) A fine-grained feature enhancement (FFE) module collects detailed information about vehicles by extracting features from subtle areas (e.g., vehicle lights and rearview mirrors) of an image, providing more unique clues about the vehicle. 3) A latent feature enhancement (LFE) module is designed to mine latent features and enrich vehicle features using nonvisual cues, such as the vehicle’s camera and orientation, without relying on image information. Extensive experiments on vehicle Re-ID datasets demonstrate that MFENet outperforms most existing methods.
{"title":"MFENet: Multiinformation Feature Enhancement Network for Vehicle Reidentification","authors":"Zhangwei Li;Yuhui Deng;Ke Wang;Junhao Huang;Zhimin Tang;Weiping Ding","doi":"10.1109/TAI.2025.3601594","DOIUrl":"https://doi.org/10.1109/TAI.2025.3601594","url":null,"abstract":"Vehicle reidentification (Re-ID) targets cross-camera image retrieval and is a widely used technology in intelligent transportation systems. Current Re-ID methods primarily enhance feature extraction by focusing on either global or local features, but they often fail to effectively leverage diverse information. To address these limitations, we propose a multiinformation feature enhancement network (MFENet) that integrates diverse information types to enhance feature representation and boost model accuracy. Specifically, 1) a coarse-grained feature enhancement (CFE) module is employed to remove background influence on image features. This module filters the background, enabling the network model to extract more accurate vehicle features, such as color and model. 2) A fine-grained feature enhancement (FFE) module collects detailed information about vehicles by extracting features from subtle areas (e.g., vehicle lights and rearview mirrors) of an image, providing more unique clues about the vehicle. 3) A latent feature enhancement (LFE) module is designed to mine latent features and enrich vehicle features using nonvisual cues, such as the vehicle’s camera and orientation, without relying on image information. 
Extensive experiments on vehicle Re-ID datasets demonstrate that MFENet outperforms most existing methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 3","pages":"1487-1499"},"PeriodicalIF":0.0,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147299718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
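The MFENet abstract describes three parallel enhancement branches (coarse-grained, fine-grained, and latent/nonvisual) whose outputs jointly form the vehicle descriptor. A toy NumPy sketch of that multi-branch fusion pattern; the linear-plus-ReLU "branches", the feature dimensions, and the concatenation fusion are all illustrative assumptions, not the paper's actual architecture:

```python
# Hypothetical sketch of a three-branch feature-fusion pattern in the spirit
# of MFENet's CFE/FFE/LFE modules: each branch embeds a different information
# source, and the final Re-ID descriptor concatenates the three embeddings.
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w):
    # stand-in for a learned extractor: linear projection + ReLU
    return np.maximum(x @ w, 0.0)

imgs  = rng.standard_normal((4, 256))   # global image features (batch of 4)
parts = rng.standard_normal((4, 128))   # local-region features (lights, mirrors)
meta  = rng.standard_normal((4, 16))    # nonvisual cues: camera ID, orientation

coarse = branch(imgs,  rng.standard_normal((256, 64)))  # CFE-like branch
fine   = branch(parts, rng.standard_normal((128, 64)))  # FFE-like branch
latent = branch(meta,  rng.standard_normal((16, 64)))   # LFE-like branch

# fused descriptor used for cross-camera retrieval (e.g., cosine similarity)
descriptor = np.concatenate([coarse, fine, latent], axis=1)
print(descriptor.shape)  # one 192-d vector per vehicle image
```

Concatenation is the simplest fusion choice; attention-weighted or gated fusion would be natural alternatives when one information source is unreliable (e.g., heavy background clutter).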