Semi-Fragile Neural Network Watermarking Based on Adversarial Examples
Pub Date: 2024-03-18 | DOI: 10.1109/TETCI.2024.3372373
Zihan Yuan;Xinpeng Zhang;Zichi Wang;Zhaoxia Yin
Deep neural networks (DNNs) may be subject to various modifications during transmission and use. Regular processing operations do not affect the functionality of a model, while malicious tampering causes serious damage. Therefore, it is crucial to determine the availability of a DNN model. To address this issue, we propose a semi-fragile black-box watermarking method that can distinguish between accidental modification and malicious tampering of DNNs, focusing on the privacy and security of neural network models. Specifically, for a given model, a strategy is designed to generate semi-fragile, sensitive samples using adversarial example techniques without decreasing the model's accuracy. The model's outputs for these samples are extremely sensitive to malicious tampering yet robust to accidental modification. Based on these properties, accidental modification and malicious tampering can be distinguished to assess the availability of a watermarked model. Extensive experiments demonstrate that the proposed method detects malicious model tampering with accuracy of up to 100% while tolerating accidental modifications such as fine-tuning, pruning, and quantization with accuracy exceeding 75%. Moreover, our semi-fragile neural network watermarking approach can easily be extended to various DNNs.
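To make the verification mechanism concrete, here is a minimal PyTorch sketch of the general recipe, assuming a PGD-style inner loop: perturb inputs toward the decision boundary to obtain sensitive samples, then check how many of their predictions still match a fingerprint recorded at watermarking time. All names, step sizes, and the 75% threshold are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def make_sensitive_samples(model, x, y, eps=0.03, alpha=0.005, steps=10):
    """Push inputs toward the decision boundary so the watermarked
    model's outputs become fragile to weight tampering (PGD-style
    loop; eps, alpha, and steps are illustrative)."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # ascend the loss, but stay inside an eps-ball around x
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x - eps, x + eps).requires_grad_(True)
    return x_adv.detach()

def verify_availability(model, sensitive_x, fingerprint, tol=0.75):
    """Declare the model intact if enough sensitive-sample predictions
    still match the fingerprint recorded at watermarking time."""
    with torch.no_grad():
        preds = model(sensitive_x).argmax(dim=1)
    return (preds == fingerprint).float().mean().item() >= tol
```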
{"title":"Semi-Fragile Neural Network Watermarking Based on Adversarial Examples","authors":"Zihan Yuan;Xinpeng Zhang;Zichi Wang;Zhaoxia Yin","doi":"10.1109/TETCI.2024.3372373","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3372373","url":null,"abstract":"Deep neural networks (DNNs) may be subject to various modifications during transmission and use. Regular processing operations do not affect the functionality of a model, while malicious tampering will cause serious damage. Therefore, it is crucial to determine the availability of a DNN model. To address this issue, we propose a semi-fragile black-box watermarking method that can distinguish between accidental modification and malicious tampering of DNNs, focusing on the privacy and security of neural network models. Specifically, for a given model, a strategy is designed to generate semi-fragile and sensitive samples using adversarial example techniques without decreasing the model accuracy. The model outputs for these samples are extremely sensitive to malicious tampering and robust to accidental modification. According to these properties, accidental modification and malicious tampering can be distinguished to assess the availability of a watermarked model. Extensive experiments demonstrate that the proposed method can detect malicious model tampering with high accuracy up to 100% while tolerating accidental modifications such as fine-tuning, pruning, and quantitation with the accuracy exceed 75%. Moreover, our semi-fragile neural network watermarking approach can be easily extended to various DNNs.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141965840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attribute-Based Injection Transformer for Personalized Sentiment Analysis
Pub Date: 2024-03-18 | DOI: 10.1109/TETCI.2024.3369323
You Zhang;Jin Wang;Liang-Chih Yu;Dan Xu;Xuejie Zhang
Personal attributes have been proven to be useful for sentiment analysis. However, previous models of learning attribute-specific language representations are suboptimal because only context- or content-wise injection is adopted. This study proposes a transformer structure that combines both context- and content-wise injections, built on a pretrained transformer encoder. For context-wise injection, self-interactive attention is implemented by incorporating personal attributes into multi-head attention. From the content-wise perspective, an attribute-based layer normalization is used to align text representations with personal attributes. In particular, the proposed transformer layer is a universal layer compatible with the original Google Transformer layer. Instead of training from scratch, it can be initialized from a pretrained checkpoint for downstream tasks. Extensive experiments were conducted on three document-level sentiment analysis benchmarks: IMDB, Yelp-2013, and Yelp-2014. The results show that the proposed method outperforms previous methods for personalized sentiment analysis, demonstrating that combining context- and content-wise injections facilitates model learning of attribute-specific language representations.
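A minimal PyTorch sketch of what an attribute-based layer normalization could look like (the content-wise injection); the module name, dimensions, and identity-centered modulation are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AttributeLayerNorm(nn.Module):
    """Layer normalization whose gain and bias are predicted from a
    personal-attribute embedding (hypothetical module name and sizes)."""
    def __init__(self, d_model, d_attr):
        super().__init__()
        self.ln = nn.LayerNorm(d_model, elementwise_affine=False)
        self.to_gain = nn.Linear(d_attr, d_model)
        self.to_bias = nn.Linear(d_attr, d_model)

    def forward(self, h, attr):
        # h: (batch, seq, d_model); attr: (batch, d_attr)
        g = 1.0 + self.to_gain(attr).unsqueeze(1)  # modulate around identity
        b = self.to_bias(attr).unsqueeze(1)
        return g * self.ln(h) + b
```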
{"title":"Attribute-Based Injection Transformer for Personalized Sentiment Analysis","authors":"You Zhang;Jin Wang;Liang-Chih Yu;Dan Xu;Xuejie Zhang","doi":"10.1109/TETCI.2024.3369323","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3369323","url":null,"abstract":"Personal attributes have been proven to be useful for sentiment analysis. However, previous models of learning attribute-specific language representations are suboptimal because only context- or content-wise injection is adopted. This study proposes a transformer structure with a combination of both context- and content-wise injections based on a well pretrained transformer encoder. For context-wise injection, self-interactive attention is implemented by incorporating personal attributes into a multi-head attention. For the content-wise perspective, an attribute-based layer normalization is used to align text representation with personal attributes. In particular, the proposed transformer layer can be a universal layer compatible with the original Google Transformer layer. Instead of training from scratch, the proposed Transformer layer can be initialized from a well pre-trained checkpoint for downstream tasks. Extensive experiments were conducted on three benchmarks of document-level sentiment analysis, including IMDB, Yelp-2013 and Yelp-2014. The results show that the proposed method outperforms the previous methods for personalized sentiment analysis, demonstrating that the combination of both context- and content-wise injections can facilitate model learning for attribute-specific language representations.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141094887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3DAttGAN: A 3D Attention-Based Generative Adversarial Network for Joint Space-Time Video Super-Resolution
Pub Date: 2024-03-18 | DOI: 10.1109/TETCI.2024.3369994
Congrui Fu;Hui Yuan;Liquan Shen;Raouf Hamzaoui;Hao Zhang
Joint space-time video super-resolution aims to increase both the spatial resolution and the frame rate of a video sequence. As a result, details become more apparent, leading to a better and more realistic viewing experience. This is particularly valuable for applications such as video streaming, video surveillance (object recognition and tracking), and digital entertainment. Over the last few years, several joint space-time video super-resolution methods have been proposed. While those built on deep learning have shown great potential, their performance still falls short. One major reason is that they heavily rely on two-dimensional (2D) convolutional networks, which restricts their capacity to effectively exploit spatio-temporal information. To address this limitation, we propose a novel generative adversarial network for joint space-time video super-resolution. The novelty of our network is twofold. First, we propose a three-dimensional (3D) attention mechanism instead of traditional two-dimensional attention mechanisms. Our generator uses 3D convolutions associated with the proposed 3D attention mechanism to process temporal and spatial information simultaneously and focus on the most important channel and spatial features. Second, we design two discriminator strategies to enhance the performance of the generator. The discriminative network uses a two-branch structure to handle the intra-frame texture details and inter-frame motion occlusions in parallel, making the generated results more accurate. Experimental results on the Vid4, Vimeo-90K, and REDS datasets demonstrate the effectiveness of the proposed method.
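A minimal PyTorch sketch of a generic 3D channel-plus-spatial attention block of the kind described; the names and the squeeze-and-excitation-style pooling are assumptions, not the authors' 3DAttGAN module.

```python
import torch
import torch.nn as nn

class Attention3D(nn.Module):
    """Channel and spatial attention over a video tensor of shape
    (batch, channels, time, height, width)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv3d(1, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)       # which channels matter
        s = x.mean(dim=1, keepdim=True)    # pool over channels
        return x * self.spatial_gate(s)    # where in space-time to look
```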
{"title":"3DAttGAN: A 3D Attention-Based Generative Adversarial Network for Joint Space-Time Video Super-Resolution","authors":"Congrui Fu;Hui Yuan;Liquan Shen;Raouf Hamzaoui;Hao Zhang","doi":"10.1109/TETCI.2024.3369994","DOIUrl":"10.1109/TETCI.2024.3369994","url":null,"abstract":"Joint space-time video super-resolution aims to increase both the spatial resolution and the frame rate of a video sequence. As a result, details become more apparent, leading to a better and more realistic viewing experience. This is particularly valuable for applications such as video streaming, video surveillance (object recognition and tracking), and digital entertainment. Over the last few years, several joint space-time video super-resolution methods have been proposed. While those built on deep learning have shown great potential, their performance still falls short. One major reason is that they heavily rely on two-dimensional (2D) convolutional networks, which restricts their capacity to effectively exploit spatio-temporal information. To address this limitation, we propose a novel generative adversarial network for joint space-time video super-resolution. The novelty of our network is twofold. First, we propose a three-dimensional (3D) attention mechanism instead of traditional two-dimensional attention mechanisms. Our generator uses 3D convolutions associated with the proposed 3D attention mechanism to process temporal and spatial information simultaneously and focus on the most important channel and spatial features. Second, we design two discriminator strategies to enhance the performance of the generator. The discriminative network uses a two-branch structure to handle the intra-frame texture details and inter-frame motion occlusions in parallel, making the generated results more accurate. Experimental results on the Vid4, Vimeo-90 K, and REDS datasets demonstrate the effectiveness of the proposed method.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141808757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Topic Tracing with a Textual Reader for Conversational Knowledge Based Question Answering
Pub Date: 2024-03-18 | DOI: 10.1109/TETCI.2024.3369478
Zhipeng Liu;Jing He;Tao Gong;Heng Weng;Fu Lee Wang;Hai Liu;Tianyong Hao
Conversational KBQA (Knowledge-Based Question Answering) is a sequential question-answering process conducted in the form of a conversation grounded in a knowledge base, and it has received great attention in recent years. One of the major challenges in conversational KBQA is the ellipsis and co-reference of topic entities in follow-up questions, which affects the performance of the whole system. Previous approaches identified the topics of current-turn questions by encoding conversation records or modeling the entities in them. However, they ignored the meanings carried by the entities themselves during modeling. To solve this problem and mitigate its impact on the whole KBQA system, we propose a new textual reader that integrates entity-related textual information, and construct a graph-based neural network containing the textual reader to determine the topics of questions. The graph-based neural network scores the entities in each question in a conversation. These scores are then combined with the similarity between questions and answers to obtain the correct answers, as sketched below. Compared with baseline methods in more realistic settings, our method improved accuracy by 5.5% on topic entity prediction and by 1.5% on conversational KBQA on benchmark datasets. Experimental results on two datasets demonstrate that our method improves the performance of both topic tracing and conversational KBQA.
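A minimal sketch of the final score fusion, assuming a simple linear combination with weight alpha; the paper's exact combination rule may differ.

```python
import torch

def rank_answers(entity_scores, qa_similarity, alpha=0.5):
    """Fuse graph-based topic-entity scores with question-answer
    similarity; the linear rule and alpha are assumptions."""
    return alpha * entity_scores + (1.0 - alpha) * qa_similarity

entity_scores = torch.tensor([0.9, 0.2, 0.6])  # from the graph network
qa_similarity = torch.tensor([0.4, 0.8, 0.5])  # question-answer match
best_answer = rank_answers(entity_scores, qa_similarity).argmax().item()
```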
{"title":"Improving Topic Tracing with a Textual Reader for Conversational Knowledge Based Question Answering","authors":"Zhipeng Liu;Jing He;Tao Gong;Heng Weng;Fu Lee Wang;Hai Liu;Tianyong Hao","doi":"10.1109/TETCI.2024.3369478","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3369478","url":null,"abstract":"Conversational KBQA(Knowledge Based Question Answering) is a sequential question-answering process in the form of conversation based on knowledge, and it has been paid great attention in recent years. One of the major challenges in conversational KBQA is the ellipsis and co-reference of topic entities in follow-up questions, which affects the performance of the whole conversational KBQA. Previous approaches identified the topics of current turn questions by encoding conversation records or modeling entities in conversation records. However, they ignored the meanings carried by the entities themselves in the modeling process. To solve the above problem and mitigate the impact of the problem on the whole KBQA system, we propose a new textual reader to integrate entity-related textual information and construct a graph-based neural network containing the textual reader to determine the topics of questions. The graph-based neural network scores entities in each question in conversations. Further, the scores are jointly cooperated with the similarity between questions and answers to obtain the correct answers in conversational KBQA systems. Our proposed method improved the accuracy with 5.5% at topic entity prediction and 1.5% at conversational KBQA on benchmark datasets compared with baseline methods in more real-world settings respectively. Experiment results on two datasets demonstrate that our proposed method improves the performance of topic tracing and conversational KBQA.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141096370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Collaborative Neural Solution for Time-Varying Nonconvex Optimization With Noise Rejection
Pub Date: 2024-03-18 | DOI: 10.1109/TETCI.2024.3369482
Lin Wei;Long Jin
This paper focuses on an emerging topic: current neural dynamics methods generally fail to accurately solve time-varying nonconvex optimization problems, especially when noise is taken into consideration. A collaborative neural solution that fuses the advantages of evolutionary computation and neural dynamics methods is proposed; it follows a meta-heuristic rule and exploits a robust gradient-based neural solution to deal with different kinds of noise. The gradient-based neural solution with robustness (GNSR) is proven to converge under noise disturbances and excels at local search. Besides, theoretical analysis ensures that the meta-heuristic rule guarantees the optimal solution for the global search with probability one. Lastly, simulative comparisons with existing methods and an application to manipulability optimization on a redundant manipulator substantiate the superiority of the proposed collaborative neural solution in solving nonconvex time-varying optimization problems.
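A minimal NumPy sketch of the collaboration on a toy time-varying nonconvex objective: an evolutionary population performs the global search while a gradient step (standing in for the neural-dynamics solution) refines each candidate locally. This is a generic illustration under assumed hyperparameters, not GNSR itself.

```python
import numpy as np

def collaborative_solve(f, grad_f, t, dim=2, pop=20, iters=50, lr=0.05):
    """Evolutionary global search plus gradient-based local refinement
    for a time-varying objective f(x, t)."""
    rng = np.random.default_rng(0)
    X = rng.uniform(-5.0, 5.0, size=(pop, dim))
    for _ in range(iters):
        # local search: one gradient step per candidate (neural-dynamics role)
        X = X - lr * np.array([grad_f(x, t) for x in X])
        # global search: keep the best half, mutate it to refill the population
        order = np.argsort([f(x, t) for x in X])
        elite = X[order[: pop // 2]]
        X = np.vstack([elite, elite + rng.normal(0.0, 0.5, elite.shape)])
    return X[np.argmin([f(x, t) for x in X])]

# toy nonconvex, time-varying objective and its gradient
f = lambda x, t: np.sin(3.0 * x).sum() + ((x - np.cos(t)) ** 2).sum()
grad_f = lambda x, t: 3.0 * np.cos(3.0 * x) + 2.0 * (x - np.cos(t))
x_star = collaborative_solve(f, grad_f, t=1.0)
```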
{"title":"Collaborative Neural Solution for Time-Varying Nonconvex Optimization With Noise Rejection","authors":"Lin Wei;Long Jin","doi":"10.1109/TETCI.2024.3369482","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3369482","url":null,"abstract":"This paper focuses on an emerging topic that current neural dynamics methods generally fail to accurately solve time-varying nonconvex optimization problems especially when noises are taken into consideration. A collaborative neural solution that fuses the advantages of evolutionary computation and neural dynamics methods is proposed, which follows a meta-heuristic rule and exploits the robust gradient-based neural solution to deal with different noises. The gradient-based neural solution with robustness (GNSR) is proven to converge with the disturbance of noises and experts in local search. Besides, theoretical analysis ensures that the meta-heuristic rule guarantees the optimal solution for the global search with probability one. Lastly, simulative comparisons with existing methods and an application to manipulability optimization on a redundant manipulator substantiate the superiority of the proposed collaborative neural solution in solving the nonconvex time-varying optimization problems.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141965869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GAA: Ghost Adversarial Attack for Object Tracking
Pub Date: 2024-03-18 | DOI: 10.1109/TETCI.2024.3369403
Mingyang Lei;Hong Song;Jingfan Fan;Deqiang Xiao;Danni Ai;Ying Gu;Jian Yang
Adversarial attack on convolutional neural networks (CNNs) is a technique for deceiving models with perturbations, which provides a way to evaluate model robustness. Adversarial attack research has primarily focused on single images; however, videos are more widely used. Existing attack methods generally require iterative optimization on each video sequence, which is highly time-consuming. In this paper, we propose a simple and effective approach for attacking video sequences, called Ghost Adversarial Attack (GAA), to greatly degrade the tracking performance of state-of-the-art (SOTA) CNN-based trackers with minimal ghost perturbations. Considering the timeliness of the attack, we generate the ghost adversarial example only once, with a novel ghost generator, and use a less computationally expensive attack in subsequent frames. The ghost generator extracts the target region and generates indistinguishable ghost noise for the target, thereby misleading the tracker. Moreover, we propose a novel combined loss that includes a content loss, a ghost loss, and a transferred-fixed loss, which are used in different parts of the proposed method. The combined loss helps generate adversarial examples with slight noise that resemble a ghost of the real target. Experiments were conducted on six benchmark datasets (UAV123, UAV20L, NFS, LaSOT, OTB50, and OTB100). The experimental results indicate that the ghost adversarial examples produced by GAA are highly stealthy while remaining effective in fooling SOTA trackers, with high transferability. GAA reduces the tracking success rate by an average of 66.6% and the precision rate by an average of 68.3%.
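A minimal PyTorch sketch of the attack loop, assuming the ghost noise has already been produced once by the generator (omitted here): the same perturbation is pasted onto the target region of each subsequent frame, so no per-frame optimization is needed. The tracker interface and box format are hypothetical.

```python
import torch

def attack_sequence(tracker, frames, ghost_noise, target_box):
    """Reuse a ghost perturbation (generated once, elsewhere) on the
    target region of every frame; tracker.track and the (x, y, w, h)
    box format are hypothetical."""
    x, y, w, h = target_box
    results = []
    for frame in frames:               # each frame: (C, H, W) in [0, 1]
        adv = frame.clone()
        adv[..., y:y + h, x:x + w] += ghost_noise  # same noise every frame
        results.append(tracker.track(adv.clamp(0.0, 1.0)))
    return results
```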
{"title":"GAA: Ghost Adversarial Attack for Object Tracking","authors":"Mingyang Lei;Hong Song;Jingfan Fan;Deqiang Xiao;Danni Ai;Ying Gu;Jian Yang","doi":"10.1109/TETCI.2024.3369403","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3369403","url":null,"abstract":"Adversarial attack of convolutional neural networks (CNN) is a technique for deceiving models with perturbations, which provides a way to evaluate the robustness of models. Adversarial attack research has primarily focused on single images. However, videos are more widely used. The existing attack methods generally require iterative optimization on different video sequences with high time-consuming. In this paper, we propose a simple and effective approach for attacking video sequences, called Ghost Adversarial Attack (GAA), to greatly degrade the tracking performance of the state-of-the-art (SOTA) CNN-based trackers with the minimum ghost perturbations. Considering the timeliness of the attack, we only generate the ghost adversarial example once with a novel ghost-generator and use a less computable attack way in subsequent frames. The ghost-generator is used to extract the target region and generate the indistinguishable ghost noise of the target, hence misleading the tracker. Moreover, we propose a novel combined loss that includes the content loss, the ghost loss, and the transferred-fixed loss, which are used in different parts of the proposed method. The combined loss can help to generate similar adversarial examples with slight noises, like a ghost of the real target. Experiments were conducted on six benchmark datasets (UAV123, UAV20L, NFS, LaSOT, OTB50, and OTB100). The experimental results indicate that the ghost adversarial examples produced by GAA are well stealthy while remaining effective in fooling SOTA trackers with high transferability. The GAA can reduce the tracking success rate by an average of 66.6% and the precision rate by an average of 68.3%.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141096287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VTFR-AT: Adversarial Training With Visual Transformation and Feature Robustness
Pub Date: 2024-03-18 | DOI: 10.1109/TETCI.2024.3370004
Xiang Li;Changfei Zhao;Xinyang Deng;Wen Jiang
Research on the robustness of deep neural networks to adversarial samples has grown rapidly since studies showed that deep learning is susceptible to adversarial perturbation noise. Among the many defence strategies, adversarial training is widely regarded as the most powerful against adversarial attacks. It has been shown that the adversarial vulnerability of models is due to the non-robust features learned from the data. However, few methods have attempted to improve adversarial training by enhancing the critical information in the data, i.e., the important regions of the object. Moreover, adversarial training is prone to overfitting due to the overuse of training-set samples. In this paper, we propose a new adversarial training framework with visual transformation and feature robustness, named VTFR-AT. The visual transformation (VT) module pre-processes images to enhance principal information, weaken background information, and eliminate nuisance noise. The feature robustness (FR) loss function strengthens the network's feature extraction against perturbations by constraining the feature similarity of the network on similar images. Extensive experiments have shown that the VTFR framework can substantially improve model performance on adversarial samples and enhance adversarial robustness and generalization. As a plug-and-play module, the proposed framework can be easily combined with various existing adversarial training methods.
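A minimal PyTorch sketch of a feature-robustness term of the kind described, assuming a cosine-similarity constraint between the features of an image and its visually transformed counterpart; model.features is a hypothetical feature-extractor hook, and the exact metric in the paper may differ.

```python
import torch.nn.functional as F

def feature_robustness_loss(model, x_clean, x_transformed):
    """Penalize feature dissimilarity between an image and its visually
    transformed view; model.features is a hypothetical extractor hook."""
    f1 = model.features(x_clean).flatten(1)
    f2 = model.features(x_transformed).flatten(1)
    return (1.0 - F.cosine_similarity(f1, f2, dim=1)).mean()
```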
{"title":"VTFR-AT: Adversarial Training With Visual Transformation and Feature Robustness","authors":"Xiang Li;Changfei Zhao;Xinyang Deng;Wen Jiang","doi":"10.1109/TETCI.2024.3370004","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3370004","url":null,"abstract":"Research on the robustness of deep neural networks to adversarial samples has grown rapidly since studies have shown that deep learning is susceptible to adversarial perturbation noise. Adversarial training is widely regarded as the most powerful defence strategy against adversarial attacks out of many defence strategies. It has been shown that the adversarial vulnerability of models is due to the learned non-robust feature in the data. However, few methods have attempted to improve adversarial training by enhancing the critical information in the data, i.e., the important region of the object. Moreover, adversarial training is prone to overfitting the model due to the overuse of training set samples. In this paper, we propose a new adversarial training framework with visual transformation and feature robustness, named VTFR-AT. The visual transformation (VT) module enhances principal information in images, weakens background information, and eliminates nuisance noise by pre-processing images. The feature robustness (FR) loss function enhances the network feature extraction partly against perturbation by constraining the feature similarity of the network on similar images. Extensive experiments have shown that the VTFR framework can substantially promote the performance of models on adversarial samples and improve the adversarial robustness and generalization capabilities. As a plug-and-play module, the proposed framework can be easily combined with various existing adversarial training methods.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141964787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reinforcement Learning and Transformer for Fast Magnetic Resonance Imaging Scan
Pub Date: 2024-03-18 | DOI: 10.1109/TETCI.2024.3358180
Yiming Liu;Yanwei Pang;Ruiqi Jin;Yonghong Hou;Xuelong Li
A major drawback of Magnetic Resonance Imaging (MRI) is the long scan time necessary to acquire complete K-space matrices using phase encoding. This paper proposes a transformer-based deep Reinforcement Learning (RL) framework (called TITLE) that reduces the scan time by sequentially selecting partial phases in real time, so that a slice can be accurately reconstructed from the resulting slice-specific incomplete K-space matrix. As a deep learning based slice-specific method, TITLE has the following characteristics and merits: (1) It is real-time, because the decision of which phase to encode next can be made within the period between the time an echo signal is obtained and the time the next 180° RF pulse is activated. (2) It exploits the powerful feature-representation ability of the transformer, a self-attention based neural network, to predict phases within a deep reinforcement learning framework. (3) Both the historically selected phases (called the phase-indicator vector) and the corresponding undersampled image of the slice being scanned are used by the transformer to extract features. Experimental results on the fastMRI dataset demonstrate that the proposed method is 150 times faster than the state-of-the-art reinforcement-learning-based method and outperforms state-of-the-art deep-learning-based methods in reconstruction accuracy. The source code is available.
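A minimal PyTorch sketch of a policy head consistent with items (2) and (3), assuming a deep-Q formulation: the undersampled image and the phase-indicator vector are encoded as two tokens, passed through a transformer encoder, and mapped to per-phase scores, with already-selected phases masked out. All module names and dimensions are illustrative, not the TITLE architecture.

```python
import torch
import torch.nn as nn

class PhasePolicy(nn.Module):
    """Scores the next phase-encoding line from the undersampled image
    and the phase-indicator vector (deep-Q style head)."""
    def __init__(self, n_phases=256, d_model=128):
        super().__init__()
        self.img_enc = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(16 * 8 * 8, d_model),
        )
        self.ind_enc = nn.Linear(n_phases, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.q_head = nn.Linear(d_model, n_phases)

    def forward(self, image, indicator):
        # two tokens: one for the image, one for the selection history
        tokens = torch.stack([self.img_enc(image), self.ind_enc(indicator)], dim=1)
        q = self.q_head(self.encoder(tokens).mean(dim=1))
        return q.masked_fill(indicator.bool(), float("-inf"))  # no re-selection
```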
{"title":"Reinforcement Learning and Transformer for Fast Magnetic Resonance Imaging Scan","authors":"Yiming Liu;Yanwei Pang;Ruiqi Jin;Yonghong Hou;Xuelong Li","doi":"10.1109/TETCI.2024.3358180","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3358180","url":null,"abstract":"A major drawback in Magnetic Resonance Imaging (MRI) is the long scan times necessary to acquire complete K-space matrices using phase encoding. This paper proposes a transformer-based deep Reinforcement Learning (RL) framework (called TITLE) to reduce the scan time by sequentially selecting partial phases in real-time so that a slice can be accurately reconstructed from the resultant slice-specific incomplete K-space matrix. As a deep learning based slice-specific method, the TITLE method has the following characteristic and merits: (1) It is real-time because the decision of which phase to be encoded in next time can be made within the period between the time at which an echo signal is obtained and the time at which the next 180° RF pulse is activated. (2) It exploits the powerful feature representation ability of transformer, a self-attention based neural network, for predicting phases with the mechanism of deep reinforcement learning. (3) Both historically selected phases (called phase-indicator vector) and the corresponding undersampled image of the slice being scanned are used for extracting features by transformer. Experimental results on the fastMRI dataset demonstrate that the proposed method is 150 times faster than the state-of-the-art reinforcement learning based method and outperforms the state-of-the-art deep learning based methods in reconstruction accuracy. The source codes are available.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141094881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local Dimming for Video Based on an Improved Surrogate Model Assisted Evolutionary Algorithm
Pub Date: 2024-03-18 | DOI: 10.1109/TETCI.2024.3370033
Yahui Cao;Tao Zhang;Xin Zhao;Yuzheng Yan;Shuxin Cui
Compared with traditional liquid crystal display (LCD) systems, local dimming systems can achieve higher display quality with lower power consumption. By treating local dimming of a static image as an optimization problem and solving it with an evolutionary algorithm, an optimal backlight matrix can be obtained. However, evolutionary local dimming algorithms are no longer applicable to video sequences because the computation is very time-consuming. This paper proposes a local dimming algorithm based on an improved surrogate model assisted evolutionary algorithm (ISAEA-LD). In this algorithm, a surrogate model assisted evolutionary algorithm is applied to the local dimming problem for video sequences, where the surrogate model reduces the cost of evaluating individual fitness in the evolutionary algorithm. Firstly, a surrogate model based on a convolutional neural network is adopted to improve the accuracy of the surrogate's fitness evaluation. Secondly, the algorithm introduces a backlight update strategy based on the content correlation between adjacent frames of the video sequence, and a model transfer strategy based on transfer learning, to improve efficiency. Experimental results show that the proposed ISAEA-LD algorithm achieves better visual quality and higher efficiency.
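A minimal PyTorch sketch of a CNN surrogate for the fitness evaluation, assuming the candidate backlight matrix and a downsampled frame are stacked as input channels; the shapes (an 8x8 backlight grid, a 64x64 frame thumbnail) and the architecture are illustrative, not the paper's network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BacklightSurrogate(nn.Module):
    """Predicts the fitness of a candidate backlight matrix for the
    current frame, replacing the exact (expensive) evaluation inside
    the evolutionary loop."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 1),
        )

    def forward(self, backlight, frame):
        # backlight: (B, 1, 8, 8) grid; frame: (B, 1, 64, 64) thumbnail
        bl = F.interpolate(backlight, size=frame.shape[-2:])
        return self.net(torch.cat([bl, frame], dim=1)).squeeze(-1)
```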
{"title":"Local Dimming for Video Based on an Improved Surrogate Model Assisted Evolutionary Algorithm","authors":"Yahui Cao;Tao Zhang;Xin Zhao;Yuzheng Yan;Shuxin Cui","doi":"10.1109/TETCI.2024.3370033","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3370033","url":null,"abstract":"Compared with the traditional liquid crystal displays (LCD) systems, the local dimming systems can obtain higher display quality with lower power consumption. Considering local dimming of the static image as an optimization problem and solving it based on an evolutionary algorithm, a set of optimal backlight matrix can be obtained. However, the local dimming algorithm based on evolutionary algorithm is no longer applicable for the video sequences because the calculation is very time-consuming. This paper proposes a local dimming algorithm based on improved surrogate model assisted evolutional algorithm (ISAEA-LD). In this algorithm, the surrogate model assisted evolutionary algorithm is applied to solve the local dimming problem of the video sequences. The surrogate model is used to reduce the complexity of individual fitness evaluation of the evolutionary algorithm. Firstly, a surrogate model based on convolutional neural network is adopted to improve the accuracy of individual fitness evaluation of surrogate model. Secondly, the algorithm introduces the backlight update strategy based on the content correlation between the video sequences' adjacent frames and the model transfer strategy based on transfer learning to improve the efficiency of the algorithm. Experimental results show that the proposed ISAEA-LD algorithm can obtain better visual quality and higher algorithm efficiency.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141964673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Survey of Deep Learning Video Super-Resolution
Pub Date: 2024-03-17 | DOI: 10.1109/TETCI.2024.3398015
Arbind Agrahari Baniya;Tsz-Kwan Lee;Peter W. Eklund;Sunil Aryal
Video super-resolution (VSR) is a prominent research topic in low-level computer vision, where deep learning technologies have played a significant role. The rapid progress in deep learning and its applications in VSR has led to a proliferation of tools and techniques in the literature. However, the usage of these methods is often not adequately explained, and decisions are primarily driven by quantitative improvements. Given the significance of VSR's potential influence across multiple domains, it is imperative to conduct a comprehensive analysis of the elements and deep learning methodologies employed in VSR research. This methodical analysis will facilitate the informed development of models tailored to specific application needs. In this paper, we present an overarching overview of deep learning-based video super-resolution models, investigating each component and discussing its implications. Furthermore, we provide a synopsis of key components and technologies employed by state-of-the-art and earlier VSR models. By elucidating the underlying methodologies and categorising them systematically, we identify trends, requirements, and challenges in the domain. As a first-of-its-kind survey of deep learning-based VSR models, this work also establishes a multi-level taxonomy to guide current and future VSR research, enhancing the maturation and interpretation of VSR practices for various practical applications.
{"title":"A Survey of Deep Learning Video Super-Resolution","authors":"Arbind Agrahari Baniya;Tsz-Kwan Lee;Peter W. Eklund;Sunil Aryal","doi":"10.1109/TETCI.2024.3398015","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3398015","url":null,"abstract":"Video super-resolution (VSR) is a prominent research topic in low-level computer vision, where deep learning technologies have played a significant role. The rapid progress in deep learning and its applications in VSR has led to a proliferation of tools and techniques in the literature. However, the usage of these methods is often not adequately explained, and decisions are primarily driven by quantitative improvements. Given the significance of VSR's potential influence across multiple domains, it is imperative to conduct a comprehensive analysis of the elements and deep learning methodologies employed in VSR research. This methodical analysis will facilitate the informed development of models tailored to specific application needs. In this paper, we present an overarching overview of deep learning-based video super-resolution models, investigating each component and discussing its implications. Furthermore, we provide a synopsis of key components and technologies employed by state-of-the-art and earlier VSR models. By elucidating the underlying methodologies and categorising them systematically, we identified trends, requirements, and challenges in the domain. As a first-of-its-kind survey of deep learning-based VSR models, this work also establishes a multi-level taxonomy to guide current and future VSR research, enhancing the maturation and interpretation of VSR practices for various practical applications.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141965140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}