Pub Date: 2024-09-03 | DOI: 10.1109/TETCI.2024.3418837
Daohan Yu;Liqing Qiu
The primary objective of model compression is to maintain the performance of the original model while reducing its size as much as possible. Knowledge distillation has become the mainstream method in the field of model compression due to its excellent performance. However, current knowledge distillation methods for medium and small pre-trained models struggle to extract knowledge effectively from large pre-trained models. Similarly, methods targeting large pre-trained models face challenges in compressing the model to a smaller scale. Therefore, this paper proposes a new model compression method called Attention-based Replacement Compression (ARC), which introduces random layer replacement based on fine-grained self-attention distillation. The method first obtains the important features of the original model through fine-grained self-attention distillation in the pre-training distillation stage; more information can be obtained by distilling from the upper layers of the large teacher model. Then, one-to-one Transformer-layer random replacement training fully explores the hidden knowledge of the large pre-trained model in the fine-tuning compression stage. Compared with other, more complex compression methods, ARC not only simplifies the training process of model compression but also enhances the applicability of the compressed model. This paper compares knowledge distillation methods for pre-trained models of different sizes on the GLUE benchmark. Experimental results demonstrate that the proposed method achieves significant improvements across different parameter scales, especially in accuracy and inference speed.
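The one-to-one layer replacement idea can be pictured with a toy sketch. Everything below is an assumption for illustration (the replacement probability, the layer stand-ins, and the function names are not from the paper): during each forward pass, every student Transformer layer is swapped, one-to-one, with its mapped teacher layer at random, so the student is trained to be interchangeable with the teacher.

```python
import random

def forward_with_random_replacement(x, student_layers, teacher_layers, p_replace=0.3):
    """Forward pass where each student layer may be swapped, one-to-one,
    with its corresponding (mapped) teacher layer."""
    assert len(student_layers) == len(teacher_layers)
    for student_layer, teacher_layer in zip(student_layers, teacher_layers):
        layer = teacher_layer if random.random() < p_replace else student_layer
        x = layer(x)
    return x

# Toy stand-ins for Transformer layers: plain callables on a scalar "hidden state".
# Student and teacher are made identical here so the output is deterministic.
student = [lambda h, k=k: h + 0.1 * k for k in range(4)]
teacher = [lambda h, k=k: h + 0.1 * k for k in range(4)]

out = forward_with_random_replacement(1.0, student, teacher, p_replace=0.5)
```

In a real setting the swapped-in teacher layers would be frozen while gradients flow only into the student layers that remain in the path.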
Title: ARC: A Layer Replacement Compression Method Based on Fine-Grained Self-Attention Distillation for Compressing Pre-Trained Language Models
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 1, pp. 848-860
Pub Date: 2024-09-03 | DOI: 10.1109/TETCI.2024.3449890
Zhiye Bai;Shenggang Li;Heng Liu
In traditional adaptive fuzzy controller (AFC) design, only the convergence of the tracking error, not of the fuzzy approximation error, can be guaranteed. This paper focuses on tracking control of fractional-order systems subject to model uncertainties and external disturbances. First, an AFC that blends a fuzzy logic system with an input constraint is proposed, where a disturbance observer is constructed to estimate the compounded disturbance. To improve the fuzzy approximation performance, a fractional-order serial-parallel estimation model that combines a fuzzy logic system and a disturbance observer is exploited to generate prediction errors; both tracking errors and prediction errors are then used simultaneously to construct the parameter update laws, yielding a composite learning fuzzy controller (CLFC). In addition, a compound disturbance observer is proposed based on the system state and the prediction error, while the disturbance estimation error is guaranteed to remain inside a bounded closed set. The proposed CLFC not only assures the stability of the closed-loop system but also achieves an accurate estimation of function uncertainties and unknown compounded disturbances. Finally, the effectiveness of the proposed control algorithm is demonstrated via simulation results.
Title: Fuzzy Composite Learning Control of Uncertain Fractional-Order Nonlinear Systems Using Disturbance Observer
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 1, pp. 1078-1090
In recent years, with the rapidly growing number of multi-modal 3D shapes, it has become increasingly important to efficiently recognize vast numbers of unlabeled multi-modal 3D shapes through clustering. However, multi-modal 3D shape instances are usually incomplete in practical applications, which poses a considerable challenge for multi-modal 3D shape clustering. To this end, this paper proposes an incomplete multi-modal 3D shape clustering method with cross mapping and dual adaptive fusion, termed 3D-IMMC, to alleviate the negative impact of missing modal instances in multi-modal 3D shapes and thus obtain competitive clustering results. To the best of our knowledge, this paper is the first attempt at the incomplete multi-modal 3D shape clustering task. By exploring the spatial relationship between different 3D shape modalities, a spatial-aware representation cross-mapping module is proposed to generate representations of missing modal instances. Then, a dual adaptive representation fusion module is designed to obtain comprehensive 3D shape representations for clustering. Extensive experiments on the 3D shape benchmark datasets ModelNet10 and ModelNet40 demonstrate that the proposed 3D-IMMC achieves promising 3D shape clustering performance.
Title: 3D-IMMC: Incomplete Multi-Modal 3D Shape Clustering via Cross Mapping and Dual Adaptive Fusion
Authors: Tianyi Qin;Bo Peng;Jianjun Lei;Jiahui Song;Liying Xu;Qingming Huang
Pub Date: 2024-09-02 | DOI: 10.1109/TETCI.2024.3436866
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 1, pp. 99-108
Electroencephalogram (EEG) is a widely used neural imaging technique for modeling motor characteristics. However, current studies have primarily focused on temporal representations of EEG, with less emphasis on the spatial and functional connections among electrodes. This study introduces a novel two-stream model that analyzes both temporal and spatial representations of EEG to learn motor characteristics. Temporal representations are extracted with a set of convolutional neural networks (CNNs) treated as dynamic filters, while spatial representations are learned by graph neural networks (GNNs) using learnable adjacency matrices. At each stage, a res-block is designed to integrate temporal and spatial representations, facilitating a fusion of temporal-spatial information. Finally, the summarized representations of both streams are fused with fully connected neural networks to learn motor characteristics. Experimental evaluations on the Physionet, OpenBMI, and BCI Competition IV Dataset 2a demonstrate the model's efficacy, achieving accuracies of 73.6%/70.4% for four-class subject-dependent/independent paradigms, 84.2%/82.0% for two-class subject-dependent/independent paradigms, and 78.5% for a four-class subject-dependent paradigm, respectively. The encouraging results underscore the model's potential for understanding EEG-based motor characteristics, paving the way for advanced brain-computer interface systems.
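The two ingredients of the two-stream design can be illustrated with a minimal numeric sketch: a 1-D filter standing in for the CNN stream treated as a dynamic filter, and an adjacency-weighted mix standing in for the GNN stream with a learnable adjacency matrix. The signals, kernel, and adjacency values below are invented for illustration; the actual model operates on full EEG tensors.

```python
def temporal_filter(signal, kernel):
    # 1-D valid convolution: the CNN stream acting as a dynamic filter
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def spatial_aggregate(features, adjacency):
    # GNN-style aggregation: each electrode mixes its neighbours through A
    n = len(features)
    return [sum(adjacency[i][j] * features[j] for j in range(n)) for i in range(n)]

# Three electrodes, short windows; a smoothing kernel as the "dynamic filter".
signals = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 1.0, 2.0]]
temporal = [temporal_filter(s, [0.5, 0.5])[-1] for s in signals]  # last window
adjacency = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]  # learnable in the model
spatial = spatial_aggregate(temporal, adjacency)
fused = temporal + spatial  # concatenation stands in for the res-block fusion
```

In the paper the adjacency matrix is a trainable parameter, so the spatial mixing pattern itself is learned rather than fixed as above.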
Title: Learning EEG Motor Characteristics via Temporal-Spatial Representations
Authors: Tian-Yu Xiang;Xiao-Hu Zhou;Xiao-Liang Xie;Shi-Qi Liu;Hong-Jun Yang;Zhen-Qiu Feng;Mei-Jiang Gui;Hao Li;De-Xing Huang;Xiu-Ling Liu;Zeng-Guang Hou
Pub Date: 2024-09-02 | DOI: 10.1109/TETCI.2024.3425328
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 1, pp. 933-945
Deep neural networks (DNNs) have recently been widely applied to synthetic aperture radar (SAR) image detection and classification, yet various adversarial attacks from malicious adversaries and the hidden vulnerability of DNNs may lead to serious security threats. State-of-the-art DNN-based SAR image detection models are designed manually, considering only test accuracy on clean datasets while neglecting the models' adversarial robustness under various types of adversarial attacks. To obtain the best trade-off between clean accuracy and adversarial robustness in robust convolutional neural network (CNN)-based SAR image classification models, this work makes the first attempt to develop a multi-objective adversarially robust CNN, called MoAR-CNN. In the MoAR-CNN, we propose a multi-objective automatic design method for cell-based neural architectures and critical hyperparameters such as the optimizer type and learning rate. A Squeeze-and-Excitation (SE) layer is introduced after each cell to improve computational efficiency and robustness. Experiments on the FUSAR-Ship and OpenSARShip datasets against seven types of adversarial attacks demonstrate the superiority of the proposed MoAR-CNN over six classical manually designed CNNs and four robust neural architecture search methods in terms of clean accuracy, adversarial accuracy, and model size. Furthermore, we also demonstrate experimentally the advantages of the SE layer in MoAR-CNN, the transferability of MoAR-CNN, its search costs, adversarial training, and the developed NSGA-II in MoAR-CNN.
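The clean-accuracy / adversarial-accuracy / model-size trade-off that such a search navigates rests on Pareto dominance. A plain non-dominated filter is sketched below with invented candidate numbers; NSGA-II, as used in the paper, layers non-dominated sorting and crowding-distance selection on top of this basic dominance test.

```python
def dominates(a, b):
    """a dominates b: no worse in every objective and strictly better in at
    least one. All objectives are maximized, so model size is negated."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    # keep every candidate not dominated by any other
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Hypothetical architectures: (clean accuracy, adversarial accuracy, -size in MB)
archs = [(0.92, 0.60, -18), (0.90, 0.66, -20), (0.88, 0.58, -25), (0.91, 0.65, -19)]
front = pareto_front(archs)
```

Here the third candidate is worse than the first on every objective, so it is filtered out, while the other three represent genuinely different trade-offs.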
Title: MoAR-CNN: Multi-Objective Adversarially Robust Convolutional Neural Network for SAR Image Classification
Authors: Hai-Nan Wei;Guo-Qiang Zeng;Kang-Di Lu;Guang-Gang Geng;Jian Weng
Pub Date: 2024-09-02 | DOI: 10.1109/TETCI.2024.3449908
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 1, pp. 57-74
Pub Date: 2024-09-02 | DOI: 10.1109/TETCI.2024.3448490
Hongfeng You;Xiaobing Chen;Kun Yu;Guangbo Fu;Fei Mao;Xin Ning;Xiao Bai;Weiwei Cai
This article proposes a feature self-selection and sequence integration network, FASSI-Net, for medical image classification, which can extract representative deep features and contextual semantic information. FASSI-Net uses a new feature selection and integration module (FSIM) to compress the deep features, replacing the Flatten layer with a sequence model. This strategy introduces two sets of multi-scale convolutions, where a cross-attention mechanism assigns two sets of weights (i.e., vertical and horizontal weights) to each convolution. We then calculate the Euclidean distance between feature points at different scales to measure the correlation between them; on this basis, the feature points are divided into useful features and redundant features. In addition, a feature dimension compression (CRI) module is constructed to reconstruct the redundant feature structure, and a residual structure is used to extract representative features from the redundant features. Meanwhile, a sequence model is introduced to compress the deep features and capture the contextual relationships between feature points. Experimental results on three datasets show that the proposed method significantly outperforms previous methods.
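The useful/redundant split driven by Euclidean distance can be sketched as follows. The greedy rule and the threshold are assumptions for illustration; the abstract only states that Euclidean distance between feature points measures their correlation and that points are divided into useful and redundant ones.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def split_by_redundancy(points, threshold):
    """Mark a feature point redundant when it lies closer than `threshold`
    to an already-kept point (i.e., highly correlated); otherwise keep it."""
    useful, redundant = [], []
    for p in points:
        if any(euclidean(p, u) < threshold for u in useful):
            redundant.append(p)
        else:
            useful.append(p)
    return useful, redundant

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
useful, redundant = split_by_redundancy(pts, threshold=1.0)
```

In FASSI-Net the redundant points are not discarded outright: the CRI module reconstructs their structure and a residual path recovers representative features from them.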
Title: Feature Autonomous Screening and Sequence Integration Network for Medical Image Classification
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 1, pp. 1034-1048
The Capacitated Electric Vehicle Routing Problem (CEVRP) poses a novel challenge within the field of vehicle routing optimization, as it requires consideration of both customer service requirements and electric vehicle recharging schedules. In addressing the CEVRP, Adaptive Large Neighborhood Search (ALNS) has garnered widespread acclaim due to its remarkable adaptability and versatility. However, the original ALNS, using a weight-based scoring method, relies solely on the past performances of operators to determine their weights, thereby failing to capture crucial information about the ongoing search process. Moreover, it often employs a fixed single charging strategy for the CEVRP, neglecting the potential impact of alternative charging strategies on solution improvement. Therefore, this study treats the selection of operators as a Markov Decision Process and introduces a novel approach based on Deep Reinforcement Learning (DRL) for operator selection. This approach enables adaptive selection of both destroy and repair operators, alongside charging strategies, based on the current state of the search process. More specifically, a state extraction method is devised to extract features not only from the problem itself but also from the solutions generated during the iterative process. Additionally, a novel reward function is designed to guide the DRL network in selecting an appropriate operator portfolio for the CEVRP. Experimental results demonstrate that the proposed algorithm excels in instances with fewer than 100 customers, achieving the best values in 7 out of 8 test instances. It also maintains competitive performance in instances with over 100 customers and requires less time compared to population-based methods.
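The weight-based scoring that the paper replaces works roughly as follows; the decay factor, operator names, and scores below are illustrative. The key limitation the abstract points out is visible in the code: the selection probability depends only on accumulated past performance, never on the current state of the search.

```python
import random

def roulette_select(operators, weights):
    """Classic ALNS operator selection: probability proportional to the
    weight accumulated from past performance only."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    acc = 0.0
    for op in operators:
        acc += weights[op]
        if r <= acc:
            return op
    return operators[-1]

def update_weight(weights, op, score, decay=0.8):
    # exponential smoothing of historical scores; no search-state information
    weights[op] = decay * weights[op] + (1 - decay) * score

weights = {"destroy_random": 1.0, "destroy_worst": 1.0}
chosen = roulette_select(list(weights), weights)
update_weight(weights, chosen, score=5.0)  # reward whichever operator ran
```

The proposed method instead feeds state features (from the problem instance and the solutions produced so far) into a DRL policy that picks the destroy operator, repair operator, and charging strategy jointly.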
Title: A Deep Reinforcement Learning-Based Adaptive Large Neighborhood Search for Capacitated Electric Vehicle Routing Problems
Authors: Chao Wang;Mengmeng Cao;Hao Jiang;Xiaoshu Xiang;Xingyi Zhang
Pub Date: 2024-08-30 | DOI: 10.1109/TETCI.2024.3444698
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 1, pp. 131-144
Pub Date: 2024-08-29 | DOI: 10.1109/TETCI.2024.3446695
Gavin S. Black;Bhaskar P. Rimal;Varghese Mathew Vaidyan
Large language models (LLMs) continue to be adopted for a multitude of previously manual tasks, with code generation a prominent use. Multiple commercial models have seen wide adoption due to the accessible nature of the interface: simple prompts can lead to working solutions that save developers time. However, maintaining the security of the generated code remains a significant challenge. There are no guarantees of code safety, and LLM responses can readily include known weaknesses. To address this concern, our research examines different prompt types for shaping the responses of code generation tasks toward safer outputs. A set of the most common weaknesses is first elicited through unconditioned prompts that produce vulnerable code across multiple commercial LLMs. These inputs are then paired with different context, role, and identification prompts intended to improve security. Our findings show that including appropriate guidance reduces vulnerabilities in generated code, with the choice of model having the most significant effect. Additionally, timings are presented to demonstrate the efficiency of singular requests that limit the number of model interactions.
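The pairing of a generation request with guidance can be sketched as a simple prompt builder. The guidance strings are hypothetical examples in the spirit of the study's context/role/identification prompt types, not the paper's actual prompts.

```python
def build_prompt(task, guidance=None):
    """Prepend one of three guidance styles (context, role, identification)
    to a code-generation request; None reproduces an unconditioned prompt."""
    parts = []
    if guidance == "role":
        parts.append("You are a security engineer who writes hardened code.")
    elif guidance == "context":
        parts.append("The code will run in a hostile environment; avoid known CWE weaknesses.")
    elif guidance == "identify":
        parts.append("After writing the code, list any weaknesses it may contain.")
    parts.append(task)
    return "\n".join(parts)

p = build_prompt("Write a function that stores a user password.", guidance="role")
```

Building the guidance into a single request matters for the timing results: one conditioned prompt avoids the extra round-trips of a review-and-revise dialogue with the model.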
Title: Balancing Security and Correctness in Code Generation: An Empirical Study on Commercial Large Language Models
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 1, pp. 419-430
Pub Date: 2024-08-28 | DOI: 10.1109/TETCI.2024.3446449
Haiqiao Wu;Dapeng Oliver Wu;Peng Gong
Super-resolution is a promising way to improve the quality of experience (QoE) of cloud-based video streaming when network resources between clients and cloud vendors become scarce: the received video can be enhanced with a trained super-resolution model running on the client side. However, existing solutions ignore the content-induced performance variability of Super-Resolution Deep Neural Network (SR-DNN) models, i.e., the same super-resolution model enhances different parts of a video to different degrees because the video content varies. This leads to unreasonable bitrate selection and hence low video QoE: low bitrate, rebuffering, or video quality jitter. Thus, in this paper, we propose SR-ABR, a super-resolution integrated adaptive bitrate (ABR) algorithm that incorporates the content-induced performance variability of SR-DNNs into the bitrate decision process. Due to complex network conditions and video content, SR-ABR adopts deep reinforcement learning (DRL) to select future bitrates so as to adapt to a wide range of environments. Moreover, to utilize the content-induced performance variability of SR-DNNs efficiently, we first define the performance variability of SR-DNNs over different video content, and then use a 2D convolution kernel to distill its features over a short future video segment (several chunks) as part of the inputs. We compare SR-ABR with related state-of-the-art works using trace-driven simulation under various real-world traces. The experiments show that SR-ABR outperforms the best state-of-the-art work, NAS, with average QoE gains of 4.3%–46.2% and 18.9%–42.1% under FCC and 3G/HSDPA network traces, respectively.
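Why per-chunk SR variability matters to the bitrate decision can be seen in a toy QoE computation. The linear QoE form (quality minus rebuffering penalty minus quality jitter) is a common one in the ABR literature, and the per-chunk SR gain values and penalty weights are assumptions, not the paper's definitions.

```python
def effective_quality(bitrate, sr_gain):
    # quality the viewer actually sees after client-side SR enhancement;
    # sr_gain is the content-dependent boost this chunk gets from the SR-DNN
    return bitrate + sr_gain

def qoe(chunks, rebuffer_penalty=4.3, smooth_penalty=1.0):
    """chunks: (bitrate quality, SR gain, rebuffering seconds) per chunk.
    Linear QoE: total quality - rebuffering penalty - quality jitter."""
    quality = sum(effective_quality(b, g) for b, g, _ in chunks)
    rebuf = sum(r for _, _, r in chunks)
    jitter = sum(abs(effective_quality(*chunks[i][:2]) -
                     effective_quality(*chunks[i - 1][:2]))
                 for i in range(1, len(chunks)))
    return quality - rebuffer_penalty * rebuf - smooth_penalty * jitter

# Two chunks at the same bitrate: because the SR gain varies with content,
# the viewer still experiences a (penalized) quality jump.
chunks = [(3.0, 1.0, 0.0), (3.0, 0.5, 0.0)]
score = qoe(chunks)
```

An SR-aware ABR policy can exploit this: when upcoming content is known to super-resolve well, a lower bitrate may yield the same effective quality while saving bandwidth.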
{"title":"SR-ABR: Super Resolution Integrated ABR Algorithm for Cloud-Based Video Streaming","authors":"Haiqiao Wu;Dapeng Oliver Wu;Peng Gong","doi":"10.1109/TETCI.2024.3446449","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3446449","url":null,"abstract":"Super-resolution is a promising solution to improve the quality of experience (QoE) for cloud-based video streaming when the network resources between clients and the cloud vendors become scarce. Specifically, the received video can be enhanced with a trained super-resolution model running on the client-side. However, all the existing solutions ignore the content-induced performance variability of Super-Resolution Deep Neural Network (SR-DNN) models, which means the same super-resolution models have different enhancement effects on the different parts of videos because of video content variation. That leads to unreasonable bitrate selection, resulting in low video QoE, e.g., low bitrate, rebuffering, or video quality jitters. Thus, in this paper, we propose SR-ABR, a super-resolution integrated adaptive bitrate (ABR) algorithm, which incorporates the content-induced performance variability of SR-DNNs into the bitrate decision process. Due to complex network conditions and video content, SR-ABR adopts deep reinforcement learning (DRL) to select future bitrates for adapting to a wide range of environments. Moreover, to utilize the content-induced performance variability of SR-DNNs efficiently, we first define the performance variability of SR-DNNs over different video content, and then use a 2D convolution kernel to distill the features of the performance variability of the SR-DNNs to a short future video segment (several chunks) as part of the inputs. We compare SR-ABR with the related state-of-the-art works using trace-driven simulation under various real-world traces. 
The experiments show that SR-ABR outperforms the best state-of-the-art work NAS with the gain in average QoE of 4.3%–46.2% and 18.9%–42.1% under FCC and 3G/HSDPA network traces, respectively.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 1","pages":"87-98"},"PeriodicalIF":5.3,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
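The 2D-convolution step described in the abstract can be sketched as follows. This is a hedged illustration only: the matrix shape, kernel, and function names are assumptions, not the paper's actual design. The idea is to summarize the per-chunk performance variability of SR-DNN models (the quality gain each bitrate level sees on each upcoming chunk) into a compact feature vector that could feed a DRL-based ABR policy.

```python
import numpy as np

# Hypothetical sketch: distill per-chunk SR-DNN performance variability
# into features via a 2D convolution. Shapes and names are illustrative.

def conv2d_valid(x, k):
    """Plain 'valid'-mode 2D cross-correlation with a single kernel."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def distill_variability(gain_matrix, kernel):
    """gain_matrix: rows = bitrate levels, cols = next few chunks.
    Returns a flat feature vector summarizing local variability,
    intended as one part of the DRL state."""
    return conv2d_valid(gain_matrix, kernel).ravel()

# Toy example: 4 bitrate levels x 5 future chunks of SR quality gain (dB).
gains = np.arange(20, dtype=float).reshape(4, 5) * 0.1
kernel = np.ones((2, 2)) / 4.0          # simple 2x2 averaging kernel
features = distill_variability(gains, kernel)
```

In practice a learned convolution kernel (trained jointly with the DRL policy) would replace the fixed averaging kernel used here for illustration.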
Pub Date : 2024-08-26DOI: 10.1109/TETCI.2024.3445450
Sibo Cheng;Hector Chassagnon;Matthew Kasoar;Yike Guo;Rossella Arcucci
Global wildfire models play a crucial role in anticipating and responding to changing wildfire regimes. JULES-INFERNO is a global vegetation and fire model simulating wildfire emissions and area burnt on a global scale. However, because of the high data dimensionality and system complexity, JULES-INFERNO's computational costs make it challenging to apply to fire risk forecasting with unseen initial conditions. Typically, running JULES-INFERNO for 30 years of prediction takes several hours on High Performance Computing (HPC) clusters. To tackle this bottleneck, this work builds two data-driven models based on Deep Learning techniques as surrogates for JULES-INFERNO to speed up global wildfire forecasting. More precisely, these machine learning models take global temperature, vegetation density, soil moisture, and previous forecasts as inputs to iteratively predict the subsequent global area burnt. Average Error per Pixel (AEP) and the Structural Similarity Index Measure (SSIM) are used as metrics to evaluate the performance of the proposed surrogate models. A fine-tuning strategy is also proposed in this work to improve the algorithm's performance on unseen scenarios.
{"title":"Deep Learning Surrogate Models of JULES-INFERNO for Wildfire Prediction on a Global Scale","authors":"Sibo Cheng;Hector Chassagnon;Matthew Kasoar;Yike Guo;Rossella Arcucci","doi":"10.1109/TETCI.2024.3445450","DOIUrl":"https://doi.org/10.1109/TETCI.2024.3445450","url":null,"abstract":"Global wildfire models play a crucial role in anticipating and responding to changing wildfire regimes. JULES-INFERNO is a global vegetation and fire model simulating wildfire emissions and area burnt on a global scale. However, because of the high data dimensionality and system complexity, JULES-INFERNO's computational costs make it challenging to apply to fire risk forecasting with unseen initial conditions. Typically, running JULES-INFERNO for 30 years of prediction will take several hours on High Performance Computing (HPC) clusters. To tackle this bottleneck, two data-driven models are built in this work based on Deep Learning techniques to surrogate the JULES-INFERNO model and speed up global wildfire forecasting. More precisely, these machine learning models take global temperature, vegetation density, soil moisture and previous forecasts as inputs to predict the subsequent global area burnt on an iterative basis. Average Error per Pixel (AEP) and Structural Similarity Index Measure (SSIM) are used as metrics to evaluate the performance of the proposed surrogate models. A fine tuning strategy is also proposed in this work to improve the algorithm performance for unseen scenarios. 
Numerical results show a strong performance of the proposed models, in terms of both computational efficiency (less than 20 seconds for 30 years of prediction on a laptop CPU) and prediction accuracy (with AEP under 0.3% and SSIM over 98% compared to the outputs of JULES-INFERNO).","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 1","pages":"444-454"},"PeriodicalIF":5.3,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143361374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
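The two evaluation metrics named in the abstract above can be sketched in a few lines. This is a hedged simplification: AEP is taken here as mean absolute error per pixel expressed as a percentage of the reference's value range, and SSIM is computed globally rather than over sliding windows; neither is necessarily the paper's exact definition.

```python
import numpy as np

# Simplified metric sketches (assumed definitions, see lead-in above).

def aep(pred, ref):
    """Average Error per Pixel, as a percentage of the reference range."""
    rng = ref.max() - ref.min()
    return np.mean(np.abs(pred - ref)) / rng * 100.0

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Global (non-windowed) SSIM between two same-shaped arrays."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

# Toy usage: compare a "surrogate prediction" against a "reference" field.
ref = np.linspace(0.0, 1.0, 100).reshape(10, 10)
pred = ref + 0.1                      # uniform 10%-of-range error
err_pct = aep(pred, ref)              # 10.0 (percent)
sim = ssim_global(ref, ref)           # 1.0 for identical fields
```

A production evaluation would typically use a windowed SSIM (e.g. as provided by image-processing libraries) rather than this global variant.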