Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469556
Jing Xu, Qing Sun
This paper deals with the rotor speed tracking problem of variable-speed wind turbine systems operating under partial load. Singular perturbation techniques are used to characterize the two-time-scale property of the wind turbine system, and a linear parameter varying (LPV) model is formed to approximate its nonlinear behaviour. Based on the slow-fast decomposition method, slow and fast subsystems are constructed: one for the mechanical dynamics and the other for the electrical dynamics. Slow and fast controls are derived separately, and a local state feedback controller is then formulated as the sum of the slow and fast controls. A design procedure that uses linear parameter varying control to combine the local controllers is proposed to guarantee the robustness of the closed-loop nonlinear wind turbine system. Numerical examples are given to show the validity of the proposed control scheme.
{"title":"Linear Parameter Varying Control of Wind Energy Conversion Systems in Partial Load","authors":"Jing Xu, Qing Sun","doi":"10.1109/ICMLC51923.2020.9469556","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469556","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"65 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"128559694"}
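The composite control in the record above (a local state feedback formed as the sum of a slow and a fast control, with local controllers combined through a scheduling parameter) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the sign convention u = -Kx, the use of exactly two local controllers, the linear interpolation rule, and all gain values are hypothetical and are not taken from the paper.

```python
def composite_control(k_slow, k_fast, x_slow, x_fast):
    """Local state feedback u = u_s + u_f, with u_s = -K_s x_s and u_f = -K_f x_f."""
    u_slow = -sum(k * x for k, x in zip(k_slow, x_slow))
    u_fast = -sum(k * x for k, x in zip(k_fast, x_fast))
    return u_slow + u_fast


def scheduled_control(theta, theta_min, theta_max, local_gains, x_slow, x_fast):
    """Blend two local controllers by the scheduling parameter theta.

    local_gains is [(K_s, K_f) at theta_min, (K_s, K_f) at theta_max];
    the linear blend is an assumed combination rule for this sketch.
    """
    alpha = (theta - theta_min) / (theta_max - theta_min)
    alpha = min(1.0, max(0.0, alpha))  # clamp outside the scheduling range
    u_lo = composite_control(*local_gains[0], x_slow, x_fast)
    u_hi = composite_control(*local_gains[1], x_slow, x_fast)
    return (1.0 - alpha) * u_lo + alpha * u_hi


# Hypothetical gains for the two ends of the scheduling range.
gains = [([1.0], [2.0]), ([3.0], [4.0])]
u_mid = scheduled_control(0.5, 0.0, 1.0, gains, x_slow=[1.0], x_fast=[0.5])
```

Blending the control outputs rather than re-deriving gains online keeps each local design independent, which is the property the sketch is meant to show.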
Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469037
Natasha Kees, Yaxuan Wang, Yiling Jiang, Fang Lue, P. Chan
Backdoor attacks have become a serious security concern because of the rising popularity of unverified third-party machine learning resources such as datasets, pre-trained models, and processors. Pre-trained models and shared datasets have become popular because of the high training cost of deep learning. This raises a serious security concern, since shared models and datasets may be modified intentionally in order to reduce system efficacy. A backdoor attack is difficult to detect because the embedded adversarial decision rule is triggered only by a pre-chosen pattern, and the contaminated model behaves normally on benign samples. This paper devises a backdoor attack detection method that identifies whether a sample has been attacked in image-related applications. The method considers the consistency of the information provided by an image when each of its segments is removed in turn. The absence of the segment containing a trigger strongly affects this consistency, since the trigger dominates the decision. The proposed method is evaluated empirically to confirm its effectiveness in various settings. As there is no restrictive assumption on the trigger of backdoor attacks, we expect the proposed method to generalize and to defend against a wider range of modern attacks.
{"title":"Segmentation Based Backdoor Attack Detection","authors":"Natasha Kees, Yaxuan Wang, Yiling Jiang, Fang Lue, P. Chan","doi":"10.1109/ICMLC51923.2020.9469037","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469037","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"1 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130934182"}
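The idea in the record above, that removing the segment containing the trigger changes the model's decision far more than removing any other segment, can be sketched as follows. This is one possible reading only: the confidence-drop score, the threshold, the fill value, and the toy "contaminated" classifier are all assumptions introduced for illustration and are not the paper's actual consistency measure.

```python
def segment_ablation_scores(image, segments, predict, fill=0.0):
    """Mask each segment in turn and record how far the confidence of the
    originally predicted class drops (an assumed consistency measure)."""
    base = predict(image)
    top = max(range(len(base)), key=lambda c: base[c])
    scores = {}
    for seg_id in sorted({s for row in segments for s in row}):
        masked = [
            [fill if segments[r][c] == seg_id else image[r][c]
             for c in range(len(image[r]))]
            for r in range(len(image))
        ]
        scores[seg_id] = base[top] - predict(masked)[top]
    return scores


def flag_suspicious(scores, threshold=0.5):
    """Segments whose removal changes the decision strongly (hypothetical threshold)."""
    return [seg for seg, drop in scores.items() if drop >= threshold]


def toy_backdoored_predict(image):
    """Hypothetical stand-in for a contaminated model: a value of 9.0 at the
    top-left pixel forces class 1; otherwise the mean intensity decides."""
    if image[0][0] == 9.0:
        return [0.01, 0.99]
    mean = sum(map(sum, image)) / (len(image) * len(image[0]))
    p1 = min(1.0, max(0.0, mean))
    return [1.0 - p1, p1]


image = [[9.0, 0.1], [0.2, 0.1]]   # trigger sits in the top-left pixel
segments = [[0, 1], [1, 1]]        # segment 0 covers only the trigger
scores = segment_ablation_scores(image, segments, toy_backdoored_predict)
suspects = flag_suspicious(scores)
```

In this toy case only segment 0 is flagged, because masking any other region leaves the trigger, and hence the prediction, untouched.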
Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469589
D. Nguyen, Arvind Rajagopalan, Jijoong Kim, C. Lim, David Hubczenko
In this paper, we present a decentralised online decision-making strategy for multiple agents carrying out a cooperative mission. Our solution enables agents to dynamically choose their best targets and to arrive at their target locations simultaneously at pre-specified angles. Additionally, the agents are able to cope with any obstacles encountered without compromising the mission goals. The algorithm combines game-theoretic regret minimisation with current best-practice solutions to satisfy complex mission requirements. It is decentralised and readily scalable to a large number of agents for wide-area operations. Simulation results show that it can be applied to teams of agents in challenging environments and that it exhibits fast convergence and adaptability.
{"title":"Dynamic Multi-Target Assignment with Decentralised Online Learning to Achieve Multiple Synchronised Goals","authors":"D. Nguyen, Arvind Rajagopalan, Jijoong Kim, C. Lim, David Hubczenko","doi":"10.1109/ICMLC51923.2020.9469589","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469589","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"38 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"116990283"}
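The record above mentions game-theoretic regret minimisation without naming a specific rule. As an illustration, the sketch below uses regret matching, a standard regret-minimisation scheme chosen here as a stand-in; it may differ from the update the authors actually use. Each agent keeps a cumulative regret per candidate target and picks targets with probability proportional to positive regret. The utility values are hypothetical.

```python
def regret_matching_strategy(cum_regret):
    """Mixed strategy proportional to positive cumulative regret;
    uniform when no action has positive regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regret(cum_regret, utilities, chosen):
    """Add, for every target, how much better it would have done than the chosen one."""
    return [r + (u - utilities[chosen]) for r, u in zip(cum_regret, utilities)]


# One agent, three candidate targets, hypothetical utilities for this round.
regret = [0.0, 0.0, 0.0]
regret = update_regret(regret, utilities=[1.0, 3.0, 2.0], chosen=0)
strategy = regret_matching_strategy(regret)
```

Because each agent only needs its own utilities and regrets, the update runs locally, which matches the decentralised setting described in the abstract.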
Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469572
Haiwei Xue, Xueming Yan, Shengyi Jiang, Helang Lai
Multimodal sentiment analysis is a highly sought-after topic in natural language processing. In this paper, a multi-tensor fusion network with a hybrid attention architecture is proposed for multimodal sentiment analysis. First, a Bi-LSTM is applied to encode contextual representations in the different modalities. Next, modality features are extracted and made to interact through the hybrid attention mechanism. Finally, a multi-tensor fusion approach is used to further enhance the effectiveness of fusing interaction features across modalities. In a series of regression experiments on sentiment intensity prediction, the proposed approach outperforms existing advanced approaches on two benchmarks, improving the F1-score by 3.4 and 2.1 percentage points, respectively. Our architecture will be open-sourced on GitHub for researchers to use.
{"title":"Multi-Tensor Fusion Network with Hybrid Attention for Multimodal Sentiment Analysis","authors":"Haiwei Xue, Xueming Yan, Shengyi Jiang, Helang Lai","doi":"10.1109/ICMLC51923.2020.9469572","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469572","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"50 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"124492206"}
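The record above does not spell out how the multi-tensor fusion is computed. As a rough illustration of tensor-style fusion in general, the sketch below uses a common outer-product formulation for two modality vectors, with a constant 1 appended to each so that the unimodal terms survive alongside the bimodal products. This formulation is an assumption and may differ from the paper's fusion layer.

```python
def tensor_fuse(a, b):
    """Outer product of [a; 1] and [b; 1].

    The result contains every pairwise product a_i * b_j, plus the
    original a and b in its last column and last row.
    """
    a1 = list(a) + [1.0]
    b1 = list(b) + [1.0]
    return [[x * y for y in b1] for x in a1]


def flatten(matrix):
    """Flatten the fused tensor so it can feed a downstream regressor."""
    return [v for row in matrix for v in row]


# Hypothetical 1-dimensional text and audio features.
fused = tensor_fuse([2.0], [3.0])
```

The fused size grows multiplicatively with the modality dimensions, which is why such layers are usually applied to compact feature vectors.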
Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469577
Hao Ke
Online news services have become the first choice for reading news for many internet users. However, thousands of news articles are released and updated daily, which makes it impractical for users to select relevant and intriguing stories by themselves. News recommendation models are developed to tackle this information overload: an automated system recommends news stories on various topics to users from diverse backgrounds. In this paper, we propose SARC, a neural news recommendation model with self-attention that is jointly trained with document classification. The self-attention mechanism captures the long-term relationships among words. The joint training of recommendation and classification improves representation and generalization capability. We demonstrate our model’s superior performance over state-of-the-art baselines on a large-scale news recommendation dataset.
{"title":"Improving Self-Attention Based News Recommendation with Document Classification","authors":"Hao Ke","doi":"10.1109/ICMLC51923.2020.9469577","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469577","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"12 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"125733004"}
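To make the self-attention step in the record above concrete, here is a minimal sketch of scaled dot-product self-attention over word embeddings. For brevity it uses the embeddings themselves as queries, keys, and values, with no learned projection matrices; that simplification, and the toy embeddings, are assumptions for illustration and do not describe the SARC architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(embeddings):
    """Each output vector is a weighted average of all input vectors,
    with weights given by scaled dot-product similarity to the query word."""
    d = len(embeddings[0])
    out = []
    for q in embeddings:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, embeddings))
                    for j in range(d)])
    return out


# Two identical toy word embeddings attend to each other equally.
contextual = self_attention([[1.0, 0.0], [1.0, 0.0]])
```

Every word attends to every other word in one step, regardless of distance, which is how the mechanism relates words that are far apart in a headline or article.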
Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469568
Jun-Hai Zhai, Sufang Zhang, Mo-Han Wang, Yan Li
In the real world, there are many imbalanced data classification problems, such as extreme weather prediction, software defect prediction, machinery fault diagnosis, and spam filtering. Studying imbalanced data classification therefore has important theoretical and practical value. Within the framework of binary imbalanced data classification, this paper proposes a three-stage method for classifying binary imbalanced big data. In the first stage, the negative-class big data are clustered into K clusters by the K-means algorithm on the Hadoop platform. In the second stage, an instance selection method selects important samples from each cluster in parallel, yielding K negative-class subsets. In the third stage, K balanced training sets are constructed, each consisting of a negative-class subset and the positive-class subset; K classifiers are then trained and finally integrated to classify unseen samples. Experiments compare the proposed method with two state-of-the-art methods on the G-mean metric. The experimental results demonstrate that the proposed method is more effective and efficient than the compared approaches.
{"title":"A Three-stage Method for Classification of Binary Imbalanced Big Data","authors":"Jun-Hai Zhai, Sufang Zhang, Mo-Han Wang, Yan Li","doi":"10.1109/ICMLC51923.2020.9469568","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469568","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"28 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"116314148"}
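The three stages in the record above (cluster the negative class, select instances per cluster, train one classifier per balanced set and combine them) can be sketched on a single machine as follows. Several parts are stand-ins chosen for brevity and are assumptions, not the paper's components: there is no Hadoop, K-means uses a deterministic initialisation, instance selection keeps the points nearest each cluster centroid, the base learner is a nearest-centroid classifier, and integration is a majority vote.

```python
def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=10):
    """Stage 1: plain Lloyd iterations, initialised from the first k points."""
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[j].append(p)
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters


def select_instances(cluster, n):
    """Stage 2 stand-in: keep the n points closest to the cluster centroid."""
    c = mean(cluster)
    return sorted(cluster, key=lambda p: dist2(p, c))[:n]


def train_centroid_classifier(neg, pos):
    """Stand-in base learner: predict 1 if x is closer to the positive centroid."""
    cn, cp = mean(neg), mean(pos)
    return lambda x: 1 if dist2(x, cp) < dist2(x, cn) else 0


def three_stage_ensemble(negatives, positives, k):
    """Stage 3: one classifier per balanced set, combined by majority vote."""
    clusters = [c for c in kmeans(negatives, k) if c]
    models = [train_centroid_classifier(select_instances(c, len(positives)), positives)
              for c in clusters]

    def predict(x):
        votes = sum(m(x) for m in models)
        return 1 if 2 * votes > len(models) else 0

    return predict


negatives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
positives = [(10.0, 10.0), (10.1, 10.0)]
predict = three_stage_ensemble(negatives, positives, k=2)
```

Each balanced set pairs the full positive class with an equally sized slice of the negatives, so no single classifier sees the original imbalance.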
Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469541
Sufang Zhang, Jun-Hai Zhai, Shi Tian, Xiang Zhou, Yan Li
With the rapid development of computer network and wireless sensor technology, and with the arrival of the big data era, both the dimensionality and the number of samples of data are growing rapidly. Accordingly, it is important to investigate feature selection for big data and to design suitable feature selection algorithms. Based on MapReduce and a voting mechanism, a feature selection method for big data is proposed in this paper. The proposed method consists of three steps. First, the big data set is partitioned into m subsets, which are deployed to m computing nodes of Hadoop. Second, on the m computing nodes, a feature selection algorithm based on a genetic algorithm selects important features in parallel using the local data subsets, yielding m feature subsets. Finally, the m feature subsets vote on each feature, and the final feature subset is selected according to the voting results. Experimental results on four big data sets demonstrate that the proposed method is effective and efficient.
{"title":"Feature Selection for Big Data Based on Mapreduce and Voting Mechanism","authors":"Sufang Zhang, Jun-Hai Zhai, Shi Tian, Xiang Zhou, Yan Li","doi":"10.1109/ICMLC51923.2020.9469541","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469541","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"390 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"124581120"}
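The partition and voting steps in the record above can be sketched as below. The genetic-algorithm selector that runs on each node is not reproduced; its m outputs are simply assumed as given. The round-robin partitioning and the simple-majority threshold are also assumptions, since the abstract only says that the final subset is chosen according to the voting results.

```python
from collections import Counter


def partition(rows, m):
    """Step 1: split the data set into m subsets (round-robin, an assumed scheme)."""
    return [rows[i::m] for i in range(m)]


def vote_features(feature_subsets, min_votes=None):
    """Step 3: each of the m local feature subsets casts one vote per feature
    it contains; features with at least min_votes votes are kept."""
    m = len(feature_subsets)
    if min_votes is None:
        min_votes = m // 2 + 1  # simple majority, an assumed rule
    votes = Counter(f for subset in feature_subsets for f in set(subset))
    return sorted(f for f, v in votes.items() if v >= min_votes)


# Hypothetical outputs of the per-node selectors (step 2) for m = 3 nodes.
local_subsets = [{0, 1, 2}, {1, 2, 3}, {2, 4}]
selected = vote_features(local_subsets)
```

Raising `min_votes` toward m keeps only features that almost every node agrees on, while lowering it yields a larger, more permissive subset.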
Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469564
Jincheng Li
The principle of computational toxicity prediction is that chemicals with similar molecular structures may possess similar toxicological pathways and effects. Many methods represent each chemical by a set of descriptors, which experts have identified as promising properties for predicting biological activity or toxicity. These chemical descriptors play a critical role in computational methods, because task-correlated descriptors favor high prediction performance. However, few works compare the effectiveness of chemical descriptors and evaluate their performance in toxicity prediction. In this paper, we propose a novel ensemble feature selection method based on random under-sampling to analyze the effectiveness of the chemical descriptors adopted in toxicity prediction applications. The proposed method is efficient and can relieve the imbalanced data problem in toxicity data. Experimental results on the Tox21 toxicity prediction dataset show that the "molecular property", "connectivity" and "topological" descriptors are the three most important for toxicity prediction tasks among the 12 popular descriptors adopted in toxicity prediction applications. The results of this study can be used as a guide for proposing new descriptors for chemical toxicity prediction.
{"title":"Feature Selection on Imbalanced Data and Its Application on Toxicity Prediction","authors":"Jincheng Li","doi":"10.1109/ICMLC51923.2020.9469564","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469564","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"133 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"124273288"}
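The combination described in the record above, repeated random under-sampling of the majority class with a feature score aggregated over the rounds, can be sketched as follows. The scoring criterion (absolute difference of class means), the number of rounds, and the toy data are assumptions chosen for illustration; the paper's actual selector and ensemble rule are not specified in the abstract.

```python
import random


def undersample(pos, neg, rng):
    """Randomly shrink the larger class to the size of the smaller one."""
    if len(neg) > len(pos):
        return pos, rng.sample(neg, len(pos))
    return rng.sample(pos, len(neg)), neg


def feature_scores(pos, neg):
    """Stand-in criterion: absolute difference between the class means per feature."""
    d = len(pos[0])
    return [abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
            for j in range(d)]


def ensemble_rank(pos, neg, rounds=20, seed=0):
    """Sum the scores over several balanced resamples and rank features by the total."""
    rng = random.Random(seed)
    d = len(pos[0])
    total = [0.0] * d
    for _ in range(rounds):
        p, n = undersample(pos, neg, rng)
        for j, s in enumerate(feature_scores(p, n)):
            total[j] += s
    return sorted(range(d), key=lambda j: total[j], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 is noise.
toxic = [[1.0, 0.0], [1.0, 1.0]]
non_toxic = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
ranking = ensemble_rank(toxic, non_toxic)
```

Averaging over many balanced resamples uses more of the majority class than a single under-sample would, while keeping every individual scoring round balanced.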
Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469550
T. Murata, S. Date, Yusuke Goto, T. Hanawa, Takuya Harada, M. Ichikawa, Lee Hao, M. Munetomo, Akiyoshi Sugiki
In this paper, we introduce a system for distributing synthesized data on the Japanese population using the Interdisciplinary Large-scale Information Infrastructures in Japan. The synthetic population is synthesized from census statistics that are collected by the government and publicly released. Therefore, the synthesized data contain no private data. However, it is easy to estimate the composition of households and working status in a certain area from the synthetic population. Therefore, we currently distribute the synthesized data only for public or academic purposes. Because it is important to encourage scholars and researchers to use large-scale household data for academic purposes, we define protection levels for the attributes in the synthetic populations. According to these protection levels, we distribute the data with the appropriate attributes to those who wish to use them. We encourage researchers to use the synthetic populations to become familiar with large-scale data processing.
{"title":"Distribution System for Japanese Synthetic Population Data with Protection Level","authors":"T. Murata, S. Date, Yusuke Goto, T. Hanawa, Takuya Harada, M. Ichikawa, Lee Hao, M. Munetomo, Akiyoshi Sugiki","doi":"10.1109/ICMLC51923.2020.9469550","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469550","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"30 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"121255562"}
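Distributing data "with the appropriate attributes" according to protection levels, as the record above describes, amounts to filtering each record by the requester's clearance. The sketch below shows that filtering step only. The attribute names, the numeric levels, and the rule that a higher number means stricter protection are all hypothetical; the abstract does not list the real attributes or levels.

```python
# Hypothetical attribute-to-level map; higher means more restricted (assumed).
PROTECTION_LEVELS = {
    "household_type": 1,
    "age": 1,
    "employment_status": 2,
    "income": 3,
}


def release_view(records, clearance, levels=PROTECTION_LEVELS):
    """Return copies of the records restricted to attributes whose protection
    level does not exceed the requester's clearance. Attributes missing from
    the level map are withheld by default."""
    allowed = {attr for attr, lvl in levels.items() if lvl <= clearance}
    return [{k: v for k, v in rec.items() if k in allowed} for rec in records]


synthetic = [{"age": 30, "income": 500, "employment_status": "employed", "note": "x"}]
academic_view = release_view(synthetic, clearance=2)
```

Withholding unlisted attributes by default means that a newly added column is never released until someone assigns it a level.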
Pub Date: 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469046
Jingyuan Liu, M. Rajati
Fine-tuning pre-trained models is arguably one of the most significant approaches in transfer learning. Recent studies focus on methods whose performance is superior to standard fine-tuning methods, such as Adaptive Filter Fine-tuning and Fine-tuning last-k. The SpotTune model outperforms most common fine-tuning methods thanks to a novel adaptive fine-tuning approach. However, because there is a trade-off between the number of parameters and performance, the SpotTune model is not parameter-efficient. In this paper, we propose a shapeshift adapter module that helps reduce the number of trainable parameters in deep learning models while preserving the high performance of SpotTune. The shapeshift adapter has a flexible structure, which allows us to balance the number of parameters against performance. We integrate our proposed module with the residual blocks in ResNet and conduct several experiments on the SpotTune model. On the Visual Decathlon Challenge, our proposed method achieves a score close to that of SpotTune and outperforms the SpotTune model on more than half of the datasets. Notably, our proposed method uses only about 20% of the parameters needed when training with a standard fine-tuning approach.
{"title":"Transfer Learning with Shapeshift Adapter: A Parameter-Efficient Module for Deep Learning Model","authors":"Jingyuan Liu, M. Rajati","doi":"10.1109/ICMLC51923.2020.9469046","DOIUrl":"https://doi.org/10.1109/ICMLC51923.2020.9469046","PeriodicalId":170815,"journal":{"name":"2020 International Conference on Machine Learning and Cybernetics (ICMLC)","volume":"50 1","pages":"0"},"publicationDate":"2020-12-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130735589"}
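The parameter trade-off mentioned in the record above can be made concrete with a little arithmetic. The sketch below compares the weight count of a full 3x3 convolution against a generic 1x1 bottleneck adapter (channels reduced to a small width and expanded back). This generic adapter is an assumption used only to show why adapters are cheap; the internal structure of the shapeshift adapter is not given in the abstract, and the channel sizes are hypothetical. Biases and normalisation parameters are ignored.

```python
def conv3x3_params(channels):
    """Weights of a 3x3 convolution with equal input and output channels."""
    return 3 * 3 * channels * channels


def bottleneck_adapter_params(channels, width):
    """Weights of a 1x1 down-projection to `width` plus a 1x1 up-projection."""
    return channels * width + width * channels


def trainable_fraction(channels, width):
    """Adapter weights relative to fine-tuning the full 3x3 convolution."""
    return bottleneck_adapter_params(channels, width) / conv3x3_params(channels)


# Hypothetical layer: 64 channels with a bottleneck width of 16.
full = conv3x3_params(64)
adapter = bottleneck_adapter_params(64, 16)
fraction = trainable_fraction(64, 16)
```

The fraction simplifies to 2 * width / (9 * channels), so the bottleneck width is the single knob that trades parameters against capacity in this generic design.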