Pruning is a popular approach to compressing large-scale, high-cost neural network models. However, most existing methods rely on manually tuned pruning criteria, and finding a reasonable pruning strength takes considerable human effort. One of the main reasons is that sensitivity to pruning is distributed unevenly across the network. Our main goal is to find compression methods that adapt to this distribution and so avoid the deep architectural damage caused by unnecessary pruning. In this paper, we propose a filter texture distribution that affects the training of the network, and we analyze the sensitivity of each of the diverse states of this distribution. To do so, we first use a multidimensional penalty method that analyzes the potential sensitivity based on this texture distribution to obtain a pruning-friendly sparse environment. We then set up a lightweight dynamic threshold container to prune the sparse network. By giving each filter a suitable threshold at low cost, we achieve a massive reduction in the number of parameters without affecting the contribution of pruning-sensitive layers to the network as a whole. In the final experiments, we apply our two texture-distribution-adapted methods to ResNet and VGG-16 deep neural networks (DNNs) on the classical CIFAR-10/100 and ImageNet datasets and compare them with cutting-edge pruning methods, with excellent results. Code is available at https://github.com/wangyuzhe27/CDP-and-DTC.
Dynamic finegrained structured pruning sensitive to filter texture distribution. P. Li, Yuzhe Wang, Cong Wu, Xiatao Kang. AI Communications, published 2023-08-25. DOI: https://doi.org/10.3233/aic-230046
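The idea of thresholding filters adaptively rather than with one global cutoff, described in the pruning abstract above, can be illustrated with a small sketch. This is not the paper's penalty method or dynamic threshold container; it assumes a much simpler rule (a per-layer threshold set to a fraction of the layer's mean filter L1 norm), and the names `l1_norm`, `prune_layer`, and `scale` are hypothetical.

```python
from statistics import mean


def l1_norm(filter_weights):
    """Sum of absolute weights of one filter, given as a flat list."""
    return sum(abs(w) for w in filter_weights)


def prune_layer(filters, scale=0.5):
    """Return the indices of the filters to keep in one layer.

    Assumption for illustration: the threshold is scale * (mean L1 norm of
    this layer), so it adapts to each layer instead of being a global cutoff.
    """
    norms = [l1_norm(f) for f in filters]
    threshold = scale * mean(norms)
    return [i for i, n in enumerate(norms) if n >= threshold]


# Toy layer with three filters; the second one is almost all zeros.
layer = [
    [0.9, -0.8, 0.7],
    [0.01, 0.0, -0.02],
    [0.5, 0.4, -0.6],
]
kept = prune_layer(layer)
print(kept)  # [0, 2]
```

Because the threshold is relative to the layer's own statistics, a layer whose filters are uniformly small is not emptied by a cutoff chosen for a layer with large weights.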
Object localization has been a focus of research in Fine-Grained Visual Categorization (FGVC). To improve the accuracy and precision of object localization in multi-branch networks, as well as the robustness and generality of localization methods, our study focuses on how to combine coordinate attention with feature activation maps for target localization. The proposed model has three branches: a raw branch, an object branch and a part branch. Images are fed directly into the raw branch. The Coordinate Attention Object Localization Module (CAOLM) localizes and crops objects in the image to generate the input for the object branch. The Attention Partial Proposal Module (APPM) proposes part regions at different scales. The three kinds of input image undergo end-to-end weakly supervised learning through the different branches of the network. The model expands the receptive field to capture multi-scale features with the Selective Branch Atrous Spatial Pooling Pyramid (SB-ASPP). It fuses the feature maps obtained from the raw branch and the object branch with the Selective Branch Block (SBBlock), so that the complete features of the raw branch supplement the information missing from the object branch. Extensive experiments on the CUB-200-2011, FGVC-Aircraft and Stanford Cars datasets show that our method achieves the best classification performance on FGVC-Aircraft and competitive performance on the other datasets. The model also has few parameters and fast inference.
Multi-branch selection fusion fine-grained classification algorithm based on coordinate attention localization. Feng Zhang, Gaocai Wang, Man Wu, Shuqiang Huang. AI Communications, published 2023-08-21. DOI: https://doi.org/10.3233/aic-220187
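Localizing an object from a feature activation map, as the abstract above describes for the object branch, is often done by thresholding the map and cropping the bounding box of the active region. The sketch below assumes exactly that simple rule with the map's mean as the threshold; it is not the CAOLM module itself, it omits coordinate attention and connected-component filtering, and the function name `bounding_box` is hypothetical.

```python
def bounding_box(activation, threshold=None):
    """Return (top, left, bottom, right) of the cells above the threshold.

    `activation` is a 2D list of non-negative scores. If no threshold is
    given, the mean activation is used (an assumption for illustration).
    Returns None when no cell exceeds the threshold.
    """
    values = [v for row in activation for v in row]
    if threshold is None:
        threshold = sum(values) / len(values)
    hits = [
        (r, c)
        for r, row in enumerate(activation)
        for c, v in enumerate(row)
        if v > threshold
    ]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


# A 4x4 activation map with a bright 2x2 region in the centre.
amap = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
box = bounding_box(amap)
print(box)  # (1, 1, 2, 2)
```

The returned box, scaled back to image coordinates, would define the crop that is passed on as input to an object-level branch.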
The most widely used two-stream architectures and building blocks for human action recognition in videos generally consist of 2D or 3D convolutional neural networks. 3D convolution can capture motion information between video frames, which is essential for video classification. 3D convolutional neural networks usually perform better than their 2D counterparts, but at a higher computational cost. In this paper, we propose a heterogeneous two-stream architecture that incorporates two convolutional networks. One is a mixed convolution network (MCN), which inserts some 3D convolutions in the middle of 2D convolutions and is trained on RGB frames; the other is a BN-Inception network trained on optical flow frames. Considering the redundancy of neighboring video frames, we adopt a sparse sampling strategy to decrease the computational cost. Our architecture is trained and evaluated on the standard video action benchmarks HMDB51 and UCF101. Experimental results show that our approach achieves state-of-the-art performance on HMDB51 (73.04%) and UCF101 (95.27%).
A heterogeneous two-stream network for human action recognition. Shengbin Liao, Xiaofeng Wang, Zongkai Yang. AI Communications, pp. 219-233, published 2023-08-16. DOI: https://doi.org/10.3233/aic-220188
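The sparse sampling strategy mentioned in the action recognition abstract above exploits the redundancy of neighboring frames by processing only a few frames per video. The abstract does not specify the scheme, so the sketch below assumes a common segment-based variant: split the video into equal segments and take one frame from each. The function name `sparse_sample` and its parameters are hypothetical.

```python
import random


def sparse_sample(num_frames, num_segments, rng=None):
    """Pick one frame index from each of `num_segments` equal segments.

    With rng=None the centre frame of each segment is returned
    (deterministic, e.g. for evaluation); with a random.Random instance a
    random frame per segment is returned (e.g. for training).
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    indices = []
    for start, end in zip(bounds, bounds[1:]):
        if rng is None:
            indices.append((start + end - 1) // 2)
        else:
            indices.append(rng.randrange(start, end))
    return indices


centres = sparse_sample(90, 3)
print(centres)  # [14, 44, 74]
jittered = sparse_sample(90, 3, rng=random.Random(0))
```

Only three of the 90 frames are passed to the network in this example, while the samples still cover the whole duration of the clip.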
Yixin Xiong, Yongcheng Zhou, Yujuan Wang, Quanxing Liu, Lei Deng
Lung cancer is the leading cause of cancer death worldwide. Because the disease shows few symptoms in its early stages, most patients are diagnosed at an advanced stage, which leads to poor prognosis. Detecting lung cancer early is therefore of great importance, as it can significantly reduce mortality and improve patient survival. Although many computer-aided diagnosis (CAD) systems detect pulmonary nodules, few perform both detection and segmentation, and their performance on small nodules is not ideal. In this paper, we propose a deep cascaded multitask framework called mobilenet split-attention Yolo unet. Its detector, mobilenet split-attention Yolo (Msa-Yolo), greatly enhances the features of small nodules and improves detection performance on them. Compared with YoloX, Msa-Yolo raises the mean average precision (mAP) from 85.10% to 86.64% on the LUNA16 dataset and from 90.13% to 94.15% on the LCS dataset. In addition, it produces an average of only 8.35 candidates per scan at 96.32% sensitivity on LUNA16, which greatly outperforms other existing systems. At the segmentation stage, the mean intersection over union (mIOU) of our CAD system rises from 71.66% to 76.84% on the LCS dataset compared with the baseline. In conclusion, this paper proposes a fast, accurate and robust CAD system for nodule detection, segmentation and classification, and the experimental results confirm its ability to detect and segment small nodules.
Fully Automated Neural Network Framework for Pulmonary Nodules Detection and Segmentation. Yixin Xiong, Yongcheng Zhou, Yujuan Wang, Quanxing Liu, Lei Deng. AI Communications, published 2023-08-11. DOI: https://doi.org/10.3233/aic-220318
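The segmentation results in the nodule abstract above are reported as mean intersection over union (mIOU). As a reminder of what that metric measures, here is a minimal sketch for binary masks. It assumes the mean is taken over (prediction, ground truth) pairs; the paper may instead average over classes or scans, and the helper names `iou` and `mean_iou` are hypothetical.

```python
def iou(pred, target):
    """Intersection over union of two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty masks agree perfectly; define their IoU as 1.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs):
    """Average IoU over a list of (prediction, ground truth) mask pairs."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)


pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # intersection 1, union 2
    ([0, 1, 1, 1], [0, 1, 1, 0]),  # intersection 2, union 3
]
score = mean_iou(pairs)
print(round(score, 4))  # 0.5833
```

Small nodules make this metric demanding: with only a handful of foreground pixels, a one-pixel error changes the ratio substantially.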
Connected and Autonomous Vehicles (CAVs) are expected to improve the safety and efficiency of traffic by automating driving tasks. Among these tasks, lane changing is particularly challenging, as it requires the vehicle to be aware of its highly dynamic surrounding environment, make decisions, and enact them within very short time windows. As CAVs need to optimise their actions based on a large set of data collected from the environment, Reinforcement Learning (RL) has been widely used to develop CAV motion controllers. These controllers learn to make efficient and safe lane-changing decisions using on-board sensors and inter-vehicle communication. This paper first presents four overlapping fields that are key to the future of safe self-driving cars: CAVs, motion control, RL, and safe control. It then defines the requirements for a safe CAV controller. These requirements are used first to compare applications of Multi-Agent Reinforcement Learning (MARL) to CAV lane change controllers, and then to evaluate state-of-the-art safety methods used for RL-based motion controllers. The final section summarises research gaps and possible opportunities for the future development of safe MARL-based CAV motion controllers. In particular, it highlights the need to design MARL controllers with continuous control for lane changing. Moreover, as RL algorithms by themselves do not guarantee the level of safety required for such safety-critical applications, it offers insights into, and outlines the challenges of, integrating safe RL methods with MARL-based CAV motion controllers.
Multi-agent reinforcement learning for safe lane changes by connected and autonomous vehicles: A survey. Bharathkumar Hegde, Mélanie Bouroche. AI Communications, published 2023-08-11. DOI: https://doi.org/10.3233/aic-220316
Rajalakshmi Sivanaiah, R. S. Milton, T. T. Mirnalinee
Recommendation systems help customers find interesting and valuable resources in internet services. Their priority is to create and examine users’ individual profiles, which contain their preferences, and then to update the profile content with additional features in order to increase users’ satisfaction. Specific characteristics, descriptions and reviews of the items to recommend also play a significant part in identifying these preferences. However, inferring a user’s interests from their activities is a challenging task, so it is crucial to identify those interests without the user’s intervention. This work demonstrates the effectiveness of textual content, together with metadata and explicit ratings, in boosting collaborative techniques. To infer users’ preferences, metadata content information is boosted with user features and item features extracted from text reviews through sentiment analysis with the lexicon-based VADER approach. Before sentiment analysis, ironic and sarcastic reviews are removed to improve performance, since such reviews invert the polarity of sentiments. The Amazon product dataset is used for the analysis. From the text reviews, we identify the reasons likely to have led the user to their overall rating, referred to as features of interest (FoI). The FoI are formulated as multiple criteria, and the ratings for these criteria are computed from the single rating given by the user. Multi-Criteria-based Content Boosted Hybrid Filtering (MCCBHF) techniques are devised to analyze user preferences from review texts and ratings. The approach is used to enhance various collaborative filtering methods, and the resulting MCKNN, MCEMF, MCTFM and MCFM techniques provide better personalized product recommendations to users. Among the proposed MCCBHF algorithms, MCFM yields the best results, with the lowest RMSE of 1.03.
Factoring textual reviews into user preferences in multi-criteria based content boosted hybrid filtering (MCCBHF) recommendation system. Rajalakshmi Sivanaiah, R. S. Milton, T. T. Mirnalinee. AI Communications, pp. 175-190, published 2023-08-11. DOI: https://doi.org/10.3233/aic-220122
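The recommendation abstract above compares algorithms by root mean squared error (RMSE) between predicted and actual ratings. The sketch below shows that computation on made-up ratings; the numbers are illustrative only and are not taken from the paper or the Amazon dataset, and the function name `rmse` is hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative ratings on a 1-5 scale (not from the paper).
predicted = [4, 3, 5, 2]
actual = [5, 3, 4, 4]
error = rmse(predicted, actual)
print(round(error, 4))  # 1.2247
```

Because errors are squared before averaging, one badly mispredicted rating (the last item here) contributes more than several small misses, which is why lower RMSE is read as better rating prediction.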
Aitor López Sánchez, Marin Lujak, F. Semet, Holger Billhardt
A cooperative is a business entity whose primary objective is to provide benefits, services, and goods to its members, who both own it and exercise democratic control over it. In a cooperative, a fleet typically consists of vehicles owned by self-interested, individually rational owners who prioritize their own efficiency and the fairness of the system. Fairness here refers to how each owner's individual gain aligns with the gain of the others. In this paper, we focus on the routing of such cooperative fleets. If only the fleet’s efficiency is considered, in terms of minimising its overall cost, the studied problem corresponds to the multiple Traveling Salesman Problem (mTSP). However, our interest lies in finding solutions that are both efficient and fair, so we propose two new variants of this problem that integrate and maximise the fleet’s egalitarian and elitist social welfare. Additionally, to improve the balance between fleet efficiency and fairness, we propose the systematic elitist and systematic egalitarian social welfare optimisation algorithm. Simulation results show a wide diversity of routes depending on the approach considered. A cooperative may therefore choose the model that best balances its fleet’s efficiency and fairness based on its specific requirements.
How to achieve fair and efficient cooperative vehicle routing? Aitor López Sánchez, Marin Lujak, F. Semet, Holger Billhardt. AI Communications, published 2023-07-28. DOI: https://doi.org/10.3233/aic-220315
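The trade-off between efficiency and fairness in the routing abstract above can be made concrete by scoring a candidate set of routes in three ways. The sketch assumes the usual textbook definitions with utility taken as negative route cost: total cost for efficiency, the worst-off vehicle's utility for egalitarian welfare, and the best-off vehicle's utility for elitist welfare. The paper's exact formulations may differ, and `route_cost`, `welfare_summary`, and the depot at the origin are hypothetical.

```python
import math


def route_cost(route, depot=(0.0, 0.0)):
    """Euclidean length of the closed tour depot -> stops -> depot."""
    points = [depot, *route, depot]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def welfare_summary(routes):
    """Score one routing solution (one route per vehicle).

    Utility of a vehicle is assumed to be the negative of its route cost.
    """
    costs = [route_cost(r) for r in routes]
    return {
        "total_cost": sum(costs),     # efficiency: lower is better
        "egalitarian": -max(costs),   # utility of the worst-off vehicle
        "elitist": -min(costs),       # utility of the best-off vehicle
    }


# Two vehicles: one long trip, one short trip with two stops.
routes = [
    [(3.0, 4.0)],
    [(0.0, 1.0), (0.0, 2.0)],
]
summary = welfare_summary(routes)
print(summary)  # {'total_cost': 14.0, 'egalitarian': -10.0, 'elitist': -4.0}
```

Two solutions with the same total cost can differ sharply in the egalitarian score, which is the kind of difference a cooperative would weigh when choosing a routing model.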
L. Trujillo, Joel Nation, Luis Muñoz Delgado, E. Galván
In Transfer Learning (TL), a model trained on one problem is used to simplify the learning process on a second problem. TL has achieved impressive results for Deep Learning, but has scarcely been studied in genetic programming (GP). Moreover, predicting when, or why, TL might succeed is an open question. This work presents an approach to determine when two problems might be compatible for TL. This question is studied for TL with GP for the first time, focusing on multiclass classification. Using a set of reference problems, each problem pair is categorized into one of two groups: TL-compatible pairs, where TL was successful relative to baseline methods, and TL-non-compatible pairs, where it was not. DeepInsight is used to extract a 2D projection of the feature space of each problem, and a similarity measure is computed by registering the feature space representations of both problems. Results show that the two groups can be distinguished with statistically significant results. The proposal does not require model training or inference, and can be applied to problems from different domains with different numbers of samples, features and classes.
Predicting the success of transfer learning for genetic programming using DeepInsight feature space alignment. L. Trujillo, Joel Nation, Luis Muñoz Delgado, E. Galván. AI Communications, pp. 159-173, published 2023-07-28. DOI: https://doi.org/10.3233/aic-230104
Graph Convolution Network (GCN) algorithms have greatly improved the accuracy of skeleton-based human action recognition. GCNs exploit the spatial information between skeletal joints across consecutive frames better than other deep learning algorithms, which helps them achieve high accuracy. However, traditional GCN algorithms are computationally expensive because they stack multiple primary GCN layers. To address this problem, we introduce a lightweight network, the Differential Learning and Parallel Convolutional Network (DL-PCN), whose key modules are the Differential Learning Module (DLM) and the Parallel Convolutional Network (PCN). DLM features a feedforward connection that carries the error information of GCN modules with the same structure. PCN places a GCN and a Convolutional Neural Network (CNN) in parallel, so that both modules extract information directly from the input data, which makes the spatiotemporal information they extract more complete than that of a tandem GCN-CNN structure. Our network achieves comparable performance on the NTU RGB+D 60, NTU RGB+D 120 and Northwestern-UCLA datasets when accuracy and parameter count are considered together.
{"title":"DL-PCN: Differential learning and parallel convolutional network for action recognition","authors":"Qinyang Zeng, Ronghao Dang, Qin Fang, Chengju Liu, Qi Chen","doi":"10.3233/aic-220268","DOIUrl":"https://doi.org/10.3233/aic-220268","url":null,"abstract":"Graph Convolution Network (GCN) algorithms have greatly improved the accuracy of skeleton-based human action recognition. GCN can utilize the spatial information between skeletal joints in subsequent frames better than other deep learning algorithms, which is beneficial for achieving high accuracy. However, traditional GCN algorithms are computationally expensive because they stack multiple primary GCN layers. To address this problem, we introduce a lightweight network, the Differential Learning and Parallel Convolutional Network (DL-PCN), whose key modules are Differential Learning (DLM) and the Parallel Convolutional Network (PCN). DLM features a feedforward connection, which carries the error information of GCN modules with the same structure, where GCN and CNN modules directly extract the original information from the input data, making the spatiotemporal information extracted by these modules more complete than that of a tandem GCN and CNN structure. PCN comprises a GCN and a Convolutional Neural Network (CNN) in parallel. Our network achieves comparable performance on the NTU RGB+D 60 dataset, the NTU RGB+D 120 dataset and the Northwestern-UCLA dataset while considering both accuracy and calculation parameters.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"29 1","pages":"235-249"},"PeriodicalIF":0.8,"publicationDate":"2023-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81230678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FI-FPN: Feature-integration feature pyramid network for object detection","authors":"Qichen Su, Guangjian Zhang, Shuang Wu, Yiming Yin","doi":"10.3233/aic-220183","DOIUrl":"https://doi.org/10.3233/aic-220183","url":null,"abstract":"The multi-layer feature pyramid structure, represented by FPN, is widely used in object detection. However, due to the aliasing effect brought by up-sampling, the current feature pyramid structure still has defects, such as loss of high-level feature information and weakening of low-level small object features. In this paper, we propose FI-FPN to solve these problems. It is mainly composed of a multi-receptive field fusion (MRF) module, a contextual information filtering (CIF) module, and an efficient semantic information fusion (ESF) module. Specifically, MRF stacks dilated convolutional layers and max-pooling layers to obtain receptive fields of different scales, reducing the information loss of high-level features; CIF introduces a channel attention mechanism that reassigns the channel attention weights; ESF uses channel concatenation instead of element-wise operations for bottom-up feature fusion, alleviating aliasing effects and facilitating efficient information flow. Experiments show that under the ResNet50 backbone, our method improves the performance of Faster RCNN and RetinaNet by 3.5 and 4.6 mAP, respectively. Our method has competitive performance compared to other advanced methods.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"2004 1","pages":"191-203"},"PeriodicalIF":0.8,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78269722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}