For robots that explore unknown environments autonomously, local planning is a key technology that determines whether the robot can work safely. In this paper, we propose a local planner for rescue robots based on the geometric features of the 3D point cloud, in which the raw 3D point cloud is divided into Passable Areas (PA), Surmountable Obstacles (SO), and Insurmountable Obstacles (IO). The robot obtains the orientation information of SO in real time. During autonomous exploration, the robot should operate in passable areas as much as possible to ensure its safety. When no passable route exists, the robot can cross an SO with lower risk according to its own obstacle-crossing ability. Because the planner newly defines which obstacles can be crossed, the robot has more flexible choices during exploration. The experimental results show that the planner improves the safety of rescue robots during autonomous exploration.
Title: A Local Planner Based on Environmental Geometric Features for Rescue Robots. Authors: Ruihan Zeng, Wei Dai, Huimin Lu, Jiayang Liu, Hui Zhang. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC). Pub Date: 2022-12-15. DOI: 10.1109/ICNSC55942.2022.10004162
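The three-way split into PA, SO, and IO described in the abstract above can be illustrated with a small sketch. This is not the authors' method: the abstract does not say which geometric features or thresholds are used, so the height-step feature, the names `classify_cell` and `traversal_cost`, and the values `flat_tol`, `max_step`, and `so_penalty` are all assumptions chosen only for illustration.

```python
from enum import Enum


class Terrain(Enum):
    PA = "passable area"
    SO = "surmountable obstacle"
    IO = "insurmountable obstacle"


def classify_cell(heights, flat_tol=0.05, max_step=0.20):
    """Label one grid cell from the z-values (metres) of the points inside it.

    flat_tol and max_step are hypothetical thresholds; max_step stands in for
    the robot's own obstacle-crossing ability.
    """
    if not heights:
        # No points observed: treated as unsafe (a conservative assumption).
        return Terrain.IO
    step = max(heights) - min(heights)
    if step <= flat_tol:
        return Terrain.PA
    if step <= max_step:
        return Terrain.SO
    return Terrain.IO


def traversal_cost(label, so_penalty=5.0):
    """Cost used by a planner: prefer PA, allow SO at a penalty, forbid IO."""
    if label is Terrain.PA:
        return 1.0
    if label is Terrain.SO:
        return so_penalty
    return float("inf")
```

Giving SO cells a finite but higher cost than PA cells reflects the stated preference: the robot stays in passable areas when it can and crosses a surmountable obstacle only when that is the cheaper remaining option.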
Pub Date: 2022-12-15 DOI: 10.1109/ICNSC55942.2022.10004052
Zhixue Liang, Wenyong Dong, Bo Zhang
Transformer-based architectures have recently made significant progress in visual object tracking. However, most transformer-based trackers adopt hybrid networks, which use convolutional neural networks (CNNs) to extract features and transformers to fuse and enhance them. Furthermore, most transformer-based trackers consider only the spatial dependencies between the target object and the search region and ignore temporal relations. Considering both the temporal and spatial properties inherent in video sequences, this paper presents a hierarchical transformer with a temporal memory and spatial attention network for visual tracking, named HTransT++. The proposed network employs a hierarchical transformer as the backbone to extract multi-level features. By adopting a transformer-based encoder and decoder to fuse historic template features and search-region image features, the network captures the spatial and temporal dependencies across video frames during tracking. Extensive experiments show that the proposed method (HTransT++) achieves outstanding performance on four visual tracking benchmarks, namely VOT2018, GOT-10K, TrackingNet, and LaSOT, while running at real-time speed.
Title: HTransT++: Hierarchical Transformer with Temporal Memory and Spatial Attention for Visual Tracking. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC).
Pub Date: 2022-12-15 DOI: 10.1109/ICNSC55942.2022.10004112
Zhidong Zhang, Xiubin Zhu, Ding Liu
Random forest (RF) is an ensemble classification approach that is easy to use and helps to avoid over-fitting. However, in complex data environments its prediction accuracy can deteriorate. Gradient boosting decision tree (GBDT) is another approach widely used in classification problems because of its high prediction accuracy and interpretability. To improve the performance of random forest on classification problems, this paper proposes a gradient boosting random forest (GBRF) algorithm. The GBRF algorithm uses the idea of gradient boosting to turn each decision tree at the bottom of the random forest into a gradient boosting decision tree, which improves the prediction accuracy of the bottom trees and thus the prediction performance of the random forest. To verify the effectiveness of the GBRF algorithm, data sets from UCI and KEEL are used for group testing. The results show that GBRF achieves higher classification accuracy than random forest, with an improvement of more than 5 percent, which indicates that the GBRF algorithm performs better than the original random forest.
Title: Model of Gradient Boosting Random Forest Prediction. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC).
The issue of resource shortage has received much attention in recent years. Recycling end-of-life (EOL) products helps to alleviate the issue as well as to protect the environment. In a practical disassembly process, the disassembly time of EOL products is affected by many factors. In this paper, we address the impact of tool deterioration on disassembly time. A disassembly line balancing model that aims to maximize disassembly profit is established, and a migratory bird optimizer is used to solve the problem. The feasibility and superiority of the algorithm are verified by comparing it with the salp swarm algorithm.
Title: Discrete Migratory Bird Optimizer for Disassembly Line Balancing Problem Considering Tool Deterioration. Authors: Jiaxin Wang, Xiwang Guo, Jiacun Wang, Shujin Qin, Liang Qi, Ying Tang. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC). Pub Date: 2022-12-15. DOI: 10.1109/ICNSC55942.2022.10004124
Mineral sorting machines can separate iron ore rapidly based on the magnetic signals collected by sensors. However, many interfering signals reduce the accuracy of the ore separation. In mineral sorting machines, frame resonance is one common factor: it generates interference, so that the collected magnetic signals contain much noise. Thus, to better collect the signal from the sensors, it is necessary to reduce the frame resonance. To deal with this problem, this paper establishes a finite element assembly model of the mineral sorting frame in SolidWorks. ANSYS Workbench is then used to obtain the natural frequencies and vibration modes of the frame in the free state. The theoretical analysis is verified by comparing the modal test results with the simulation results. Based on the characteristics of the external excitation frequency, the motor parameters and the frame structure are adjusted in turn. The results show that, when the motor speed is 200 r/min, the corresponding meshing excitation frequency of the chain drive is 66.670 Hz. Further, to reduce the meshing excitation frequencies of the chain drive, a connecting beam is installed on the upper level of the frame. In this way, frame resonance can be avoided effectively, so that the related noise is removed and the sensors can collect high-quality signals.
Title: Noise Reduction Study of Signal Detection in Hall Sensors by Modal Analysis. Authors: FaHua Zeng, Wenqing Xiong, Chunrong Pan, Lingzhi Li, Yankui Ren. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC). Pub Date: 2022-12-15. DOI: 10.1109/ICNSC55942.2022.10004177
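The link between motor speed and chain meshing excitation mentioned in the abstract above can be checked with the standard relation f = n · z / 60, where n is the sprocket speed in r/min and z is the number of sprocket teeth. The sketch below is only illustrative: the abstract does not state the tooth count, so `sprocket_teeth=20` is an assumption picked because it gives roughly the quoted figure, and the natural frequencies and the 10 % `margin` in the resonance check are hypothetical values.

```python
def meshing_frequency_hz(speed_rpm, sprocket_teeth):
    """Chain meshing (tooth engagement) frequency: revolutions per second times teeth."""
    return speed_rpm * sprocket_teeth / 60.0


def is_near_resonance(excitation_hz, natural_hz, margin=0.1):
    """True if the excitation lies within +/- margin (as a fraction) of a natural frequency.

    The 10 % default margin is a hypothetical rule of thumb, not a value from the paper.
    """
    return abs(excitation_hz - natural_hz) <= margin * natural_hz


# Assumed 20-tooth sprocket at the 200 r/min motor speed quoted in the abstract.
f_mesh = meshing_frequency_hz(200, 20)

# Hypothetical natural frequencies of a frame, for illustration only.
risky_modes = [f for f in (45.0, 70.0, 120.0) if is_near_resonance(f_mesh, f)]
```

A check of this kind shows why both remedies named in the abstract are plausible: changing the motor parameters moves the excitation frequency, while changing the frame structure moves the natural frequencies.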
Pub Date: 2022-12-15 DOI: 10.1109/ICNSC55942.2022.10004130
Yu Wang, He Huang, Yunni Xia
Text classification is an almost unavoidable step in natural language processing and has a wide range of application scenarios in industry. Although many existing methods achieve strong classification results, further improving text classification performance remains a great challenge and also offers a line of study into technological improvement. Based on the pre-trained bidirectional encoder representations from transformers (BERT) model and in-depth research on deep learning, we propose a multi-model, mixed-Chinese classification model (MCCM) based on BERT (MCCM-BERT) for Chinese text-classification tasks. The experimental results show that the proposed MCCM-BERT model outperforms BERT in text classification tasks, especially in Chinese long-text classification, with an accuracy improvement of up to 2.28%.
Title: Improving Multi-model Hybrid Chinese Long-text Classification through BERT Optimisation. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC).
Pub Date: 2022-12-15 DOI: 10.1109/ICNSC55942.2022.10004178
Zhuohan Zhang, Ziyan Zhao, Yang Zhang, Shixin Liu
Logistics planning is key to the coordination of multiple processes in steel production systems. This work investigates a new and practical bi-objective logistics planning problem arising from the steelmaking, hot rolling, and cold rolling processes. The first objective is to minimize the sum of fixed costs, transportation costs, out-of-stock penalties, and inventory costs. The second is to balance the workload of parallel machines. A mixed integer linear program is formulated for the problem, and a problem-specific genetic algorithm is designed to solve it. In the algorithm, the bi-objective optimization problem is first transformed into a single-objective one by weighting the two objective functions. Pareto solutions are then obtained by adjusting the weighting coefficients. Experimental results obtained by the presented algorithm are compared with those obtained by solving the mixed integer linear program with CPLEX. The comparison verifies the algorithm's strong performance and shows its readiness to be applied in practice.
Title: Multi-Process Logistics Planning for Cost Minimization and Workload Balance in Steel Production Systems. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC).
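The weighting step described in the abstract above, where two objectives are combined into one and the weights are varied to collect Pareto solutions, can be sketched as follows. This is a simplified illustration, not the authors' genetic algorithm: the inner search is replaced by plain enumeration over a small hypothetical candidate set, the objective values are made up, and the names `weighted_sum`, `sweep_weights`, and `is_dominated` are assumptions. Both objectives are assumed to be minimized and already scaled to comparable ranges.

```python
def weighted_sum(cost, imbalance, w):
    """Scalarize two minimization objectives with weight w in (0, 1)."""
    return w * cost + (1.0 - w) * imbalance


def sweep_weights(candidates, weights):
    """Return the distinct minimizers found as the weight is varied.

    candidates: list of (cost, imbalance) tuples. In the paper a genetic
    algorithm would search for the minimizer; here we simply enumerate.
    """
    chosen = []
    for w in weights:
        best = min(candidates, key=lambda c: weighted_sum(c[0], c[1], w))
        if best not in chosen:
            chosen.append(best)
    return chosen


def is_dominated(p, others):
    """True if some other point is no worse than p in both objectives."""
    return any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in others)


# Hypothetical plans as (total cost, workload imbalance) pairs.
plans = [(10, 8), (12, 4), (15, 3), (14, 9)]
front = sweep_weights(plans, weights=[0.1, 0.5, 0.9])
```

With strictly positive weights every minimizer of the weighted sum is Pareto-optimal, which is why sweeping the weight yields a set of trade-off solutions. One known limitation of this design is that it cannot reach Pareto points lying on non-convex parts of the front.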
Pub Date: 2022-12-15 DOI: 10.1109/ICNSC55942.2022.10004054
Xin Liang, Yifan Hou, Mi Zhao
Colored Petri nets (CPN) can be used to model and study systems with discrete, asynchronous, and concurrent behaviors. Microgrid systems have these features and can therefore be modeled and studied with CPN. In this paper, the energy scheduling between various distributed power sources and users in a microgrid system is studied based on an analysis of the working characteristics of wind turbines, photovoltaic arrays, and flexible loads. A CPN model of a microgrid system with distributed power generation is established; it schedules the energy produced by the distributed generators in the microgrid and interacts with the external power grid. Owing to the modular and hierarchical modeling method, the proposed model can be conveniently expanded in scale and function as required, which makes it general and adaptable. Finally, the theoretical significance and practical value of the established model are demonstrated by system simulation.
Title: Modeling and Analysis of Microgrid Energy Scheduling Based on Colored Petri Net. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC).
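The firing rule that underlies a colored Petri net model such as the one in the abstract above can be shown with a minimal token game. This sketch is not the authors' model: it omits arc expressions, guards, and hierarchy, and the class `ColoredPetriNet` and any place, transition, and colour names used with it are hypothetical.

```python
from collections import Counter


class ColoredPetriNet:
    """Minimal colored Petri net: tokens carry a colour (here, a string)."""

    def __init__(self, marking):
        # marking: place name -> iterable of token colours currently in that place
        self.marking = {place: Counter(tokens) for place, tokens in marking.items()}
        self.transitions = {}

    def add_transition(self, name, consume, produce):
        # consume / produce: lists of (place, colour) pairs, one pair per token
        self.transitions[name] = (list(consume), list(produce))

    def enabled(self, name):
        """A transition is enabled if every required coloured token is present."""
        consume, _ = self.transitions[name]
        needed = Counter(consume)
        return all(
            self.marking.get(place, Counter())[colour] >= count
            for (place, colour), count in needed.items()
        )

    def fire(self, name):
        """Remove the input tokens and add the output tokens."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        consume, produce = self.transitions[name]
        for place, colour in consume:
            self.marking[place][colour] -= 1
        for place, colour in produce:
            self.marking.setdefault(place, Counter())[colour] += 1
```

As a usage example, a net could start with tokens `"wind"` and `"pv"` in a place `generated` and a token `"load"` in a place `demand`. A transition that consumes a `"wind"` token together with a `"load"` token would then model serving a load from wind power, and a transition that moves a `"pv"` token to a place `grid` would model exporting surplus to the external grid.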
Pub Date: 2022-12-15 DOI: 10.1109/ICNSC55942.2022.10004088
Geng Wang, H. Fang, S. Zhang
To address the problem that existing data cannot be effectively integrated with process control decision-making in digital mine systems, this work proposes a digital twin system for underground locomotive dispatching management. First, the designed digital twin system has a four-tier architecture, in which the theoretical prior model is established with Petri nets and the actual dispatching is constructed by integrating simulation software with a three-dimensional (3D) virtual mapping of the underground mine. Then, the effectiveness of the dispatching algorithm and the consistency of the twin information are systematically evaluated by setting simulation parameters and accessing twin data. The designed digital twin system can not only visualize the dispatching system in two directions but also provide a decision-making basis for the safety of locomotive dispatching, which deepens and widens the application scope of digital twin systems in intelligent mine construction.
Title: An Approach of Digital Twin System Construction for Underground Locomotive Dispatching Management. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC).
Pub Date: 2022-12-15 DOI: 10.1109/ICNSC55942.2022.10004147
Yin Liu, Lingyun Wang, Xin Xu, Xiaopeng Luo
Weakly supervised object localization (WSOL) is a task that uses only image-level supervision to locate objects. Traditional CNN-based methods tend to locate only the most discriminative regions of objects and cannot balance classification accuracy and localization accuracy well. To solve this problem, we propose an enhanced and global semantic activation (EGSA) method based on the vision transformer model. We first use an attention reassign module to obtain a comprehensive attention map that contains the correlation between image patches and the global dependency of the class token. We then propose a mask selection module, which generates a mask map by comparison with a mask threshold, to obtain the token feature map of the non-discriminative object region. By coupling these two maps and combining the result with a semantic-aware map that contains the information of the class token, the final localization map with enhanced and global semantic activation is built. Experiments on two common benchmark datasets, CUB-200-2011 and ILSVRC, demonstrate the efficiency of our method.
Title: EGSA: Enhanced and Global Semantic Activation for Weakly Supervised Object Localization. Published in: 2022 IEEE International Conference on Networking, Sensing and Control (ICNSC).