Prediction-Decision Network For Video Object Tracking

Yasheng Sun, Tao He, Ying-hong Peng, Jin Qi, Jie Hu
2020 IEEE International Conference on Image Processing (ICIP), October 2020
DOI: 10.1109/ICIP40778.2020.9191145
In this paper, we introduce an approach for visual tracking in videos that predicts the bounding box location of a target object at every frame. The tracking problem is formulated as a sequential decision-making process in which both historical and current information are taken into account to decide the correct object location. We develop a deep reinforcement learning-based strategy, via which the target object position is predicted and decided in a unified framework. Specifically, an RNN-based prediction network is developed in which local features and global features are fused to predict object movement. The predicted movement, a set of predefined possible offsets, and detection results together form an action space. A decision network is trained in a reinforcement manner to learn to select the most reasonable tracking box from the action space, through which the target object is tracked at each frame. Experiments on an existing tracking benchmark demonstrate the effectiveness and robustness of our proposed strategy.
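The action-space construction and decision step described above can be illustrated with a minimal sketch. All function names, the fixed offset set, and the stand-in scoring function below are illustrative assumptions, not the authors' implementation; in the paper the score would come from the trained decision network rather than a hand-written heuristic.

```python
# Hypothetical sketch of the prediction-decision loop: candidate boxes
# come from the predicted movement, predefined offsets, and a detector
# output; a scoring function then selects the tracking box.

# Assumed offset set (the paper predefines "some possible offsets").
PREDEFINED_OFFSETS = [(0, 0), (4, 0), (-4, 0), (0, 4), (0, -4)]

def build_action_space(prev_box, predicted_move, detection_box):
    """Candidate boxes forming the action space, as (x, y, w, h) tuples."""
    x, y, w, h = prev_box
    dx, dy = predicted_move
    candidates = [(x + dx, y + dy, w, h)]  # box moved by the RNN-predicted movement
    candidates += [(x + ox, y + oy, w, h) for ox, oy in PREDEFINED_OFFSETS]
    candidates.append(detection_box)       # detection result
    return candidates

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def decide(candidates, score_fn):
    """Decision step: pick the candidate the scoring function rates highest."""
    return max(candidates, key=score_fn)

# Example frame step, using IoU against a reference box as a stand-in
# for the learned value produced by the decision network.
reference = (12, 8, 20, 20)
boxes = build_action_space(prev_box=(10, 10, 20, 20),
                           predicted_move=(2, -2),
                           detection_box=(11, 9, 20, 20))
best = decide(boxes, lambda b: iou(b, reference))  # → (12, 8, 20, 20)
```

At test time the loop would repeat per frame: the chosen box becomes `prev_box` for the next step, and its features are fed back into the RNN so that historical information informs the next movement prediction.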