CrossEI: Boosting Motion-Oriented Object Tracking With an Event Camera

Zhiwen Chen; Jinjian Wu; Weisheng Dong; Leida Li; Guangming Shi

IEEE Transactions on Image Processing, vol. 34, pp. 73-84. Published 2024-12-03. DOI: 10.1109/TIP.2024.3505672
https://ieeexplore.ieee.org/document/10776574/
With their differential sensitivity and high temporal resolution, event cameras can record detailed motion cues, a complementary advantage over frame-based cameras that enhances object tracking, especially in challenging dynamic scenes. However, how to match heterogeneous event-image data and exploit the rich complementary cues they carry remains an open issue. In this paper, we align the event and image modalities by proposing a motion-adaptive event sampling method, and we revisit the cross-complementarity of event-image data to design a bidirectionally enhanced fusion framework. Specifically, the sampling strategy adapts to different dynamic scenes and integrates aligned event-image pairs. In addition, we design an image-guided motion estimation unit that extracts explicit instance-level motion, refining uncertain event cues so that primary objects can be distinguished from the background. A semantic modulation module is then devised that uses the enhanced object motion to modulate the image features. Coupled through these two modules, the framework learns both the high motion sensitivity of events and the full texture of images to achieve more accurate and robust tracking. The proposed method is easily embedded in existing tracking pipelines and trained end-to-end. We evaluate it on four large benchmarks: FE108, VisEvent, FE240hz, and CoeSot. Extensive experiments demonstrate that our method achieves state-of-the-art performance, with large improvements attributable to our sampling strategy and fusion design.
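To make the bidirectional enhancement idea concrete, the sketch below illustrates one plausible reading of the abstract in PyTorch: image features guide the refinement of event-based motion cues, and the refined motion in turn gates the image features. This is a minimal, hypothetical sketch; the module names, channel sizes, and gating design are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the bidirectional event-image enhancement
# described in the abstract. All design choices here are assumptions.
import torch
import torch.nn as nn


class BidirectionalFusion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Image-guided motion estimation: image features provide semantic
        # context to refine uncertain event-based motion cues.
        self.motion_refine = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Semantic modulation: the refined motion produces a spatial gate
        # that re-weights image features toward moving objects.
        self.modulation = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, img_feat: torch.Tensor, evt_feat: torch.Tensor) -> torch.Tensor:
        # 1) Refine event motion cues under image guidance.
        motion = self.motion_refine(torch.cat([evt_feat, img_feat], dim=1))
        # 2) Modulate image features with the enhanced motion map.
        gate = self.modulation(motion)
        return img_feat * gate + motion


if __name__ == "__main__":
    img = torch.randn(1, 256, 32, 32)  # frame features
    evt = torch.randn(1, 256, 32, 32)  # event features (aligned pair)
    out = BidirectionalFusion()(img, evt)
    print(out.shape)  # torch.Size([1, 256, 32, 32])
```

In this reading, the fused output preserves image texture where motion evidence is weak and emphasizes regions where the refined event motion is strong, matching the abstract's claim of combining event motion sensitivity with full image texture.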