Rethinking Efficient and Effective Point-Based Networks for Event Camera Classification and Regression

Hongwei Ren;Yue Zhou;Jiadong Zhu;Xiaopeng Lin;Haotian Fu;Yulong Huang;Yuetong Fang;Fei Ma;Hao Yu;Bojun Cheng
{"title":"Rethinking Efficient and Effective Point-Based Networks for Event Camera Classification and Regression","authors":"Hongwei Ren;Yue Zhou;Jiadong Zhu;Xiaopeng Lin;Haotian Fu;Yulong Huang;Yuetong Fang;Fei Ma;Hao Yu;Bojun Cheng","doi":"10.1109/TPAMI.2025.3556561","DOIUrl":null,"url":null,"abstract":"Event cameras draw inspiration from biological systems, boasting low latency and high dynamic range while consuming minimal power. The most current approach to processing Event Cloud often involves converting it into frame-based representations, which neglects the sparsity of events, loses fine-grained temporal information, and increases the computational burden. In contrast, Point Cloud is a popular representation for processing 3-dimensional data and serves as an alternative method to exploit local and global spatial features. Nevertheless, previous point-based methods show an unsatisfactory performance compared to the frame-based method in dealing with spatio-temporal event streams. In order to bridge the gap, we propose EventMamba, an efficient and effective framework based on Point Cloud representation by rethinking the distinction between Event Cloud and Point Cloud, emphasizing vital temporal information. The Event Cloud is subsequently fed into a hierarchical structure with staged modules to process both implicit and explicit temporal features. Specifically, we redesign the global extractor to enhance explicit temporal extraction among a long sequence of events with temporal aggregation and State Space Model (SSM) based Mamba. Our model consumes minimal computational resources in the experiments and still exhibits SOTA point-based performance on six different scales of action recognition datasets. 
It even outperformed all frame-based methods on both Camera Pose Relocalization (CPR) and eye-tracking regression tasks.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 8","pages":"6228-6241"},"PeriodicalIF":18.6000,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10946204/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0

Abstract

Event cameras draw inspiration from biological systems, boasting low latency and high dynamic range while consuming minimal power. Most current approaches to processing the Event Cloud involve converting it into frame-based representations, which neglects the sparsity of events, loses fine-grained temporal information, and increases the computational burden. In contrast, the Point Cloud is a popular representation for processing 3-dimensional data and serves as an alternative method to exploit local and global spatial features. Nevertheless, previous point-based methods show unsatisfactory performance compared to frame-based methods in dealing with spatio-temporal event streams. To bridge the gap, we propose EventMamba, an efficient and effective framework based on the Point Cloud representation that rethinks the distinction between the Event Cloud and the Point Cloud, emphasizing vital temporal information. The Event Cloud is subsequently fed into a hierarchical structure with staged modules to process both implicit and explicit temporal features. Specifically, we redesign the global extractor to enhance explicit temporal extraction over long event sequences with temporal aggregation and State Space Model (SSM)-based Mamba. Our model consumes minimal computational resources in the experiments and still exhibits SOTA point-based performance on six action recognition datasets of different scales. It even outperforms all frame-based methods on both Camera Pose Relocalization (CPR) and eye-tracking regression tasks.
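The paper itself provides no code here, but the Point Cloud representation the abstract describes can be illustrated with a minimal sketch: each event is treated as a point whose coordinates are its pixel position and (normalized) timestamp, and the resulting cloud is subsampled to a fixed size before being fed to a point-based network. The function name, the `(x, y, t, p)` layout, and the normalization scheme below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def events_to_point_cloud(events, num_points=1024):
    """Turn an event stream into a fixed-size point cloud.

    `events` is an (N, 4) array of (x, y, t, p): pixel coordinates,
    timestamp, and polarity. Timestamps are rescaled into [0, 1] so
    the temporal axis is comparable to the spatial axes (an assumed,
    illustrative normalization), then the cloud is subsampled to a
    fixed number of points as point-based networks typically require.
    """
    events = np.asarray(events, dtype=np.float64)
    x, y, t, p = events.T
    # Normalize timestamps into [0, 1] so time forms the third axis.
    t = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    cloud = np.stack([x, y, t, p], axis=1)
    # Subsample to a fixed point count (sample with replacement only
    # if the stream has fewer events than requested).
    idx = np.random.choice(len(cloud), size=num_points,
                           replace=len(cloud) < num_points)
    return cloud[idx]

# Example: 5000 synthetic events on a 128x128 sensor.
rng = np.random.default_rng(0)
ev = np.stack([rng.integers(0, 128, 5000),
               rng.integers(0, 128, 5000),
               np.sort(rng.uniform(0, 1e5, 5000)),
               rng.integers(0, 2, 5000)], axis=1)
pc = events_to_point_cloud(ev)
print(pc.shape)  # (1024, 4)
```

Unlike frame-based conversion, this keeps per-event timestamps intact, which is the fine-grained temporal information the abstract argues frame representations discard.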