Parallelization and characterization of SIFT on multi-core systems

Hao Feng, E. Li, Yurong Chen, Yimin Zhang
{"title":"Parallelization and characterization of SIFT on multi-core systems","authors":"Hao Feng, E. Li, Yurong Chen, Yimin Zhang","doi":"10.1109/IISWC.2008.4636087","DOIUrl":null,"url":null,"abstract":"This paper parallelizes and characterizes an important computer vision application -Scale Invariant Feature Transform (SIFT) both on a Symmetric Multiprocessor (SMP) platform and a large scale Chip Multiprocessor (CMP) simulator. SIFT is an approach for extracting distinctive invariant features from images and has been widely applied. In many computer vision problems, a real-time or even super-real-time processing capability of SIFT is required. To meet the computation demand, we optimize and parallelize SIFT to accelerate its execution on multi-core systems. Our study shows that SIFT can achieve a 9.7x ~ llx speedup on a 16 -core SMP system. Furthermore, Single Instruction Multiple Data (SIMD) and cache-conscious optimization bring another 85% performance gain at most. But it is still three times slower than the real-time requirement for High-Definition Television (HDTV) image. Then we study the performance of SIFT on a 64 -core CMP simulator. The results show that for HDTV image, SIFT can achieve an excellent speedup of 52 x and run in real-time finally. Besides the parallelization and optimization work, we also conduct a detailed performance analysis for SIFT on those two platforms. We find that load imbalance significantly limits the scalability and SIFT suffers from intensive burst memory bandwidth requirement on the 16 -core SMP system. However, on the 64 -core CMP simulator the memory pressure is not high due to the shared last-level cache (LLC) which accommodates tremendous read-write sharing in SIFT. Thus it does not affect the scaling performance. In short, understanding the characterization of SIFT can help identify the program bottlenecks and give us further insights into designing better systems.","PeriodicalId":447179,"journal":{"name":"2008 IEEE International Symposium on Workload Characterization","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"51","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Symposium on Workload Characterization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC.2008.4636087","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 51

Abstract

This paper parallelizes and characterizes an important computer vision application -Scale Invariant Feature Transform (SIFT) both on a Symmetric Multiprocessor (SMP) platform and a large scale Chip Multiprocessor (CMP) simulator. SIFT is an approach for extracting distinctive invariant features from images and has been widely applied. In many computer vision problems, a real-time or even super-real-time processing capability of SIFT is required. To meet the computation demand, we optimize and parallelize SIFT to accelerate its execution on multi-core systems. Our study shows that SIFT can achieve a 9.7x ~ llx speedup on a 16 -core SMP system. Furthermore, Single Instruction Multiple Data (SIMD) and cache-conscious optimization bring another 85% performance gain at most. But it is still three times slower than the real-time requirement for High-Definition Television (HDTV) image. Then we study the performance of SIFT on a 64 -core CMP simulator. The results show that for HDTV image, SIFT can achieve an excellent speedup of 52 x and run in real-time finally. Besides the parallelization and optimization work, we also conduct a detailed performance analysis for SIFT on those two platforms. We find that load imbalance significantly limits the scalability and SIFT suffers from intensive burst memory bandwidth requirement on the 16 -core SMP system. However, on the 64 -core CMP simulator the memory pressure is not high due to the shared last-level cache (LLC) which accommodates tremendous read-write sharing in SIFT. Thus it does not affect the scaling performance. In short, understanding the characterization of SIFT can help identify the program bottlenecks and give us further insights into designing better systems.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
SIFT在多核系统上的并行化与表征
本文在对称多处理器(SMP)平台和大规模芯片多处理器(CMP)模拟器上对一种重要的计算机视觉应用尺度不变特征变换(SIFT)进行了并行化和表征。SIFT是一种从图像中提取显著不变特征的方法,得到了广泛的应用。在许多计算机视觉问题中,都要求SIFT具有实时甚至超实时的处理能力。为了满足计算需求,我们对SIFT进行了优化和并行化,以加快其在多核系统上的执行速度。我们的研究表明,SIFT在16核SMP系统上可以实现9.7x ~ llx的加速。此外,单指令多数据(SIMD)和缓存感知优化最多能带来85%的性能提升。但它仍然比高清电视(HDTV)图像的实时性要求慢三倍。然后在64核CMP模拟器上研究了SIFT的性能。结果表明,对于HDTV图像,SIFT可以实现52倍的加速,并最终实现实时运行。除了并行化和优化工作外,我们还对SIFT在这两个平台上的性能进行了详细的分析。我们发现负载不平衡严重限制了可扩展性,并且SIFT在16核SMP系统上受到突发内存带宽需求的影响。然而,在64核CMP模拟器上,由于共享的最后一级缓存(LLC)在SIFT中容纳了巨大的读写共享,因此内存压力并不高。因此,它不影响扩展性能。简而言之,了解SIFT的特征可以帮助我们确定程序瓶颈,并为我们设计更好的系统提供进一步的见解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Workload characterization of selected JEE-based Web 2.0 applications Accelerating multi-core processor design space evaluation using automatic multi-threaded workload synthesis Evaluating the impact of dynamic binary translation systems on hardware cache performance On the representativeness of embedded Java benchmarks A workload for evaluating deep packet inspection architectures
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1