Parallelization and characterization of SIFT on multi-core systems

2008 IEEE International Symposium on Workload Characterization. Pub Date: 2008-09-30. DOI: 10.1109/IISWC.2008.4636087

Hao Feng, E. Li, Yurong Chen, Yimin Zhang


Citations: 51

Abstract

This paper parallelizes and characterizes an important computer vision application, the Scale Invariant Feature Transform (SIFT), on both a Symmetric Multiprocessor (SMP) platform and a large-scale Chip Multiprocessor (CMP) simulator. SIFT is an approach for extracting distinctive invariant features from images and has been widely applied. Many computer vision problems require SIFT to run in real time or even faster than real time. To meet this computational demand, we optimize and parallelize SIFT to accelerate its execution on multi-core systems. Our study shows that SIFT achieves a 9.7x to 11x speedup on a 16-core SMP system. Furthermore, Single Instruction Multiple Data (SIMD) and cache-conscious optimizations bring up to a further 85% performance gain. Even so, it remains three times slower than the real-time requirement for High-Definition Television (HDTV) images. We then study the performance of SIFT on a 64-core CMP simulator. The results show that, for HDTV images, SIFT achieves an excellent speedup of 52x and finally runs in real time. Besides the parallelization and optimization work, we also conduct a detailed performance analysis of SIFT on the two platforms. We find that, on the 16-core SMP system, load imbalance significantly limits scalability and SIFT suffers from intensive, bursty memory bandwidth demand. On the 64-core CMP simulator, however, memory pressure is not high, because the shared last-level cache (LLC) accommodates the heavy read-write sharing in SIFT; memory pressure therefore does not affect scaling performance there. In short, understanding the characteristics of SIFT can help identify program bottlenecks and give further insight into designing better systems.
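To put the reported speedups in perspective, the short Python sketch below converts them into parallel efficiency (speedup divided by core count) and into the Karp-Flatt metric, an empirical estimate of the fraction of work that does not scale. Both are standard textbook metrics and are not taken from the paper; the function names and the `reported` list are illustrative assumptions, and only the speedup and core-count figures come from the abstract.

```python
# Illustrative sketch: standard scalability metrics applied to the
# speedups quoted in the abstract. Not the authors' analysis.

def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def karp_flatt(speedup: float, cores: int) -> float:
    """Experimentally determined non-scaling fraction (Karp-Flatt metric).

    It lumps together serial code, load imbalance, and other overheads,
    so it is only a rough indicator.
    """
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# (label, speedup, cores): figures as reported in the abstract.
reported = [
    ("16-core SMP, lower bound", 9.7, 16),
    ("16-core SMP, upper bound", 11.0, 16),
    ("64-core CMP simulator", 52.0, 64),
]

for label, speedup, cores in reported:
    eff = parallel_efficiency(speedup, cores)
    kf = karp_flatt(speedup, cores)
    print(f"{label}: efficiency = {eff:.2f}, Karp-Flatt = {kf:.4f}")
```

Under these definitions the 16-core SMP results correspond to roughly 61% to 69% efficiency, and the 64-core CMP result to roughly 81%, which is consistent with the abstract's remark that load imbalance and memory bandwidth limit scaling on the SMP system more than on the simulator.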

