{"title":"微矩阵的批处理Cholesky分解","authors":"F. Lemaitre, L. Lacassagne","doi":"10.1109/DASIP.2016.7853809","DOIUrl":null,"url":null,"abstract":"Many linear algebra libraries, such as the Intel MKL, Magma or Eigen, provide fast Cholesky factorization. These libraries are suited for big matrices but perform slowly on small ones. Even though State-of-the-Art studies begin to take an interest in small matrices, they usually feature a few hundreds rows. Fields like Computer Vision or High Energy Physics use tiny matrices. In this paper we show that it is possible to speedup the Cholesky factorization for tiny matrices by grouping them in batches and using highly specialized code. We provide High Level Transformations that accelerate the factorization for current Intel SIMD architectures (SSE, AVX2, KNC, AVX512). We achieve with these transformations combined with SIMD a speedup from 13 to 31 for the whole resolution compared to the naive code on a single core AVX2 machine and a speedup from 15 to 33 with multithreading compared to the multithreaded naive code.","PeriodicalId":6494,"journal":{"name":"2016 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"5 1","pages":"130-137"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Batched Cholesky factorization for tiny matrices\",\"authors\":\"F. Lemaitre, L. Lacassagne\",\"doi\":\"10.1109/DASIP.2016.7853809\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many linear algebra libraries, such as the Intel MKL, Magma or Eigen, provide fast Cholesky factorization. These libraries are suited for big matrices but perform slowly on small ones. Even though State-of-the-Art studies begin to take an interest in small matrices, they usually feature a few hundreds rows. Fields like Computer Vision or High Energy Physics use tiny matrices. In this paper we show that it is possible to speedup the Cholesky factorization for tiny matrices by grouping them in batches and using highly specialized code. We provide High Level Transformations that accelerate the factorization for current Intel SIMD architectures (SSE, AVX2, KNC, AVX512). 
We achieve with these transformations combined with SIMD a speedup from 13 to 31 for the whole resolution compared to the naive code on a single core AVX2 machine and a speedup from 15 to 33 with multithreading compared to the multithreaded naive code.\",\"PeriodicalId\":6494,\"journal\":{\"name\":\"2016 Conference on Design and Architectures for Signal and Image Processing (DASIP)\",\"volume\":\"5 1\",\"pages\":\"130-137\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 Conference on Design and Architectures for Signal and Image Processing (DASIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DASIP.2016.7853809\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 Conference on Design and Architectures for Signal and Image Processing (DASIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DASIP.2016.7853809","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Many linear algebra libraries, such as Intel MKL, MAGMA, or Eigen, provide fast Cholesky factorization. These libraries are well suited to large matrices but perform poorly on small ones. Although state-of-the-art studies have begun to take an interest in small matrices, the matrices they consider usually still have a few hundred rows, whereas fields such as computer vision and high-energy physics work with tiny matrices. In this paper we show that it is possible to speed up the Cholesky factorization of tiny matrices by grouping them in batches and using highly specialized code. We provide high-level transformations that accelerate the factorization on current Intel SIMD architectures (SSE, AVX2, KNC, AVX512). Combined with SIMD, these transformations yield speedups from 13x to 31x for the whole resolution (the complete system solve) compared to the naive code on a single-core AVX2 machine, and from 15x to 33x when the multithreaded version is compared to the multithreaded naive code.
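To make the batching idea concrete, below is a minimal scalar C sketch, not the paper's code: a fully unrolled Cholesky factorization of a single 3x3 symmetric positive-definite matrix, applied over a batch in one loop. The function names, the fixed 3x3 size, and the row-major layout are illustrative assumptions; the paper's kernels additionally vectorize the batch loop with SIMD intrinsics (roughly one matrix per lane) and use memory layouts suited to that.

    /* Sketch only: scalar, fully unrolled Cholesky for 3x3 SPD matrices,
     * applied to a batch. Names and layout are assumptions for illustration,
     * not the paper's API. */
    #include <math.h>
    #include <stddef.h>

    /* Factor one 3x3 SPD matrix A (row-major) into lower-triangular L
     * such that A = L * L^T. Indices: A[3*row + col]. */
    static void cholesky3x3(const float A[9], float L[9])
    {
        L[0] = sqrtf(A[0]);                             /* l00 = sqrt(a00)            */
        L[3] = A[3] / L[0];                             /* l10 = a10 / l00            */
        L[6] = A[6] / L[0];                             /* l20 = a20 / l00            */
        L[4] = sqrtf(A[4] - L[3] * L[3]);               /* l11                        */
        L[7] = (A[7] - L[6] * L[3]) / L[4];             /* l21                        */
        L[8] = sqrtf(A[8] - L[6] * L[6] - L[7] * L[7]); /* l22                        */
        L[1] = L[2] = L[5] = 0.0f;                      /* zero the strict upper part */
    }

    /* Batched driver: factor n tiny matrices stored contiguously.
     * In the paper's SIMD versions, a loop like this is the vectorized
     * dimension rather than the rows/columns of a single matrix. */
    void cholesky3x3_batch(size_t n, const float *A, float *L)
    {
        for (size_t i = 0; i < n; ++i)
            cholesky3x3(A + 9 * i, L + 9 * i);
    }

Because the factorization is fully unrolled for a fixed tiny size, the per-matrix work has no inner loops or size-dependent branches, and the batch loop is straightforward for a compiler, or hand-written intrinsics, to vectorize across matrices.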