V. Sundaresan, S. Nichani, N. Ranganathan, R. Sankar
{"title":"A VLSI hardware accelerator for dynamic time warping","authors":"V. Sundaresan, S. Nichani, N. Ranganathan, R. Sankar","doi":"10.1109/ICPR.1992.202121","DOIUrl":null,"url":null,"abstract":"Describes an area and time efficient systolic array architecture for computations in Dynamic Time Warping (DTW). The special purpose architecture is used to perform the band matrix multiplication in order to compute the local distance metric based on Itakura's log likelihood distance. The time complexity of the algorithm is O(nk) where n and k are the number of elements in the row of the first and second input matrices. The number of processors is equal to the bandwidth w of the output band matrix. The speedup of the parallel algorithm compared to the sequential algorithm is wz where z is the multiplier stages within a PE. The parallel algorithm can be implemented as a single VLSI chip.<<ETX>>","PeriodicalId":34917,"journal":{"name":"模式识别与人工智能","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"1992-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"模式识别与人工智能","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.1109/ICPR.1992.202121","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 6
Abstract
Describes an area and time efficient systolic array architecture for computations in Dynamic Time Warping (DTW). The special purpose architecture is used to perform the band matrix multiplication in order to compute the local distance metric based on Itakura's log likelihood distance. The time complexity of the algorithm is O(nk) where n and k are the number of elements in the row of the first and second input matrices. The number of processors is equal to the bandwidth w of the output band matrix. The speedup of the parallel algorithm compared to the sequential algorithm is wz where z is the multiplier stages within a PE. The parallel algorithm can be implemented as a single VLSI chip.<>