A. Britto, P. L. D. Souza, R. Sabourin, S. Souza, D. Borges
Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. DOI: 10.1109/ICDAR.2003.1227780. Published 2003-08-03.
A low-cost parallel K-means VQ algorithm using cluster computing
In this paper we propose a parallel approach for the K-means Vector Quantization (VQ) algorithm used in a two-stage Hidden Markov Model (HMM)-based system for recognizing handwritten numeral strings. With this parallel algorithm, based on the master/slave paradigm, we overcome two drawbacks of the sequential version: (a) the time taken to create the codebook, and (b) the amount of memory needed to work with large training databases. Distributing the training samples over the slaves' local disks reduces the overhead associated with the communication process. In addition, we have developed models that predict computation and communication time. These models are useful for predicting the optimal number of slaves, taking into account the number of training samples and the codebook size.
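The abstract does not spell out the exact protocol. The following is a minimal sketch of a master/slave K-means VQ step that assumes the usual data-parallel decomposition. Each slave keeps its shard of training vectors locally and returns only per-codeword partial sums, counts and distortion. The master therefore receives O(K·d) numbers per slave per iteration rather than raw samples. The function names (`slave_step`, `parallel_kmeans`) are hypothetical, and threads stand in for cluster nodes so the example runs on one machine.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, x):
    """Index of the codeword closest to x (ties go to the lower index)."""
    return min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))


def slave_step(shard, codebook):
    """One slave's work on its locally stored shard: assign every sample to
    its nearest codeword and return per-codeword partial sums, counts and the
    shard's quantisation distortion. Only O(k*d) values leave the slave."""
    k, dim = len(codebook), len(codebook[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    distortion = 0.0
    for x in shard:
        j = nearest(codebook, x)
        counts[j] += 1
        for i in range(dim):
            sums[j][i] += x[i]
        distortion += sq_dist(x, codebook[j])
    return sums, counts, distortion


def parallel_kmeans(shards, k, iters=20, init=None, seed=0):
    """Master loop: broadcast the codebook, gather partial results from every
    slave, and recompute each codeword as the mean of its assigned samples.
    Returns the codebook and the distortion of the last assignment pass."""
    if init is None:
        # Hypothetical seeding: the master asks each slave for its first k samples.
        candidates = [x for s in shards for x in s[:k]]
        init = random.Random(seed).sample(candidates, k)
    codebook = [list(c) for c in init]
    dim = len(codebook[0])
    total = 0.0
    # Threads stand in for cluster nodes in this single-machine sketch.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for _ in range(iters):
            results = list(pool.map(slave_step, shards, [codebook] * len(shards)))
            sums = [[0.0] * dim for _ in range(k)]
            counts = [0] * k
            total = 0.0
            for part_sums, part_counts, part_dist in results:
                total += part_dist
                for j in range(k):
                    counts[j] += part_counts[j]
                    for i in range(dim):
                        sums[j][i] += part_sums[j][i]
            # Empty cells keep their previous codeword.
            new_codebook = [
                [v / counts[j] for v in sums[j]] if counts[j] else codebook[j]
                for j in range(k)
            ]
            if new_codebook == codebook:
                break
            codebook = new_codebook
    return codebook, total
```

Each codeword update depends only on the summed partial results. A run over several shards therefore ends at the same codebook as a single-shard run started from the same initial codewords, up to floating-point summation order. Per iteration, the only traffic is the codebook broadcast and the partial-result gather.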
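The abstract does not give the form of the computation and communication models. The sketch below is a purely hypothetical model showing how such a model can yield an optimal slave count. It assumes that one iteration over N samples, a codebook of K codewords and dimension d costs T(p) = t_op·N·K·d / p + 2·p·(alpha + beta·K·d). The first term is distance computation split evenly over p slaves. The second term is a codebook broadcast plus a partial-sum gather that the master handles one slave at a time, with per-message latency alpha and per-value cost beta. Setting dT/dp = 0 gives p* = sqrt(t_op·N·K·d / (2·(alpha + beta·K·d))). The constants t_op, alpha and beta are assumptions that would have to be measured on the actual cluster.

```python
import math


def predicted_time(p, n, k, d, t_op, alpha, beta):
    """Hypothetical time for one K-means iteration with p slaves.

    Computation: n*k distance evaluations of d-dimensional vectors,
    split evenly across p slaves (cost t_op per coordinate).
    Communication: for each slave the master sends the codebook and
    receives partial sums (two messages of about k*d values, latency
    alpha, per-value cost beta), handled one slave at a time.
    """
    compute = t_op * n * k * d / p
    communicate = p * 2 * (alpha + beta * k * d)
    return compute + communicate


def optimal_slaves(n, k, d, t_op, alpha, beta, p_max):
    """Integer p in [1, p_max] that minimises the predicted time."""
    return min(range(1, p_max + 1),
               key=lambda p: predicted_time(p, n, k, d, t_op, alpha, beta))


def optimal_slaves_continuous(n, k, d, t_op, alpha, beta):
    """Closed-form minimiser obtained from dT/dp = 0."""
    return math.sqrt(t_op * n * k * d / (2 * (alpha + beta * k * d)))
```

With made-up constants N = 800, K = 4, d = 2, t_op = 1, alpha = 2 and beta = 6, both functions give 8 slaves. Quadrupling N to 3200 moves the optimum to 16, which is the square-root growth this particular model implies. In this model the optimum depends on both the number of training samples and the codebook size, as the abstract says the authors' models do.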