Bangla text compression based on modified Lempel-Ziv-Welch algorithm
Linkon Barua, P. K. Dhar, Lamia Alam, I. Echizen
2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), published 2017-02-01
DOI: https://doi.org/10.1109/ECACE.2017.7913022
Text compression algorithms typically perform compression at the character level. Bangla text has some unique features, such as the absence of distinct upper- and lower-case letters, consonant clusters (CC), and consonants with a dependent vowel sign (CV). The conventional Lempel-Ziv-Welch (LZW) algorithm is therefore not well suited to compressing Bangla text. In this paper, we propose a modified LZW (MLZW) algorithm that can compress Bangla text effectively and efficiently. In the proposed method, a dictionary with Unicode ranges from 1 to 90 is used for Bangla characters. The compression process starts by checking the input character. If the input character is part of a CC or CV, the whole CC or CV is treated as a single character and searched for in the dictionary. If the character to be encoded is already in the dictionary, it is encoded with its dictionary index. Otherwise, the character is added to the dictionary and encoded with its corresponding dictionary index. Simulation results indicate that the proposed MLZW algorithm compresses Bangla text effectively and efficiently. Compared with the LZW algorithm, the proposed MLZW provides a compression rate that is higher by approximately 3% for the dictionary index and 33% for the output sequence.
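The idea of treating a CC or CV as a single dictionary symbol can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `segment_units` and `lzw_encode`, the Unicode ranges used to recognise consonants and vowel signs, and the choice to seed the dictionary with the distinct symbols of the input (rather than the 1 to 90 dictionary mentioned above) are all assumptions made for illustration. The sketch groups a consonant, any hasanta-joined consonants, and an optional dependent vowel sign into one unit, then runs textbook LZW over the resulting units.

```python
HASANTA = "\u09CD"  # Bangla virama; joins consonants into a cluster (assumed CC marker)


def is_consonant(ch):
    # Assumed consonant range of the Bengali Unicode block, plus three extra letters.
    return "\u0995" <= ch <= "\u09B9" or ch in "\u09DC\u09DD\u09DF"


def is_vowel_sign(ch):
    # Assumed range of dependent vowel signs (kar), plus the AU length mark.
    return "\u09BE" <= ch <= "\u09CC" or ch == "\u09D7"


def segment_units(text):
    """Split text into units: CC, CV, CC+V, or a single character."""
    units = []
    i = 0
    while i < len(text):
        unit = text[i]
        i += 1
        if is_consonant(unit):
            # Absorb every following (hasanta, consonant) pair into the cluster.
            while (i + 1 < len(text) and text[i] == HASANTA
                   and is_consonant(text[i + 1])):
                unit += text[i:i + 2]
                i += 2
            # Attach one dependent vowel sign, if present.
            if i < len(text) and is_vowel_sign(text[i]):
                unit += text[i]
                i += 1
        units.append(unit)
    return units


def lzw_encode(symbols):
    """Textbook LZW over a sequence of symbols; returns (codes, table)."""
    table = {}
    for s in symbols:  # seed with distinct symbols in first-seen order
        if (s,) not in table:
            table[(s,)] = len(table)
    codes = []
    current = ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in table:
            current = candidate  # keep extending the current match
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new phrase
            current = (s,)
    if current:
        codes.append(table[current])
    return codes, table


if __name__ == "__main__":
    sample = "\u09AC\u09BE\u0982\u09B2\u09BE \u09AC\u09BE\u0982\u09B2\u09BE"
    char_codes, _ = lzw_encode(list(sample))
    unit_codes, _ = lzw_encode(segment_units(sample))
    print(len(char_codes), len(unit_codes))
```

Running the script prints the number of output codes for character-level and unit-level encoding of a short sample, which gives a rough feel for why grouping CC and CV units can shorten the output sequence. A real comparison would also need a decoder and a fixed initial dictionary shared by encoder and decoder, neither of which is shown here.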