A Dictionary based Compression Scheme for Natural Language Text with Reduced Bit Encoding
Md. Ashiq Mahmood, K. Hasan
2019 IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things (RAAICON), 2019-11-01
DOI: https://doi.org/10.1109/RAAICON48939.2019.62
Citation count: 2
Abstract
Data compression, also called compaction, is the process of reducing the amount of data needed to store or transmit a given piece of information, typically by means of encoding techniques. Character encoding, which represents characters with a particular code, is closely related to data compression. Encoding is the process of putting a sequence of characters into a specific format for efficient transmission or storage. Data compression has a wide range of applications, including data communication, data storage, and database development. Two well-known compression techniques, Huffman and LZW, are most commonly used for text compression. In this paper, we propose an effective and simple compression technique for large natural-language text, the 5 Bit Encoding Scheme (5BE), which converts 8-bit characters to 5-bit codes. It can likely outperform Huffman and LZW in compression ratio. The scheme provides an encoding algorithm that converts any 8-bit character in English and Bangla to 5 bits using a lookup table. The lookup table is built using the Zipf distribution, a discrete distribution of the commonly used characters in different languages. After converting the characters to 5 bits, we apply a k-Series scheme to build a database dictionary. At the cost of storing the dictionary, we compress natural text by 87%. This dictionary is used by both the compression and decompression algorithms and is deployed on the client side, so it is constructed only once. Hence the benefits of the compression technique are available without interruption. The reverse algorithm for recovering the original data is also presented. We compare our algorithm with the well-known Huffman and LZW techniques. Our experimental results show promising efficiency.
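The lookup-table step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' 5BE implementation: the abstract does not give the actual table, the handling of characters outside it, or the k-Series dictionary step, so the names `build_table`, `encode`, and `decode`, the frequency-ranked table built from a sample, and the bit-packing layout are all assumptions made for illustration.

```python
from collections import Counter

CODE_BITS = 5
TABLE_SIZE = 1 << CODE_BITS  # 32 distinct symbols fit in 5 bits


def build_table(sample: str) -> list[str]:
    """Hypothetical lookup table: the (up to) 32 most frequent characters."""
    counts = Counter(sample)
    # Descending frequency, ties broken by character for a deterministic order.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return ranked[:TABLE_SIZE]


def encode(text: str, table: list[str]) -> tuple[bytes, int]:
    """Pack each character as a 5-bit table index; return bytes and char count."""
    index = {ch: i for i, ch in enumerate(table)}
    acc = 0
    for ch in text:
        if ch not in index:
            # Assumption: this sketch rejects characters outside the table.
            raise ValueError(f"character {ch!r} is not in the lookup table")
        acc = (acc << CODE_BITS) | index[ch]
    nbits = len(text) * CODE_BITS
    pad = (-nbits) % 8  # zero bits appended to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big"), len(text)


def decode(packed: bytes, count: int, table: list[str]) -> str:
    """Reverse of encode: read `count` 5-bit indices and map them back."""
    acc = int.from_bytes(packed, "big")
    total_bits = len(packed) * 8
    mask = TABLE_SIZE - 1
    chars = []
    for i in range(count):
        shift = total_bits - (i + 1) * CODE_BITS
        chars.append(table[(acc >> shift) & mask])
    return "".join(chars)


if __name__ == "__main__":
    text = "the rain in spain stays mainly in the plain"
    table = build_table(text)
    packed, count = encode(text, table)
    assert decode(packed, count, table) == text
    print(len(text), "bytes at 8 bits ->", len(packed), "bytes at 5 bits")
```

Fixed-width 5-bit packing on its own can save at most 37.5% relative to 8-bit characters, so the 87% figure reported in the abstract presumably depends on the k-Series dictionary step, which this sketch does not attempt to reproduce.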