Efficient Lossless Compression Strategy Using Data Distribution Further
Abstract
In digital information systems, the efficient storage and transmission of data have become increasingly crucial. Lossless compression is widely used on text, images, and video to reduce data size, save storage space, and improve transmission efficiency across networks without any loss of quality. Lossless compression algorithms, among them Huffman coding, Run-Length Encoding, and Lempel-Ziv compression, make different trade-offs among compression ratio, speed, and computational efficiency. However, these algorithms disregard the self-description information of the data, namely symbol type, quantity, and distribution characteristics. They do not objectively analyze the data characteristics and therefore cannot guide the choice of a compression strategy suited to the features of the data. We therefore introduce a lossless compression strategy that makes further use of the data distribution to overcome these drawbacks and improve the overall efficiency and accuracy of data compression. The proposed scheme follows a three-step methodology. First, the original data is transformed into a token sequence table. Second, the token table is fed into an analysis framework that analyzes the data characteristics using an objective comparison parameter as the criterion, and generates a token alphabet sorted by occurrence frequency, along with the corresponding token locations. Finally, a series of classic compression algorithms, ranked appropriately, is selected to compress the token sequence at different stages. Compression ratio and speed are evaluated experimentally on the Canterbury large corpus. The results show that the proposed scheme is feasible and significantly improves performance.
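The three steps can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each byte is one token, that the "analysis" is a plain frequency count, and that a single classic back end (DEFLATE via the standard `zlib` module, which combines Lempel-Ziv matching with Huffman coding) stands in for the ranked series of algorithms. The function names `tokenize`, `sorted_alphabet`, `compress`, and `decompress` are hypothetical.

```python
import zlib
from collections import Counter


def tokenize(data: bytes) -> list[int]:
    # Step 1 (assumption): treat every byte as one token.
    return list(data)


def sorted_alphabet(tokens: list[int]) -> list[int]:
    # Step 2 (assumption): rank distinct tokens by descending frequency,
    # breaking ties by token value so the order is deterministic.
    counts = Counter(tokens)
    return sorted(counts, key=lambda t: (-counts[t], t))


def compress(data: bytes) -> bytes:
    tokens = tokenize(data)
    alphabet = sorted_alphabet(tokens)
    rank = {tok: i for i, tok in enumerate(alphabet)}
    # Replace each token by its frequency rank: frequent tokens get small values.
    ranked = bytes(rank[t] for t in tokens)
    # Header: 2-byte alphabet size, then the alphabet in rank order.
    header = len(alphabet).to_bytes(2, "big") + bytes(alphabet)
    # Step 3 (assumption): one classic compressor instead of a ranked series.
    return header + zlib.compress(ranked, 9)


def decompress(blob: bytes) -> bytes:
    n = int.from_bytes(blob[:2], "big")
    alphabet = blob[2:2 + n]
    ranked = zlib.decompress(blob[2 + n:])
    # Map each rank back to the original token.
    return bytes(alphabet[r] for r in ranked)


if __name__ == "__main__":
    sample = b"abracadabra" * 100
    packed = compress(sample)
    assert decompress(packed) == sample
    print(len(sample), "->", len(packed), "bytes")
```

The header stores the sorted alphabet so that the decoder can invert the rank mapping, which is what keeps the scheme lossless. Whether the remapping improves the compression ratio depends on the data and on the back end, so this sketch makes no performance claim.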