Mahidol University Logo
Faculty of ICT, Mahidol University
 

Admissions

Printable Version

 

TEXT COMPRESSION WITH MODIFIED LENGTH INDEX PRESERVING TRANSFORMATION USING SEMI-DYNAMIC AND DYNAMIC DICTIONARY

 

TITLE TEXT COMPRESSION WITH MODIFIED LENGTH INDEX PRESERVING TRANSFORMATION USING SEMI-DYNAMIC AND DYNAMIC DICTIONARY.
AUTHOR KITTI DISSONRAT
DEGREE MASTER OF SCIENCE PROGRAMME IN COMPUTER SCIENCE
FACULTY FACULTY OF SCIENCE
ADVISOR DAMRAS WONSAWANG
CO-ADVISOR SUKANYA PHONGSUPHAP
 
ABSTRACT
This thesis proposes an alternative approach called semi-dynamic and dynamic LIPT or SD/DLIPT to improve ‘lossless data’ compression problems. The researches of ‘lossless data’ compression was developed using highly sophisticated algorithms such as Huffman encoding, Arithmetic encoding, the Lempel-Ziv family, and a new paradigm Burrows-Wheeler Transform (BWT) based algorithms. All of these algorithms have their own strength and weakness. The Length Index Preserving Transformation (LIPT) is one of the most desirable transformation-based algorithms recently introduced. In LIPT, the length of the input words and their relative starting point (called the offset) in the static dictionary are represented by alphabets. The encoding scheme utilizes recurrence of same length of words in the English to create context in the output transformed text. However, the static dictionary of LIPT has three main problems: Firstly, dictionary contains too many words and wastes too much space; Secondly, it is difficult to maintain synchronization between the two dictionaries at host and remote location; and, Lastly, some specific words which are not in the static dictionary, cannot be transformed. The alternative approach proposed here uses a semi-dynamic dictionary which stores exploitable words and spares a block for unavailable words dynamically. The dynamic dictionary is an intelligent dictionary which is constructed on the fly while transforming forward and backward. The concept of Lempel-Ziv Welch or LZW is applied as our transformation approach. The prototype of SD/DLIPT was developed, analyzed and tested with various types of text files from Calgary, Canterbury and Gutenberg Corpus. The results of transformation were later compressed with compression algorithms currently in use such as BZIP2, PKZIP, ARJ, HA, GZIP, Unix Compress, Huffman Coding and Arithmetic Coding. The results showed that compression efficiency in term of compression rates were significantly improved for Huffman Coding and Arithmetic Coding compression algorithms. Efficiency was slightly improved for GZIP, Unix Compression algorithms and the same compression rates were obtained for BZIP2, PKZIP, ARJ and HA. The results from this study suggest that SD/DLIPT may be useful for some applications such as Facsimile G3, JPEG, MPEG, etc. which apply Huffman and Arithmetic coding compression algorithms. This thesis presents in details the SD/DLIPT model including analysis, implementation and experiments. Finally, future research and development recommendations include research into improvements in processing speed as well as more efficient searching, such a Hashing or Binary Searching systems.
KEYWORD TEXT COMPRESSION / LOSSLESS COMPRESSION / TEXT TRANSFORMATION / LIPT / SEMI DYNAMIC DICTIONARY / DYNAMIC DICTIONARY / WORD BASED LZW

 

Go to Top

 

ICT Building, Mahidol University, 999 Phuttamonthon 4 Road, Salaya, Nakhonpathom 73170 Tel. +66 02 441-0909 Fax. +66 02 849-6099
Mahidol University Computing Center, The Faculty of ICT, Mahidol University , Rama 6 Road, Rajathevi, Bangkok 10400 Tel. +66 02 354-4333 Fax. +66 02 354-7333