Faculty of ICT, Mahidol University
 

 

TEXT COMPRESSION BY CHARACTER SEQUENCE ANALYSIS METHOD

 

TITLE TEXT COMPRESSION BY CHARACTER SEQUENCE ANALYSIS METHOD.
AUTHOR SONGKRIT KRITSADEERATTANAMANEE
DEGREE MASTER OF SCIENCE PROGRAMME IN COMPUTER SCIENCE
FACULTY FACULTY OF SCIENCE
ADVISOR DAMRAS WONGSAWANG
CO-ADVISOR THANWADEE SUNETNANTA
 
ABSTRACT
Text compression attracts continuing research interest because of its wide range of applications. However, searching directly in compressed text remains problematic. Many compression schemes that support searching have been proposed and implemented, and this study likewise aims at effective search capability. The work is based on "A Text Compression Scheme That Allows Fast Searching Directly in the Compressed File" by Udi Manber of the University of Arizona. This study develops and improves that scheme, targeting a compression saving of more than 30% while retaining the ability to search directly in compressed files. The new scheme, called CSAM (Character Sequence Analysis Method), was proposed by the author of this study. Like Manber's scheme, CSAM applies the pattern substitution method. However, CSAM examines the text in more detail and carefully analyzes the character sequences that appear in the actual text to find the best substitutions. CSAM achieves a compression saving of more than 30% on average while still supporting pattern searching without decompression. At this stage of development, the compression speed may not be attractive for general applications; however, the scheme may suit applications that read data often but write it seldom. This thesis describes the CSAM scheme and the development, testing, and implementation of a prototype. It also presents and discusses the performance and experimental results, and suggests further improvements to the scheme.
KEYWORD TEXT COMPRESSION
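
The abstract does not give the details of CSAM, so the following is only a minimal Python sketch of the general idea it builds on: pattern substitution that still permits searching without decompression. It is not the CSAM algorithm. It assumes 7-bit ASCII input, replaces frequent character pairs with the unused byte values 128 to 255, and keeps the set of first characters disjoint from the set of second characters so that pairs never overlap. The names choose_pairs, compress, decompress, and contains are hypothetical and chosen for illustration.

```python
from collections import Counter


def choose_pairs(text: bytes, max_pairs: int = 128) -> dict[bytes, int]:
    """Pick frequent byte pairs; no byte may be both a first and a second element."""
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    firsts, seconds, table = set(), set(), {}
    for pair, _ in counts.most_common():
        a, b = pair[0], pair[1]
        if a == b or a in seconds or b in firsts:
            continue  # would let two pairs overlap in the text
        table[pair] = 128 + len(table)  # assumption: codes 128..255 are free
        firsts.add(a)
        seconds.add(b)
        if len(table) == max_pairs:
            break
    return table


def compress(text: bytes, table: dict[bytes, int]) -> bytes:
    """Replace every table pair with its one-byte code, scanning left to right."""
    if any(b >= 128 for b in text):
        raise ValueError("this sketch assumes 7-bit ASCII input")
    out, i = bytearray(), 0
    while i < len(text):
        code = table.get(text[i:i + 2])
        if code is not None:
            out.append(code)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return bytes(out)


def decompress(data: bytes, table: dict[bytes, int]) -> bytes:
    inverse = {code: pair for pair, code in table.items()}
    return b"".join(inverse.get(b, bytes([b])) for b in data)


def contains(data: bytes, pattern: bytes, table: dict[bytes, int]) -> bool:
    """Search the compressed data for pattern without decompressing it."""
    if not pattern:
        return True
    firsts = {p[0] for p in table}
    seconds = {p[1] for p in table}
    inverse = {code: pair for pair, code in table.items()}

    def expand(symbol: int) -> bytes:
        return inverse.get(symbol, bytes([symbol]))

    # Boundary characters may have been merged with text outside the pattern,
    # so they are set aside and checked against the neighbouring symbols.
    head = pattern[:1] if pattern[0] in seconds else b""
    rest = pattern[len(head):]
    tail = rest[-1:] if rest and rest[-1] in firsts else b""
    core = compress(rest[:len(rest) - len(tail)], table)

    start = 0
    while True:
        j = data.find(core, start)
        if j < 0:
            return False
        end = j + len(core)
        ok_head = not head or (j > 0 and expand(data[j - 1]).endswith(head))
        ok_tail = not tail or (end < len(data) and expand(data[end]).startswith(tail))
        if ok_head and ok_tail:
            return True
        start = j + 1


sample = b"the theory of the thing: then there they gather, the other weather"
pairs = choose_pairs(sample)
packed = compress(sample, pairs)
print(len(sample), len(packed), contains(packed, b"theory", pairs))
```

Because the pairs cannot overlap, a pattern compresses to the same byte sequence wherever it occurs in the text, apart from its first and last characters. That is why the search compresses the pattern once and then only inspects the symbols adjacent to each candidate match. The saving obtained by this sketch depends entirely on the input and says nothing about the results reported in the thesis.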

 


 
