Mahidol University Logo
Faculty of ICT, Mahidol University
 

Admissions

Printable Version

 

CLOSEST STRING MATCHING

 

TITLE CLOSEST STRING MATCHING.
AUTHOR TANACHAI PATHOMRAT
DEGREE MASTER OF SCIENCE PROGRAMME IN COMPUTER SCIENCE
FACULTY FACULTY OF SCIENCE
ADVISOR DAMRAS WONGSWANG
CO-ADVISOR SUKANYA PHONGSUPHAP
 
ABSTRACT
The problem of finding a center string that is close to every given string is called the closest string problem. This problem is defined as: Given a set of strings S={s1,s2,กญ,sn} each of length m, the problem is to find the smallest d and a string s of length m, which is within Hamming distance d to each si กส S. The problem arises in computational molecular biology in identifying genetic drug and generating genetic probes. This thesis proposed an algorithm to solve the closest string problem. The algorithm starts with generating all possible permuted strings of length m. Then, it eliminates strings which are not candidate solutions using many elimination rules. Next, Hamming distances are calculated and compared. Then, the best solutions are finally selected. Three versions of the proposed algorithm have been developed: Finding Closest String Matching Using Array (FCMA), Finding Closest String Matching Using Text File (FCMT) and Finding Closest String Matching Using Database File (FCMD). All of them process in a similar manner, except the storage devices and programming techniques used. FMCA uses only main memory, so it can run faster but itกฏs suitable for small problems. FCMT and FCMD use a secondary storage device to store an enormous number of permuted strings. They can solve larger problems with slower processing time. From the experiments, we found that the proposed algorithm can run in very much less than exponential time. All three versions of the proposed algorithm can find all exact solutions in all cases of m, n ก� 20 with reasonable processing time. We also investigated and compared with similar applications currently in use called, BLAST and FASTA. However, a direct comparison can not be performed.
KEYWORD CLOSEST STRING / HAMMING DISTANCE / PERMUTATION / STRING MATCHING ALGORITHM / APPROXIMATE STRING MATCHING

 

Go to Top

 

ICT Building, Mahidol University, 999 Phuttamonthon 4 Road, Salaya, Nakhonpathom 73170 Tel. +66 02 441-0909 Fax. +66 02 849-6099
Mahidol University Computing Center, The Faculty of ICT, Mahidol University , Rama 6 Road, Rajathevi, Bangkok 10400 Tel. +66 02 354-4333 Fax. +66 02 354-7333