ABSTRACT |
The problem of finding a center string that is close to every given string is called the closest string problem. This problem is defined as: Given a set of strings S={s1,s2,กญ,sn} each of length m, the problem is to find the smallest d and a string s of length m, which is within Hamming distance d to each si กส S. The problem arises in computational molecular biology in identifying genetic drug and generating genetic probes. This thesis proposed an algorithm to solve the closest string problem. The algorithm starts with generating all possible permuted strings of length m. Then, it eliminates strings which are not candidate solutions using many elimination rules. Next, Hamming distances are calculated and compared. Then, the best solutions are finally selected. Three versions of the proposed algorithm have been developed: Finding Closest String Matching Using Array (FCMA), Finding Closest String Matching Using Text File (FCMT) and Finding Closest String Matching Using Database File (FCMD). All of them process in a similar manner, except the storage devices and programming techniques used. FMCA uses only main memory, so it can run faster but itกฏs suitable for small problems. FCMT and FCMD use a secondary storage device to store an enormous number of permuted strings. They can solve larger problems with slower processing time. From the experiments, we found that the proposed algorithm can run in very much less than exponential time. All three versions of the proposed algorithm can find all exact solutions in all cases of m, n ก� 20 with reasonable processing time. We also investigated and compared with similar applications currently in use called, BLAST and FASTA. However, a direct comparison can not be performed. |