Faculty of ICT, Mahidol University
 


 

PARSING THAI TEXT WITH SYNTACTIC ANALYSIS USING DIGRAPH REPRESENTATION

 

TITLE PARSING THAI TEXT WITH SYNTACTIC ANALYSIS USING DIGRAPH REPRESENTATION.
AUTHOR ANUGOON CHIMPIPOP
DEGREE MASTER OF SCIENCE PROGRAMME IN COMPUTER SCIENCE
FACULTY FACULTY OF SCIENCE
ADVISOR DAMRAS WONGSAWANG
CO-ADVISOR
 
ABSTRACT
Parsing Thai text is the task of determining word boundaries in Thai sentences. Many parsing methods have been proposed and implemented; however, most of them do not take Thai grammar into consideration. Among them, longest matching is one of the most effective. This thesis proposes an algorithm, called Parsing Thai Text with Syntactic Analysis (PTTSA), that applies the longest matching method enhanced by analysis of Thai grammar. To analyze Thai grammar, we propose a syntactic structure model derived from the structure of the Thai language and represent it as a digraph. The probability of each segmentation pattern is calculated as the pattern traverses the digraph, and the pattern with the highest probability is selected as the best segmentation. We simulated test environments and compared the parsing results of our approach with those of parsing without syntactic analysis. Our approach gives higher parsing accuracy than parsing without syntactic analysis for most of the documents tested. We further found that the accuracy of the parsing results depends on the syntactic structure model, the probabilities of the edges in the digraph, and the style of expression in the documents. Adjusting the edge probabilities may make parsing accuracy better or worse than before, depending on the type of document. Thus, accuracy increases only when the edge probabilities are appropriate for the style of expression in the documents. This thesis describes PTTSA in detail, including the formulation, analysis, and implementation of the model. A prototype of PTTSA was developed and tested, and the parsing results are presented and discussed. Finally, improvements to the model are proposed.
KEYWORD PARSING / DICTIONARY / SYNTACTIC STRUCTURE / DIGRAPH / THAI
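
The idea of scoring segmentation patterns by traversing a digraph can be illustrated with a small sketch. This is not the PTTSA implementation: the four-word lexicon, the syntactic categories, the edge probabilities, and all function names below are hypothetical, chosen only to show the mechanism. For brevity the sketch enumerates every dictionary-based segmentation instead of starting from longest matching as the thesis does.

```python
# Hypothetical toy lexicon: word -> syntactic category (not from the thesis).
LEXICON = {"ตา": "NOUN", "ตาก": "VERB", "กลม": "ADJ", "ลม": "NOUN"}

# Hypothetical digraph of syntactic structure: (from, to) -> edge probability.
EDGES = {
    ("START", "NOUN"): 0.6,
    ("START", "VERB"): 0.4,
    ("NOUN", "ADJ"): 0.5,
    ("NOUN", "END"): 0.5,
    ("VERB", "NOUN"): 0.7,
    ("VERB", "END"): 0.3,
    ("ADJ", "END"): 1.0,
}


def segmentations(text):
    """Yield every way to split text into words found in LEXICON."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in LEXICON:
            for rest in segmentations(text[end:]):
                yield [word] + rest


def path_probability(words):
    """Multiply the edge probabilities along START -> categories -> END."""
    probability = 1.0
    node = "START"
    for category in [LEXICON[w] for w in words] + ["END"]:
        # A transition that is missing from the digraph scores zero.
        probability *= EDGES.get((node, category), 0.0)
        node = category
    return probability


def best_segmentation(text):
    """Return the candidate with the highest path probability, or None."""
    candidates = list(segmentations(text))
    if not candidates:
        return None
    return max(candidates, key=path_probability)


if __name__ == "__main__":
    for candidate in segmentations("ตากลม"):
        print(candidate, path_probability(candidate))
    print("best:", best_segmentation("ตากลม"))
```

With these assumed probabilities, the string "ตากลม" has two candidates, "ตา กลม" and "ตาก ลม", and the first scores higher, although plain longest matching would take "ตาก" first. Changing the edge probabilities can reverse the choice, which mirrors the abstract's observation that accuracy depends on how well the probabilities suit the style of the documents.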

 


 
