Design and Implementation of the Compound Noun Segmentation Algorithm Based on Statistical Information

  • Kim, Chang-Geun (Department of Computer Science, Jinju National University)
  • Tack, Han-Ho (Department of Electronic Engineering, Jinju National University)
  • Published : 2004.12.01

Abstract

This paper proposes a reverse segmentation algorithm that uses affix information and preference patterns of Korean compound nouns. The structure of Korean compound nouns derives mostly from Chinese characters and exhibits preference patterns, which this paper uses as segmentation rules. To evaluate its accuracy, the proposed algorithm was tested on 36,061 compound nouns. It segmented 99.3% of them correctly and compared favorably with other algorithms in comparative experiments. In particular, most four-syllable and five-syllable compound nouns were segmented correctly.
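
As a rough illustration of what segmenting in the reverse direction means, the sketch below peels known nouns off the end of a compound, longest match first. It is not the algorithm of this paper: the affix information and preference patterns described in the abstract are not modeled, and the function name `reverse_segment`, the `max_len` parameter, and the toy `LEXICON` are all assumptions made for the example. It also assumes the input is in precomposed (NFC) Hangul, so that one character equals one syllable.

```python
def reverse_segment(word, lexicon, max_len=4):
    """Split `word` right to left, taking the longest known suffix each time.

    Hypothetical helper for illustration only; greedy, with no backtracking.
    """
    parts = []
    end = len(word)
    while end > 0:
        match = None
        # Try the longest candidate first, shrinking toward one syllable.
        for size in range(min(max_len, end), 0, -1):
            candidate = word[end - size:end]
            if candidate in lexicon:
                match = candidate
                break
        if match is None:
            # No known noun ends here: keep the unanalyzed remainder whole.
            parts.append(word[:end])
            break
        parts.append(match)
        end -= len(match)
    parts.reverse()
    return parts


# Toy lexicon (assumed, not taken from the paper's data).
LEXICON = {"정보", "검색", "시스템", "통계"}

print(reverse_segment("정보검색시스템", LEXICON))
```

With this toy lexicon the compound 정보검색시스템 splits into 정보, 검색, and 시스템. A greedy matcher like this can choose a wrong boundary when suffixes overlap, which is the kind of ambiguity that statistical or pattern-based rules are meant to resolve.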
