Design and Implementation of the Compound Noun Segmentation Algorithm Based on Statistical Information

Kim, Chang-Geun; Tack, Han-Ho

doi:10.5391/IJFIS.2004.4.3.306

International Journal of Fuzzy Logic and Intelligent Systems

Volume 4, Issue 3, pp. 306-310, 2004
ISSN 1598-2645 (print), 2093-744X (online)

Korean Institute of Intelligent Systems (한국지능시스템학회)

Kim, Chang-Geun (Department of Computer Science, Jinju National University)
Tack, Han-Ho (Department of Electronic Engineering, Jinju National University)

Published: 2004.12.01

Abstract

This paper proposes a reverse segmentation algorithm for Korean compound nouns that uses affix information and preference patterns. Most Korean compound nouns are built from Sino-Korean morphemes (morphemes derived from Chinese characters), and their structure exhibits preference patterns, which this paper uses as segmentation rules. To evaluate the accuracy of the proposed algorithm, an experiment was performed on 36,061 compound nouns. The algorithm segmented 99.3% of them correctly and compared favorably with other algorithms in comparative experiments. In particular, most four-syllable and five-syllable compound nouns were segmented correctly.
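The abstract does not spell out the procedure, so the following is only a minimal Python sketch of what reverse segmentation with affix and preference information could look like, assuming "reverse" means scanning from the last syllable toward the first. The lexicon, the affix sets, and the length preference `PREFERRED_LENGTHS` are hypothetical toy assumptions, not the authors' dictionary or rules, and `segment` and `accuracy` are illustrative names.

```python
from functools import lru_cache

# Hypothetical toy resources; the paper's actual dictionary, affix lists,
# and preference patterns are not given in the abstract.
LEXICON = {"자연", "언어", "처리", "정보", "검색", "시스템", "한국", "공개"}
SUFFIXES = {"어", "적", "화", "성"}  # single-syllable suffixes (assumed)
PREFIXES = {"비", "신", "재"}        # single-syllable prefixes (assumed)

# Assumed preference: try 2-syllable units first (common for Sino-Korean
# nouns), then longer units, and single-syllable units last.
PREFERRED_LENGTHS = (2, 3, 4, 1)


def segment(word: str) -> list[str]:
    """Split a compound noun by scanning from its last syllable backwards.

    Returns the first full segmentation found under PREFERRED_LENGTHS
    (backtracking when a choice leaves an unsegmentable remainder), or the
    whole word unsplit if none exists. Assumes NFC-normalized Hangul, so
    each syllable is a single character.
    """
    n = len(word)

    @lru_cache(maxsize=None)
    def seg(end: int):
        # Segment word[:end]; returns a tuple of units, or None on failure.
        if end == 0:
            return ()
        for length in PREFERRED_LENGTHS:
            start = end - length
            if start < 0:
                continue
            unit = word[start:end]
            ok = unit in LEXICON
            if length == 1 and not ok:
                # A suffix needs a stem to its left; a prefix needs
                # material to its right.
                ok = (unit in SUFFIXES and start > 0) or (
                    unit in PREFIXES and start == 0 and end < n
                )
            if ok:
                rest = seg(start)
                if rest is not None:
                    return rest + (unit,)
        return None

    result = seg(n)
    return list(result) if result is not None else [word]


def accuracy(gold: dict[str, list[str]]) -> float:
    """Fraction of words whose predicted segmentation matches the gold one exactly."""
    correct = sum(1 for word, units in gold.items() if segment(word) == units)
    return correct / len(gold)


if __name__ == "__main__":
    print(segment("정보검색시스템"))  # ['정보', '검색', '시스템']
    print(segment("한국어"))          # ['한국', '어']
```

For "한국어" the 2-syllable candidate "국어" is not in the toy lexicon, so the search falls back to the suffix "어" and then finds "한국". The paper's title refers to statistical information; a closer approximation would rank candidate segmentations by corpus statistics instead of the fixed length ordering used here. For scale, if the reported 99.3% is a per-word exact-match rate like the one `accuracy` computes, it corresponds to roughly 35,800 correct segmentations out of 36,061, or about 250 errors.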
