Search | Korea Science

Choi, Maeng-Sik;Kim, Hark-Soo
- The KIPS Transactions:PartB
- /
- v.18B no.1
- /
- pp.45-50
- /
- 2011
Unknown morpheme errors in Korean morphological analysis are divided into two types: The one is the errors that a morphological analyzer entirely fails to return any morpheme sequences, and the other is the errors that a morphological analyzer returns incorrect combinations of known morphemes. Most previous unknown morpheme estimation techniques have been focused on only the former errors. This paper proposes a unknown morpheme estimation method which can handle both of the unknown morpheme errors. The proposed method detects Eojeols (Korean spacing units) that may include unknown morpheme errors using SVM (Support Vector Machine). Then, using CRFs (Conditional Random Fields), it segments morphemes from the detected Eojeols and annotates the segmented morphemes with new POS tags. In the experiments, the proposed method outperformed the conventional method based on the longest matching of functional words. Based on the experimental results, we knew that the second type errors should be dealt with in order to increase the performance of Korean morphological analysis.
https://doi.org/10.3745/KIPSTB.2011.18B.1.045 인용 PDF KSCI

Kim, Bogyum;Lee, Jae Sung
- Journal of KIISE
- /
- v.43 no.4
- /
- pp.430-436
- /
- 2016
Processing of unknown words such as proper nouns and newly coined words is important for a morphological analyzer to process documents in various domains. In this study, a segmentation and tagging method for unknown Korean words is proposed for the 3-step probabilistic morphological analysis. For guessing unknown word, it uses rich suffixes that are attached to open class words, such as general nouns and proper nouns. We propose a method to learn the suffix patterns from a morpheme tagged corpus, and calculate their probabilities for unknown open word segmentation and tagging in the probabilistic morphological analysis model. Results of the experiment showed that the performance of unknown word processing is greatly improved in the documents containing many unregistered words.
https://doi.org/10.5626/JOK.2016.43.4.430 인용 PDF KSCI