Search | Korea Science

Nam, Gyu-Hyeon;Lee, Hyun-Young;Kang, Seung-Shik
- Journal of Multimedia Information System / v.9 no.2 / pp.93-102 / 2022
Dependency parsing is the problem of deciding the syntactic relations between words in a sentence. Recently, deep learning models have been used for dependency parsing, based on word representations in a continuous vector space. However, this approach mislabels proper nouns that rarely appear in the training corpus, because out-of-vocabulary (OOV) words are difficult to represent in a continuous vector space. To solve the OOV problem in dependency parsing, we explored proper noun embedding methods according to the embedding unit. Before representing words in a continuous vector space, we replace proper nouns with a special token and train on their contextual features using a multi-layer bidirectional LSTM. We propose two models for proper noun embedding, one syllable-based and one morpheme-based, and the ensemble of the two improves dependency parsing performance more than either model alone. Experimental results show that our ensemble model improves UAS by 1.69%p and LAS by 2.17%p over the Malt parser, which uses the same arc-eager approach.
https://doi.org/10.33851/JMIS.2022.9.2.93
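The proper noun replacement step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the placeholder string `<PN>`, the function name `mask_proper_nouns`, and the use of the Sejong-style tag `NNP` to mark proper nouns are all assumptions made for illustration, and the input is assumed to be already morpheme-analyzed and POS-tagged.

```python
from typing import List, Tuple

# Assumption: proper nouns carry the Sejong-style POS tag "NNP".
PROPER_NOUN_TAG = "NNP"
# Hypothetical placeholder; the abstract only says "a special token".
PROPER_NOUN_TOKEN = "<PN>"


def mask_proper_nouns(tagged: List[Tuple[str, str]]) -> List[str]:
    """Replace every proper-noun morpheme with one shared placeholder token.

    `tagged` is a list of (surface form, POS tag) pairs. All other
    morphemes are passed through unchanged.
    """
    return [
        PROPER_NOUN_TOKEN if tag == PROPER_NOUN_TAG else form
        for form, tag in tagged
    ]


# Illustrative input: "서울" tagged as a proper noun, followed by other morphemes.
sentence = [("서울", "NNP"), ("에", "JKB"), ("가", "VV"), ("다", "EF")]
masked = mask_proper_nouns(sentence)
print(masked)
```

Because every proper noun maps to the same token, a rare or unseen name shares one embedding with all other proper nouns, so the embedding layer never has to look it up as an OOV word.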

Lee, Hyun Young;Kang, Seung Shik
- Smart Media Journal / v.8 no.2 / pp.74-79 / 2019
Traditional compound noun decomposition algorithms often fail to split a compound noun into its unit nouns when it contains an unregistered unit noun. Such cases are very difficult for traditional approaches to handle because it is impossible to register every unit noun, such as proper nouns, coined words, and foreign words, in a dictionary in advance. To solve this problem, this paper defines compound noun decomposition as a tag sequence labeling problem and proposes a decomposition method that uses syllable-unit embedding and deep learning. To recognize unregistered unit nouns without constructing a unit noun dictionary, each syllable of a compound noun is represented in a continuous vector space, and the compound noun is decomposed into unit nouns using an LSTM and a linear-chain CRF.
https://doi.org/10.30693/SMJ.2019.8.2.74
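Framing decomposition as tag sequence labeling, as the abstract above describes, means each syllable receives a tag and the unit nouns are read back from the tag sequence. The sketch below shows only that encoding and decoding, not the LSTM-CRF model that would predict the tags. The two-tag B/I scheme and the function names `encode` and `decode` are assumptions; the abstract does not state which tag set the paper uses. The code also assumes precomposed (NFC) Hangul, so that one Python character is one syllable.

```python
from typing import List, Tuple


def encode(units: List[str]) -> Tuple[List[str], List[str]]:
    """Turn a gold decomposition into per-syllable tags.

    Assumed scheme: "B" marks the first syllable of a unit noun,
    "I" marks every following syllable of the same unit noun.
    """
    syllables: List[str] = []
    tags: List[str] = []
    for unit in units:
        for i, syllable in enumerate(unit):
            syllables.append(syllable)
            tags.append("B" if i == 0 else "I")
    return syllables, tags


def decode(syllables: List[str], tags: List[str]) -> List[str]:
    """Rebuild unit nouns from syllables and their (predicted) tags."""
    units: List[str] = []
    for syllable, tag in zip(syllables, tags):
        if tag == "B" or not units:
            units.append(syllable)   # start a new unit noun
        else:
            units[-1] += syllable    # extend the current unit noun
    return units


# Illustrative compound: 정보검색시스템 = 정보 + 검색 + 시스템
syllables, tags = encode(["정보", "검색", "시스템"])
restored = decode(syllables, tags)
print(tags, restored)
```

Since the tags are assigned per syllable, no unit noun dictionary is consulted at decoding time; a sequence model only has to predict where each unit noun begins.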