An Experimental Approach of Keyword Extraction in Korean-Chinese Text

A Study of Index-Term Extraction Techniques for Korean-Chinese Mixed-Script Text: Focusing on 『Sisachongbo』

  • 정유경 (Institute for the Study of Korean Modernity, Yonsei University) ;
  • 반재유 (Institute for the Study of Korean Modernity, Yonsei University)
  • Received : 2019.08.18
  • Accepted : 2019.12.13
  • Published : 2019.12.30

Abstract

This study develops a keyword extraction technique for Korean-Chinese mixed-script text from the modern period. The method combines a Korean morphological analyzer with the particles of classical Chinese. We applied it to editorials in the journal "Sisachongbo," using proper-noun dictionaries and a stop-word list to extract index terms. Measured against manual indexing, our system achieved higher recall and precision than a Chinese morphological analyzer. This study is the first to develop an automatic indexing system for traditional Korean-Chinese mixed-script text.
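
The general idea of treating classical Chinese particles as delimiters, with a proper-noun dictionary and a stop-word list applied on top, can be sketched roughly as follows. This is a minimal illustration and not the authors' system: the particle list, the dictionary entries, the stop words, and the function names are all hypothetical, and the Korean morphological analysis step is approximated by simply discarding Hangul and keeping runs of Chinese characters.

```python
import re

# Hypothetical resources for illustration only; the study's actual
# dictionaries and stop-word list are not reproduced here.
PARTICLES = set("之於而也矣乎焉哉")   # classical Chinese particles (assumed list)
STOP_WORDS = {"今日", "所謂"}          # assumed stop words
PROPER_NOUNS = {"大韓", "時事叢報"}    # assumed proper-noun dictionary

# Runs of CJK Unified Ideographs; Hangul particles and endings fall between runs.
HANJA_RUN = re.compile(r"[\u4e00-\u9fff]+")


def split_on_particles(run):
    """Split one run of Chinese characters at particles, keeping proper nouns whole."""
    by_length = sorted(PROPER_NOUNS, key=len, reverse=True)
    terms, buf, i = [], "", 0
    while i < len(run):
        # Longest-match lookup in the proper-noun dictionary.
        match = next((p for p in by_length if run.startswith(p, i)), None)
        if match:
            if buf:
                terms.append(buf)
                buf = ""
            terms.append(match)
            i += len(match)
        elif run[i] in PARTICLES:
            # A particle ends the current candidate term and is dropped.
            if buf:
                terms.append(buf)
                buf = ""
            i += 1
        else:
            buf += run[i]
            i += 1
    if buf:
        terms.append(buf)
    return terms


def extract_index_terms(text):
    """Return candidate index terms: at least two characters, not a stop word."""
    terms = []
    for run in HANJA_RUN.findall(text):
        for term in split_on_particles(run):
            if len(term) >= 2 and term not in STOP_WORDS:
                terms.append(term)
    return terms
```

Calling `extract_index_terms` on a mixed-script sentence returns the multi-character terms that survive the particle split and the stop-word filter. A real system would also need to handle words that legitimately contain a particle character, which this sketch does not attempt.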

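This study proposes an index-term extraction technique for Korean-Chinese mixed-script text that reflects both Korean morphological analysis and classical Chinese particles. Working with editorials from 『Sisachongbo』, which were written in mixed script, we supplemented dictionaries of the proper nouns and Sino-Korean words used in that period and extracted index terms while taking a list of Sino-Korean stop words into account. Evaluated against manual indexing results, the proposed mixed-script indexing system showed relatively higher recall and precision than a Chinese morphological analyzer. The study is distinctive in proposing the first index-term extraction technique for mixed-script text from the modern period, when the grammar of the written language had not yet been established.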
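
Recall and precision against a manually indexed reference can be computed as in the short sketch below. It assumes exact string matching between extracted and manual terms and uses a hypothetical function name; the matching criterion actually used in the study is not stated in this abstract.

```python
def precision_recall(extracted, manual):
    """Compare extracted index terms with manual index terms by exact match.

    precision = matching terms / all extracted terms
    recall    = matching terms / all manual terms
    """
    extracted_set, manual_set = set(extracted), set(manual)
    hits = len(extracted_set & manual_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(manual_set) if manual_set else 0.0
    return precision, recall
```

For example, if a system extracts three terms and two of them appear among four manually assigned terms, precision is 2/3 and recall is 2/4.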

Acknowledgement

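Supported by: National Research Foundation of Korea

This work was supported in 2017 by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2017S1A6A3A01079581).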
