Effective Thematic Words Extraction from a Book using Compound Noun Phrase Synthesis Method

  • Received : 2017.03.07
  • Reviewed : 2017.03.23
  • Published : 2017.03.31

Abstract

Most online bookstores provide users with bibliographic information about a book rather than concrete information such as its thematic words and atmosphere. Thematic words in particular help a user understand a book and search more broadly. In this paper, we propose an efficient method for extracting thematic words from book text by applying a compound noun and noun phrase synthesis method. Compound nouns represent the characteristics of a book in more detail than single nouns. The proposed method extracts thematic words from book text by recognizing two types of noun phrases: single nouns and compound nouns formed by combining single nouns. The recognized single nouns, compound nouns, and noun phrases are weighted with TF-IDF and extracted as main words. In addition, this paper suggests counting the frequencies of nouns in subject, object, and other roles separately in the TF-IDF calculation, rather than simply summing the frequencies of all nouns. Experiments are carried out in the field of economics and management, and the extracted thematic words are verified through a survey and a book search. Nine of the 10 experimental results in this study indicate that the thematic words extracted by the proposed method are more effective for understanding a book's content. It is also confirmed that the thematic words extracted by the proposed method yield better book search results.
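The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no role weights, so the values in ROLE_WEIGHTS are hypothetical, as are the function names and the input format, which assumes the text has already been tagged into runs of adjacent nouns labeled with a grammatical role. Compounds are joined with a space here for readability, whereas Korean compound nouns are usually written without one.

```python
import math
from collections import Counter, defaultdict

# Hypothetical role weights; the abstract does not state actual values.
ROLE_WEIGHTS = {"subject": 2.0, "object": 1.5, "other": 1.0}

def candidate_terms(noun_run):
    """Yield every single noun and every contiguous compound in a run of adjacent nouns."""
    n = len(noun_run)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield " ".join(noun_run[i:j])

def role_weighted_tf(doc):
    """doc is a list of (noun_run, role) pairs; returns term -> role-weighted frequency."""
    tf = defaultdict(float)
    for noun_run, role in doc:
        for term in candidate_terms(noun_run):
            # Each occurrence counts by the weight of its role, not as a flat 1.
            tf[term] += ROLE_WEIGHTS[role]
    return dict(tf)

def tf_idf(docs):
    """Return one dict per document mapping term -> role-weighted TF times IDF."""
    tfs = [role_weighted_tf(doc) for doc in docs]
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tf in tfs for term in tf)
    n = len(docs)
    return [{t: f * math.log(n / df[t]) for t, f in tf.items()} for tf in tfs]

# Toy input: two "books", each a list of (adjacent nouns, role) pairs.
docs = [
    [(["interest", "rate"], "subject"), (["bank"], "object"), (["rate"], "other")],
    [(["bank"], "subject"), (["market"], "other")],
]
scores = tf_idf(docs)
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)
```

In this toy input, "bank" appears in both documents, so its IDF is zero and it drops out as a thematic word, while "interest rate" is scored as a single compound term alongside its component nouns.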
