Text-mining Based Graph Model for Keyword Extraction from Patent Documents

  • Lee, Soon Geun (Dept. of Industrial & Management Engineering, Gangneung-Wonju National University)
  • Leem, Young Moon (Dept. of Industrial & Management Engineering, Gangneung-Wonju National University)
  • Um, Wan Sup (Dept. of Industrial & Management Engineering, Gangneung-Wonju National University)
  • Received : 2015.10.20
  • Accepted : 2015.12.04
  • Published : 2015.12.31

Abstract

Growing interest in patents has led many individuals and companies to file patent applications in a wide range of fields. These applications are stored as electronic documents, and searching and categorizing them are major problems in data mining. Keyword extraction, which retrieves the keywords that best represent a document, is especially important. Most keyword extraction techniques are based on the vector space model, which weights each term by its frequency in the documents and selects keywords in descending order of weight. Because it relies on term frequency alone, this model cannot reflect the relations between keywords. This paper proposes an improved method that overcomes this limitation and extracts more representative keywords. The proposed model first prepares a candidate set using the vector space model, then builds a graph that represents the relation between each pair of candidate keywords in the set, and finally selects keywords based on this relationship graph.
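The three-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how edges are defined or how the graph is scored, so the choices here are assumptions made only for illustration: TF-IDF stands in for the vector space weighting, an edge links two candidates that appear in the same sentence, and nodes are ranked with a PageRank-style iteration. All function names and parameters (`tfidf_candidates`, `cooccurrence_graph`, `rank_nodes`, `extract_keywords`, `top_n`, `k`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_candidates(flat_docs, doc_index, top_n=6):
    """Stage 1 (assumed TF-IDF weighting): keep the top_n terms of one document."""
    n_docs = len(flat_docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in flat_docs for term in set(doc))
    tf = Counter(flat_docs[doc_index])
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    # Sort by descending weight; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def cooccurrence_graph(sentences, candidates):
    """Stage 2 (assumed edge rule): weight = sentences containing both terms."""
    wanted = set(candidates)
    graph = {c: Counter() for c in candidates}
    for sentence in sentences:
        present = sorted(wanted & set(sentence))
        for a, b in combinations(present, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def rank_nodes(graph, damping=0.85, iterations=50):
    """Stage 3 (assumed scoring rule): weighted PageRank-style iteration."""
    nodes = list(graph)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        updated = {}
        for v in nodes:
            # Each neighbour u passes on its score in proportion to edge weight.
            incoming = sum(
                score[u] * graph[u][v] / sum(graph[u].values())
                for u in graph[v]
            )
            updated[v] = (1 - damping) / n + damping * incoming
        score = updated
    return score


def extract_keywords(docs, doc_index, top_n=6, k=3):
    """Run all three stages; each document is a list of tokenized sentences."""
    flat_docs = [[tok for sent in doc for tok in sent] for doc in docs]
    candidates = tfidf_candidates(flat_docs, doc_index, top_n)
    graph = cooccurrence_graph(docs[doc_index], candidates)
    score = rank_nodes(graph)
    return sorted(candidates, key=lambda t: (-score[t], t))[:k]
```

Calling `extract_keywords(docs, 0)` on a corpus of tokenized patent documents returns up to `k` candidates of the first document, ordered by graph score rather than by raw term weight. Under these assumptions, a candidate that co-occurs with many other candidates can outrank a more frequent but isolated term, which is the effect the abstract attributes to the relationship graph.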
