http://dx.doi.org/10.12812/ksms.2015.17.4.335

Text-mining Based Graph Model for Keyword Extraction from Patent Documents  

Lee, Soon Geun (Dept. of Industrial & Management Engineering, Gangneung-Wonju National University)
Leem, Young Moon (Dept. of Industrial & Management Engineering, Gangneung-Wonju National University)
Um, Wan Sup (Dept. of Industrial & Management Engineering, Gangneung-Wonju National University)
Publication Information
Journal of the Korea Safety Management & Science / v.17, no.4, 2015, pp. 335-342
Abstract
Growing interest in patents has led many individuals and companies to file patent applications across a wide range of fields. Filed patents are stored as electronic documents, and searching and categorizing these documents is a major topic in data mining. Keyword extraction, which retrieves the keywords most representative of a document, is especially important. Most keyword extraction techniques are based on the vector space model. This model weights terms solely by their frequency in documents and selects keywords in order of weight, so it cannot reflect the relations between keywords. This paper proposes an improved method that extracts more representative keywords by overcoming this limitation. The proposed model first prepares a candidate set using the vector space model, then builds a graph representing the relation between each pair of candidate keywords in the set, and finally selects keywords based on this relationship graph.
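The three steps named in the abstract (candidate set from the vector space model, relationship graph over candidate pairs, graph-based selection) can be illustrated with a minimal sketch. The abstract does not say how the relation between two keywords is measured or how keywords are chosen from the graph, so the sketch makes two assumptions that are not taken from the paper: an edge weight equal to the number of documents in which both candidates appear, and selection by weighted degree. The function names `tfidf_candidates`, `cooccurrence_graph`, and `select_keywords` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_candidates(docs, top_n):
    """Step 1: rank terms by summed TF-IDF weight and keep the top_n."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    # Break ties alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def cooccurrence_graph(docs, candidates):
    """Step 2 (assumed relation): edge weight = number of documents
    in which both candidate keywords appear."""
    candidate_set = set(candidates)
    edges = Counter()
    for doc in docs:
        present = sorted(candidate_set & set(doc))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def select_keywords(candidates, edges, k):
    """Step 3 (assumed rule): keep the k candidates with the highest
    weighted degree in the relationship graph."""
    strength = {c: 0 for c in candidates}
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return sorted(candidates, key=lambda c: (-strength[c], c))[:k]


if __name__ == "__main__":
    # Hypothetical tokenized documents, not data from the paper.
    docs = [
        ["electrode", "lithium", "separator"],
        ["electrode", "lithium", "anode"],
        ["electrode", "cathode", "lithium"],
        ["sensor", "sensor", "signal"],
        ["valve", "pump"],
    ]
    candidates = tfidf_candidates(docs, top_n=9)
    edges = cooccurrence_graph(docs, candidates)
    print("TF-IDF order:", candidates)
    print("Graph-selected:", select_keywords(candidates, edges, k=2))
```

On this toy corpus the frequency-based ranking alone places `sensor` first, because it is repeated within a single document, whereas the graph-based step selects `electrode` and `lithium`, which are related to the most other candidates. A PageRank-style score could replace weighted degree in `select_keywords`; which measure the paper actually uses is not stated in the abstract.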
Keywords
Relationship graph model; Patents; Keyword extraction; Text mining