http://dx.doi.org/10.6109/jkiice.2018.22.8.1049

Similar Patent Search Service System using Latent Dirichlet Allocation  

Lim, HyunKeun (Department of Computer Engineering, Paichai University)
Kim, Jaeyoon (Department of Computer Engineering, Paichai University)
Jung, Hoekyung (Department of Computer Engineering, Paichai University)
Abstract
Keyword searching was used in the past as a method of finding similar patents, and automated classification by machine learning has come into use recently. Keyword searching is a method of analyzing data that has been structured through data refinement. While its accuracy is high for short text, it cannot analyze the meaning contained in the sentences of long text made up of many words, such as a document. At the semantic analysis level, automatic classification is used to classify sentences composed of several words through unstructured data analysis. There have been attempts to find similar documents by combining the two methods. However, the combination has an algorithmic problem: the two methods analyze data in different ways, yet unstructured and structured data must be used simultaneously. In this paper, we study a method that extracts the keywords implied in a document and uses LDA (Latent Dirichlet Allocation) to classify documents efficiently without human intervention and to find similar patents.
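The general idea of extracting keywords, fitting LDA, and comparing documents by their topic distributions can be illustrated with a small sketch. This is not the system described in the paper: the stopword list, the toy documents, the hyperparameters (`alpha`, `beta`, `n_topics`), and the helper names `tokenize`, `fit_lda`, and `most_similar` are all assumptions made for illustration, and the sampler is a plain collapsed Gibbs sampler written with the Python standard library only.

```python
import math
import random
import re

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "with", "by", "is", "that"}

def tokenize(text):
    """Very rough keyword extraction: lowercase words, minus stopwords and short tokens."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and len(w) > 2]

def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]   # word counts per topic
    topic_total = [0] * n_topics                            # total words per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed estimate of each document's topic distribution.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(theta, query_index):
    """Index of the document whose topic distribution is closest to the query's."""
    scores = [(cosine(theta[query_index], t), i)
              for i, t in enumerate(theta) if i != query_index]
    return max(scores)[1]

# Toy stand-ins for patent abstracts (invented for this example).
texts = [
    "Battery cell with lithium electrode and electrolyte for charging",
    "Lithium battery charging circuit with electrode protection",
    "Image sensor with pixel array and lens for camera",
    "Camera lens module with image pixel correction",
]
docs = [tokenize(t) for t in texts]
theta = fit_lda(docs)
print(most_similar(theta, 0))
```

With a corpus this small the sampled topics can vary with the seed and hyperparameters, so the printed index should be read as a demonstration of the pipeline rather than as a meaningful result.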
Keywords
Machine Learning; Document Classification; Similar Patent Search; LDA; Keyword Extraction
Citations & Related Records
Times Cited By KSCI: 4