[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.6109/jkiice.2018.22.8.1049

Similar Patent Search Service System using Latent Dirichlet Allocation

Lim, HyunKeun (Department of Computer Engineering, Paichai University)
Kim, Jaeyoon (Department of Computer Engineering, Paichai University)
Jung, Hoekyung (Department of Computer Engineering, Paichai University)

Publication Information

Journal of the Korea Institute of Information and Communication Engineering / v.22, no.8, 2018, pp. 1049-1054

Abstract

Keyword searching has long been used to find similar patents, and automated classification by machine learning has come into use more recently. Keyword searching analyzes data that has been structured through data refinement. Its accuracy is high for short text, but for long text made up of many words, such as a full document, it cannot analyze the meaning contained in the sentences. At the semantic analysis level, automatic classification is used to classify sentences composed of several words through unstructured data analysis. There have been attempts to find similar documents by combining the two methods. However, the combined algorithm is problematic because the two methods analyze data in different ways, which makes it difficult to use unstructured and structured data at the same time. In this paper, we study a method that extracts the keywords implied in a document and uses LDA (Latent Dirichlet Allocation) to classify documents efficiently without human intervention and to find similar patents.
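
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy patent texts, the collapsed Gibbs sampler, the hyperparameters (`num_topics`, `alpha`, `beta`, `iters`), and the helper names `fit_lda`, `top_keywords`, `cosine`, and `most_similar` are all assumptions made for illustration. The sketch fits LDA on tokenized documents, reads the highest-probability words of a topic as extracted keywords, and ranks documents by the cosine similarity of their topic mixtures.

```python
import math
import random


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the paper's code).

    Returns (theta, phi, vocab): per-document topic mixtures,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assign = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assign[d]):
            doc_topic[d][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi, z = index[w], assign[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wi] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wi] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][wi] += 1
                topic_total[z] += 1

    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + V * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


def top_keywords(phi, vocab, topic, n=3):
    """Return the n most probable words of a topic as its keywords."""
    ranked = sorted(range(len(vocab)), key=lambda i: phi[topic][i], reverse=True)
    return [vocab[i] for i in ranked[:n]]


def cosine(a, b):
    """Cosine similarity between two topic-mixture vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(theta, query):
    """Rank all other documents by topic-mixture similarity to the query."""
    others = [d for d in range(len(theta)) if d != query]
    return sorted(others, key=lambda d: cosine(theta[query], theta[d]), reverse=True)


# Hypothetical toy "patents"; real input would be full patent texts.
patents = [
    "battery cell electrode lithium battery charge",
    "lithium electrode anode battery cell",
    "antenna signal wireless receiver antenna",
    "wireless signal transmitter antenna receiver",
]
docs = [text.lower().split() for text in patents]
theta, phi, vocab = fit_lda(docs)
ranked = most_similar(theta, query=0)
print("keywords of topic 0:", top_keywords(phi, vocab, 0))
print("documents ranked by similarity to patent 0:", ranked)
```

Comparing topic mixtures rather than raw keywords is one way to let documents that share no exact terms still rank as similar; whether the paper's system ranks documents this way is not stated in the abstract.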

Keywords

Machine Learning; Document Classification; Similar Patent Search; LDA; Keyword Extract

Citations & Related Records

Times Cited By KSCI: 4

References

1	K. H. Song, Y. S. Kim, "Automatic Keyword Extraction using Hierarchical Graph Model Based on Word Co-occurrences," Journal of Korean Institute of Information Scientists and Engineers, vol. 44, no. 5, pp. 522-536, May 2017.
2	S. R. Lim, Y. J. Kwon, "IPC Multi-label Classification based on Functional Characteristics of Fields in Patent Documents," Journal of Internet Computing and Services, vol. 18, no. 1, pp. 77-88, Feb. 2017.
3	T. H. Jeen, "Patent documents automatic classification with dimension reduced features using latent semantic analysis," M.S. dissertation, Computer and Information Technology, Korea University, Feb. 2014.
4	R. Mehrotra, S. Sanner, W. Buntine, L. Xie, "Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling," ACM Special Interest Group on Information Retrieval, pp. 889-892, Jul. 2013.
5	W. S. Kim, S. Y. Kim, "Document Clustering Technique by K-means Algorithm and PCA," Journal of the Korea Institute of Information and Communication Engineering, vol. 18, no. 3, pp. 625-630, Mar. 2014.
6	Suhendra, I. Ranggadara, "Naive Bayes Algorithm with Chi Square and NGram Feature for Reviewing Laptop Product on Amazon Site," International Research Journal of Computer Science, Issue 12, vol. 4, pp. 28-33, Dec. 2017.
7	J. W. Lee, I. S. Kang, H. K. Jung, "XML Document Keyword Weight Analysis based Paragraph Extraction Model," Journal of the Korea Institute of Information and Communication Engineering, vol. 21, no. 11, pp. 2133-2138, Nov. 2017.

Korean title: 잠재 의미 분석을 적용한 유사 특허 검색 서비스 시스템 (literally, "Similar Patent Search Service System Applying Latent Semantic Analysis")