http://dx.doi.org/10.4275/KSLIS.2019.53.2.179

A Study on Patent Literature Classification Using Distributed Representation of Technical Terms  

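Choi, Yunsoo (Department of Library and Information Science, Graduate School, Kyonggi University)
Choi, Sung-Pil (Department of Library and Information Science, College of Human Talent Convergence, Kyonggi University)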
Publication Information
Journal of the Korean Society for Library and Information Science / v.53, no.2, 2019, pp. 179-199
Abstract
In this paper, we examine various feature extraction methods and machine learning and deep learning models to identify the best-performing methodology for classifying patent literature, and we verify its performance through experiments. For feature extraction, we compared the traditional bag-of-words (BoW) method with a distributed representation method (word embedding vectors); for constructing the document collection, we compared morphological analysis with multi-gram tokenization. We also compared classification performance between traditional machine learning models and deep learning models. The experimental results show that the best performance is achieved by applying a deep learning model to features built from distributed representations and morphological analysis. In the Section, Class, and Subclass classification experiments, this approach improved performance by 5.71%, 18.84%, and 21.53%, respectively, over traditional classification methods.
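To make the contrast between the two feature representations concrete, here is a minimal sketch in plain Python. It is not the authors' method, and every name and value in it is a hypothetical assumption: whitespace tokenization (`word_tokens`) stands in for a real morphological analyzer, `char_ngrams` is one possible reading of "multi-gram" tokenization, the 2-dimensional vectors in `EMB` replace embeddings that would in practice be trained on patent text, the section labels are illustrative, and a simple nearest-centroid classifier replaces the machine learning and deep learning models compared in the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n_values=(2, 3)):
    """Multi-gram tokens (assumed reading): character n-grams of several lengths."""
    s = text.replace(" ", "")
    return [s[i:i + n] for n in n_values for i in range(len(s) - n + 1)]


def word_tokens(text):
    """Placeholder for morphological analysis: lowercase whitespace split."""
    return text.lower().split()


def bow_vector(tokens, vocab):
    """Bag-of-words: one count per vocabulary entry; unseen words contribute nothing."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def embedding_vector(tokens, embeddings):
    """Distributed representation: mean of the dense vectors of known tokens."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_centroid(train, vectorize, query):
    """Return the label whose mean training vector is most cosine-similar to the query."""
    grouped = {}
    for text, label in train:
        grouped.setdefault(label, []).append(vectorize(text))
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }
    q = vectorize(query)
    return max(centroids, key=lambda label: cosine(centroids[label], q))


# Hypothetical toy embeddings; real ones would come from e.g. word2vec on patent text.
EMB = {
    "semiconductor": [0.9, 0.1], "transistor": [0.8, 0.2], "wafer": [0.85, 0.05],
    "antibody": [0.1, 0.9], "protein": [0.15, 0.8], "vaccine": [0.05, 0.95],
}
# Hypothetical training documents with illustrative section labels.
TRAIN = [("semiconductor wafer etching", "H"), ("antibody protein assay", "A")]
VOCAB = sorted({t for text, _ in TRAIN for t in word_tokens(text)})


def bow_features(text):
    return bow_vector(word_tokens(text), VOCAB)


def emb_features(text):
    return embedding_vector(word_tokens(text), EMB)


if __name__ == "__main__":
    print(bow_features("transistor"))                          # unseen word: all zeros
    print(nearest_centroid(TRAIN, emb_features, "transistor"))  # H
```

The toy query shows the design point behind the comparison: "transistor" never occurs in the training documents, so its BoW vector carries no information, while its embedding lies close to "semiconductor" and "wafer" and is assigned to the same class.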
Keywords
Patent Literature Classification; Distributed Representation; Word Embedding Vector; Deep Learning
Citations & Related Records
Times Cited By KSCI: 1