http://dx.doi.org/10.7472/jksii.2017.18.1.77

IPC Multi-label Classification based on Functional Characteristics of Fields in Patent Documents  

Lim, Sora (Dept. of Telecommunication and Information Engineering, Korea Aerospace University)
Kwon, YongJin (Dept. of Telecommunication and Information Engineering, Korea Aerospace University)
Publication Information
Journal of Internet Computing and Services / v.18, no.1, 2017, pp. 77-88
Abstract
With the recent advent of the knowledge-based society, in which information and knowledge create value, patents, the representative form of intellectual property, have become increasingly important, and the number of patents continues to grow. To use this vast amount of patent information effectively, patents must be classified appropriately according to the technological subject of the invention, and the IPC (International Patent Classification) is widely used for this purpose. Research on automatic IPC classification has applied data mining and machine learning algorithms to improve the current IPC classification process, in which patent documents are categorized by hand. However, most previous studies have focused on applying various existing machine learning methods to patent documents rather than considering the characteristics of the data or the structure of patent documents. In this paper, therefore, we propose using two structural fields, technical field and background, which are considered to influence patent classification; these two fields are selected on the basis of the characteristics of patent documents and the roles of their structural fields. We also construct a multi-label classification model to reflect the fact that a patent document can have multiple IPCs. Furthermore, we propose a method to classify patent documents at the IPC subclass level, which comprises 630 categories, in order to investigate the feasibility of applying the IPC multi-label classification model in practice. The effect of the structural fields of patent documents is examined using 564,793 patents registered in Korea, and a precision of 87.2% is obtained when title, abstract, claims, technical field and background are used. These results verify that the technical field and background play an important role in improving the precision of IPC multi-label classification at the IPC subclass level.
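The multi-label setup described above can be illustrated with a small sketch. The abstract does not say which classifier or multi-label strategy the authors use, so what follows is only an assumption-laden illustration, not the paper's method: it concatenates the selected fields into one bag of words, trains one multinomial naive Bayes classifier per IPC subclass (the binary relevance strategy), assigns every subclass whose classifier votes positive, and measures micro-averaged precision. The field keys (`technical_field`, `background`, ...), the helper names (`tokens`, `train`, `predict`, `micro_precision`) and the toy IPC labels are hypothetical, and whitespace tokenization stands in for the Korean morphological analysis a real pipeline would need.

```python
import math
from collections import Counter

# Hypothetical keys for the five fields named in the abstract.
FIELDS = ("title", "abstract", "claims", "technical_field", "background")


def tokens(doc, fields=FIELDS):
    """Concatenate the selected fields and split on whitespace.

    Whitespace splitting is a placeholder for Korean morphological analysis.
    """
    words = []
    for field in fields:
        words.extend(doc.get(field, "").lower().split())
    return words


class BinaryMNB:
    """Multinomial naive Bayes for one label: positive vs. negative."""

    def fit(self, docs, targets, alpha=1.0):
        self.alpha = alpha
        self.counts = {True: Counter(), False: Counter()}
        for words, target in zip(docs, targets):
            self.counts[target].update(words)
        n, n_pos = len(targets), sum(targets)
        # Add-one smoothed class priors, so neither prior is ever zero.
        self.log_prior = {
            True: math.log((n_pos + 1) / (n + 2)),
            False: math.log((n - n_pos + 1) / (n + 2)),
        }
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def log_score(self, words, c):
        # log P(c) plus the sum of Laplace-smoothed log P(word | c).
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for w in words:
            if w in self.vocab:  # words unseen in training are skipped
                score += math.log((self.counts[c][w] + self.alpha) / denom)
        return score

    def predict(self, words):
        return self.log_score(words, True) > self.log_score(words, False)


def train(docs, label_sets, fields=FIELDS):
    """Binary relevance: one classifier per IPC subclass seen in training."""
    toks = [tokens(d, fields) for d in docs]
    labels = sorted(set().union(*label_sets))
    return {lab: BinaryMNB().fit(toks, [lab in ls for ls in label_sets])
            for lab in labels}


def predict(models, doc, fields=FIELDS):
    """Return every subclass whose binary classifier votes positive."""
    words = tokens(doc, fields)
    return {lab for lab, model in models.items() if model.predict(words)}


def micro_precision(predicted, gold):
    """Correctly predicted labels divided by all predicted labels."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    return tp / n_pred if n_pred else 0.0


if __name__ == "__main__":
    # Toy corpus with made-up text and IPC subclass labels.
    docs = [
        {"title": "packet network switch", "technical_field": "network routing"},
        {"title": "memory processor cache", "background": "processor memory"},
        {"title": "drug compound tablet", "background": "compound dosage"},
        {"title": "network processor packet memory"},
    ]
    labels = [{"H04L"}, {"G06F"}, {"A61K"}, {"G06F", "H04L"}]
    models = train(docs, labels)
    print(sorted(predict(models, {"title": "network packet routing"})))
    # → ['H04L']
    print(sorted(predict(models, {"title": "network packet memory processor"})))
    # → ['G06F', 'H04L']
```

Passing a shorter `fields` tuple to `train` and `predict`, for example one without `technical_field` and `background`, is one way to compare field combinations in the spirit of the experiment described above; the toy corpus is far too small to say anything about real-world precision.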
Keywords
Patent classification; IPC Classification; Patent Document Fields; Field function; Multi-label classification