http://dx.doi.org/10.3745/KTSDE.2020.9.4.145

Korean Machine Reading Comprehension for Patent Consultation Using BERT  

Min, Jae-Ok (Korea Institute of Patent Information, R&D Center, Research & Development Part)
Park, Jin-Woo (Korea Institute of Patent Information, R&D Center)
Jo, Yu-Jeong (Korea Institute of Patent Information, R&D Center)
Lee, Bong-Gun (Korea Institute of Patent Information, PatentNet Application Team)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol. 9, No. 4, 2020, pp. 145-152
Abstract
MRC (machine reading comprehension) is a natural language processing task in which a model predicts the answer to a user's query by understanding the relevant document, and it can be applied to automated consultation services such as chatbots. BERT (Bidirectional Encoder Representations from Transformers), which has recently shown high performance across many areas of natural language processing, works in two phases: the model is first pre-trained on a large corpus of the target domain and then fine-tuned to solve a specific NLP task as a prediction problem. In this paper, we build a patent MRC dataset and show how to construct patent consultation training data for the MRC task. We also propose a method that improves MRC performance by combining a Patent-BERT model pre-trained on a patent consultation corpus with language processing suited to machine learning on patent counseling data. Experimental results show that the proposed method improves the ability to answer patent counseling queries.
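To make the fine-tuning phase described above concrete, the following is a minimal sketch of BERT-style extractive MRC (answer-span prediction) using the Hugging Face transformers library. The checkpoint name, question, and passage are illustrative placeholders, not the authors' Patent-BERT model or dataset, and the question-answering head would still need to be fine-tuned on an MRC dataset such as the patent consultation data before its predictions are meaningful.

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Placeholder public Korean BERT checkpoint; the paper uses its own Patent-BERT
# pre-trained on a patent consultation corpus, which is not assumed to be available here.
model_name = "klue/bert-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)  # QA head must first be fine-tuned on MRC data

# Hypothetical consultation query and passage, encoded as a (question, context) pair
# in the SQuAD/KorQuAD style of extractive MRC.
question = "특허 출원은 어디에 제출하나요?"
context = "특허 출원서는 특허청에 온라인으로 제출할 수 있다."
inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Extractive MRC: the model scores every token as a possible answer start/end,
# and the predicted answer is the highest-scoring span copied out of the passage.
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits)) + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end], skip_special_tokens=True))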
Keywords
Natural Language Processing; MRC; Machine Reading Comprehension; Patent; BERT
References
1 P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, "SQuAD: 100,000+ Questions for Machine Comprehension of Text," arXiv preprint arXiv:1606.05250, 2016.
2 S. Lim, M. Kim, and J. Lee, "KorQuAD: Korean QA Dataset for Machine Comprehension," in Proceedings of the Korea Software Congress 2018, pp. 539-541, 2018.
3 J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv preprint arXiv:1810.04805, 2018.
4 A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding," arXiv preprint arXiv:1804.07461, 2018.
5 A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, and L. Kaiser, "Attention Is All You Need," in Advances in Neural Information Processing Systems, 2017.
6 K. H. Park, S. H. Na, Y. S. Choi, and D. S. Chang, "BERT and Multi-level Co-Attention Fusion for Machine Reading Comprehension," in Proceedings of the Korea Software Congress 2019, pp. 643-645, 2019.
7 D. Lee, C. Park, C. Lee, S. Park, S. Lim, M. Kim, and J. Lee, "Korean Machine Reading Comprehension Using BERT," in Proceedings of the Korea Computer Congress 2019, pp. 557-559, 2019.
8 T. Lei, Y. Zhang, S. I. Wang, H. Dai, and Y. Artzi, "Simple Recurrent Units for Highly Parallelizable Recurrence," arXiv preprint arXiv:1709.02755, 2018.
9 Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized Autoregressive Pretraining for Language Understanding," arXiv preprint arXiv:1906.08237, 2019.
10 Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations," arXiv preprint arXiv:1909.11942, 2019.
11 Y. Wu, M. Schuster, Z. Chen, Q. V. Le, and M. Norouzi, "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation," arXiv preprint arXiv:1609.08144, 2016.
12 D. P. Kingma and J. L. Ba, "Adam: A Method for Stochastic Optimization," arXiv preprint arXiv:1412.6980, 2014.