http://dx.doi.org/10.7236/IJIBC.2022.14.2.55

Development of Tourism Information Named Entity Recognition Datasets for the Fine-tune KoBERT-CRF Model  

Jwa, Myeong-Cheol (Korea University of Technology and Education)
Jwa, Jeong-Woo (Department of Telecommunication Eng., Jeju National University)
Publication Information
International Journal of Internet, Broadcasting and Communication / v.14, no.2, 2022, pp. 55-62
Abstract
A smart tourism chatbot is needed as a user interface for efficiently providing tourists with smart tourism services such as recommended travel products, tourist information, personal travel itineraries, and tour guide services. We have developed a smart tourism app and a smart tourism information system that provide these services to tourists. We have also developed a smart tourism chatbot service consisting of the khaiii morpheme analyzer, rule-based intention classification, and a tourism information knowledge base built on the Neo4j graph database. In this paper, we develop the Korean and English smart tourism Named Entity (NE) datasets required to build a named entity recognition (NER) model based on pre-trained language models (PLMs) for the smart tourism chatbot system. We create the tourism information NER datasets by collecting source data from the smart tourism app, the visitJeju website of the Jeju Tourism Organization (JTO), and web searches, and by preprocessing the data with Korean and English tourism information Named Entity dictionaries. We train the KoBERT-CRF NER model on the developed Korean and English tourism information NER datasets. The weighted-average precision, recall, and F1 scores are 0.94, 0.92, and 0.94 on the Korean and English tourism information NER datasets.
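The dictionary-based preprocessing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the tagging scheme, the tokenization, or the dictionary format, so everything below is an assumption made for illustration: BIO labels, whitespace tokenization, greedy longest-match lookup, and the hypothetical names `NE_DICT` and `bio_tag` with made-up entries and entity types.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: surface form (as a token tuple) -> entity type.
# The entries and type labels (LOC, FOOD) are illustrative, not from the paper.
NE_DICT: Dict[Tuple[str, ...], str] = {
    ("Hallasan", "National", "Park"): "LOC",
    ("Seongsan", "Ilchulbong"): "LOC",
    ("black", "pork"): "FOOD",
}


def bio_tag(tokens: List[str], ne_dict: Dict[Tuple[str, ...], str]) -> List[str]:
    """Label tokens with BIO tags using greedy longest-match dictionary lookup."""
    max_len = max((len(key) for key in ne_dict), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so longer names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in ne_dict:
                label = ne_dict[span]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Whitespace tokenization is an assumption; Korean text would typically be
# split into morphemes or subword pieces before labeling.
sentence = "We ate black pork near Seongsan Ilchulbong".split()
tags = bio_tag(sentence, NE_DICT)
for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```

Token/tag pairs in this form are a common input format for training a sequence labeler such as a BERT-CRF model, though the actual dataset format used by the authors may differ.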
Keywords
Tourism Information NER; Smart Tourism Chatbot; KoBERT model; Conditional Random Fields (CRF); pre-trained language models (PLMs)
Citations & Related Records
Times Cited by KSCI: 1
1 Kakao khaiii (Kakao Hangul Analyzer III), https://tech.kakao.com/2018/12/13/khaiii/
2 Guendalina Caldarini, Sardar Jaf, Kenneth McGarry, "A Literature Survey of Recent Advances in Chatbots", Information, vol.13, no.1, 41, 2022. DOI: 10.3390/info13010041
3 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Eric Michael Smith, Y-Lan Boureau, Jason Weston, "Recipes for Building an Open-Domain Chatbot", EACL 2021, pp. 300-325, 2021. DOI: 10.18653/v1/2021.eacl-main.24
4 Dong-Hyun Kim, Hyeon-Su Im, Jong-Heon Hyeon, Jeong-Woo Jwa, "Development of the Rule-based Smart Tourism Chatbot using Neo4J graph database", International Journal of Internet, Broadcasting and Communication, Vol.13, No.2, pp. 179-186, 2021. DOI: 10.7236/IJIBC.2021.13.2.179
5 Jing Li, Aixin Sun, Jianglei Han, and Chenliang Li, "A Survey on Deep Learning for Named Entity Recognition", IEEE Trans. on Knowledge and Data Eng., pp. 50-70, 2020. DOI: 10.1109/TKDE.2020.2981314
6 https://github.com/kmounlp/NER
7 NAVER NLP Challenge 2018 NER data, https://github.com/naver/nlp-challenge/tree/master/missions/ner
8 Visit Jeju Website, https://www.visitjeju.net/kr
9 JeongWoo Jwa, "Development of Personalized Travel Products for Smart Tour Guidance Services", International Journal of Engineering & Technology, 7 (3.33), 58-61, 2018. DOI: 10.14419/ijet.v7i3.33.18524
10 National Institute of the Korean Language NER data, https://corpus.korean.go.kr/
11 SKT KoBERT, https://github.com/SKTBrain/KoBERT
12 Neo4j graph database, https://neo4j.com/
13 Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang, "Pre-trained Models for Natural Language Processing: A Survey", Science China Technological Sciences 63(10), pp. 1872-1897, 2020. DOI: 10.1007/s11431-020-1647-3