Acknowledgement
This work was supported by the 2020 sabbatical year research grant of KoreaTech.
References
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Minneapolis, pp. 4171-4186, 2019.
- Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, "XLNet: Generalized Autoregressive Pretraining for Language Understanding," in Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS), Vancouver, pp. 5754-5764, 2019.
- Wikipedia, Wikipedia Dump Data [Online]. Available: https://www.wikipedia.org/.
- P. W. Park, "Text-CNN Based Intent Classification Method for Automatic Input of Intent Sentences in Chatbot," Journal of Korean Institute of Information Technology, vol. 18, no. 1, pp. 19-25, Jan. 2020. https://doi.org/10.14801/jkiit.2020.18.1.19
- J. M. Kim and J. H. Lee, "Text Document Classification Based on Recurrent Neural Network Using Word2vec," Journal of Korean Institute of Intelligent Systems, vol. 27, no. 6, pp. 560-565, Dec. 2017. https://doi.org/10.5391/JKIIS.2017.27.6.560
- H. J. Jeon and C. Koh, "Text Extraction Algorithm using the HTML Logical Structure Analysis," The KDCS Transactions, vol. 16, no. 3, pp. 445-455, Jun. 2015.
- N. Utiu and V.-S. Ionescu, "Learning Web Content Extraction with DOM Features," in Proceedings of the 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, 2018.
- B. D. Nguyen-Hoang, B. T. Pham-Hong, Y. Jin, and P. T. V. Le, "Genre-Oriented Web Content Extraction with Deep Convolutional Neural Networks and Statistical Methods," in Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation (PACLIC), Hong Kong, pp. 476-485, 2018.
- Hugging Face, Transformers [Online]. Available: https://www.github.com/huggingface/.