http://dx.doi.org/10.9708/jksci.2018.23.11.031

A Deep Learning-based Article- and Paragraph-level Classification

Kim, Euhee (Computer Science & Engineering, Shinhan University)
Abstract
Text classification has long been studied in the field of Natural Language Processing. In this paper, we propose an article- and paragraph-level genre classification system for large-scale English corpora, using Word2Vec-based LSTM, GRU, and CNN models. At both the article and paragraph levels, LSTM achieved the highest classification accuracy, followed by GRU and then CNN. The evaluation of the three models thus indicates that, when pre-trained Word2Vec-based word embeddings are used for both tasks, word sequential information for articles is more effective than word feature extraction for paragraphs.
Keywords
Genre Classification; Deep Learning; Word2Vec; Long Short-Term Memory (LSTM); Gated Recurrent Unit (GRU); Convolutional Neural Networks (CNN); Word Embedding