Text Classification Using Parallel Word-level and Character-level Embeddings in Convolutional Neural Networks

  • Received : 2019.03.03
  • Accepted : 2019.09.02
  • Published : 2019.12.31

Abstract

Deep learning techniques such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) show superior performance in text classification compared to traditional approaches such as Support Vector Machines (SVMs) and Naïve Bayes. When CNNs are used for text classification, word embedding or character embedding transforms words or characters into fixed-size vectors before they are fed into the convolutional layers. In this paper, we propose a parallel word-level and character-level embedding approach for CNN-based text classification. The proposed approach captures word-level and character-level patterns concurrently. To demonstrate its usefulness, we perform experiments on two English and three Korean text datasets. The experimental results show that character-level embedding works better for Korean, while word-level embedding performs well for English. They also reveal that the proposed approach outperforms traditional CNNs that use word-level or character-level embedding alone, on both Korean and English documents. A more detailed investigation shows that the proposed approach tends to outperform the traditional embedding approaches especially when the amount of training data is relatively small.
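The parallel structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' exact architecture; all sizes, the single convolution window, and the softmax classifier are illustrative assumptions. The essential idea shown here is the one the abstract states: one branch convolves over word embeddings, another convolves over character embeddings, and their pooled feature vectors are concatenated before classification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not taken from the paper)
V_WORD, V_CHAR = 100, 30   # vocabulary sizes (words / characters)
D_W, D_C = 8, 4            # embedding dimensions
T_W, T_C = 6, 20           # sequence lengths (words / characters)
K = 3                      # convolution window size
F = 5                      # filters per branch
C = 2                      # output classes

# Randomly initialized embedding tables and filter banks
E_word = rng.normal(size=(V_WORD, D_W))
E_char = rng.normal(size=(V_CHAR, D_C))
W_word = rng.normal(size=(F, K, D_W))  # word-branch filters
W_char = rng.normal(size=(F, K, D_C))  # character-branch filters
W_out = rng.normal(size=(2 * F, C))    # classifier over concatenated features


def conv_maxpool(X, W):
    """1-D convolution over an embedded sequence, then max-over-time pooling.

    X: (T, D) embedded sequence; W: (F, K, D) filters.
    Returns one pooled feature per filter (ReLU applied after pooling,
    which is equivalent to pooling ReLU outputs since max is monotone).
    """
    T = X.shape[0]
    k = W.shape[1]
    feats = np.array([
        max(np.sum(W[f] * X[t:t + k]) for t in range(T - k + 1))
        for f in range(W.shape[0])
    ])
    return np.maximum(feats, 0.0)


def forward(word_ids, char_ids):
    """Run both branches in parallel and classify the concatenated features."""
    h_w = conv_maxpool(E_word[word_ids], W_word)  # word-level branch
    h_c = conv_maxpool(E_char[char_ids], W_char)  # character-level branch
    h = np.concatenate([h_w, h_c])                # parallel features merged
    logits = h @ W_out
    logits -= logits.max()                        # numerically stable softmax
    p = np.exp(logits)
    return p / p.sum()


# One synthetic document: a word-index sequence and a character-index sequence
words = rng.integers(0, V_WORD, T_W)
chars = rng.integers(0, V_CHAR, T_C)
probs = forward(words, chars)
```

Because the two branches are concatenated rather than summed, the classifier can weight word-level and character-level evidence independently, which is consistent with the abstract's finding that their relative usefulness differs between Korean and English.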

Acknowledgement

This work was supported by the 'Big Intelligence Business Education based on Business Laboratory Project (CK2)' (Project ID: 2016928290).

References

  1. Boureau, Y. L., Ponce, J., and LeCun, Y. (2010). A theoretical analysis of feature pooling in visual recognition. In Proceedings of the 27th International Conference on Machine Learning (ICML), 111-118.
  2. Chen, J., Huang, H., Tian, S., and Qu, Y. (2009). Feature selection for text classification with Naive Bayes. Expert Systems with Applications, 36(3), 5432-5435. https://doi.org/10.1016/j.eswa.2008.06.054
  3. Chen, T., Xu, R., He, Y., and Wang, X. (2017). Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert Systems with Applications, 72, 221-230. https://doi.org/10.1016/j.eswa.2016.10.065
  4. Chung, T., and Gildea, D. (2009). Unsupervised tokenization for machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2, 718-726.
  5. Dos Santos, C., and Gatti, M. (2014). Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, 69-78.
  6. Gardner, M. W., and Dorling, S. R. (1998). Artificial neural networks (the multilayer perceptron): A review of applications in the atmospheric sciences. Atmospheric Environment, 32(14-15), 2627-2636. https://doi.org/10.1016/S1352-2310(97)00447-0
  7. Gers, F. A., Schmidhuber, J., and Cummins, F. (1999). Learning to forget: Continual prediction with LSTM. In Proceedings of the 9th International Conference on Artificial Neural Networks (ICANN).
  8. Gunn, S. R. (1998). Support vector machines for classification and regression. ISIS Technical Report, University of Southampton.
  9. Hatzivassiloglou, V., and McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, 174-181.
  10. Ioffe, S., and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
  11. Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European Conference on Machine Learning, 137-142.
  12. Kang, M., Ahn, J., and Lee, K. (2018). Opinion mining using ensemble text hidden Markov models for text classification. Expert Systems with Applications, 94, 218-227. https://doi.org/10.1016/j.eswa.2017.07.019
  13. Kang, H., Yoo, S. J., and Han, D. (2012). Senti-lexicon and improved Naive Bayes algorithms for sentiment analysis of restaurant reviews. Expert Systems with Applications, 39, 6000-6010. https://doi.org/10.1016/j.eswa.2011.11.107
  14. Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.
  15. Kim, Y., Jernite, Y., Sontag, D., and Rush, A. M. (2016). Character-aware neural language models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI).
  16. LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
  17. Li, C. H., and Park, S. C. (2009). An efficient document classification model using an improved back propagation neural network and singular value decomposition. Expert Systems with Applications, 36, 3208-3215. https://doi.org/10.1016/j.eswa.2008.01.014
  18. Li, F. (2010). The information content of forward-looking statements in corporate filings: A Naive Bayesian machine learning approach. Journal of Accounting Research, 1049-1102.
  19. Liang, D., Xu, W., and Zhao, Y. (2017). Combining word-level and character-level representations for relation classification of informal text. In Proceedings of the 2nd Workshop on Representation Learning for NLP, 43-47.
  20. Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. (2011). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 142-150.
  21. McAuley, J., and Leskovec, J. (2013). Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, 165-172.
  22. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS.
  23. Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of EMNLP, 10, 79-86.
  24. Rana, S., and Singh, A. (2016). Comparative analysis of sentiment orientation using SVM and Naive Bayes technique. 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), 106-111.
  25. Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1-47. https://doi.org/10.1145/505282.505283
  26. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2016). Grad-CAM: Visual explanations from deep networks via gradient-based localization. arXiv preprint arXiv:1610.02391.
  27. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., and Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of EMNLP.
  28. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929-1958.
  29. Trindade, L., Wang, H., Blackburn, W., and Taylor, P. S. (2014). Enhanced factored sequence kernel for sentiment classification. In Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2, 519-525.
  30. Turney, P. D., and Littman, M. L. (2002). Unsupervised learning of semantic orientation from a hundred-billion-word corpus. arXiv preprint arXiv:cs/0212012.
  31. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., and Hovy, E. (2016). Hierarchical attention networks for document classification. Proceedings of NAACL-HLT, 1480-1489.
  32. Yi, K., and Beheshti, J. (2009). A hidden Markov model-based text classification of medical documents. Journal of Information Science.
  33. Young, T., Hazarika, D., Poria, S., and Cambria, E. (2018). Recent trends in deep learning based natural language processing. arXiv preprint arXiv:1708.02709.
  34. Yousefi-Azar, M., and Hamey, L. (2017). Text summarization using unsupervised deep learning. Expert Systems with Applications, 68, 93-105. https://doi.org/10.1016/j.eswa.2016.10.017
  35. Zeng, D., Liu, K., Lai, S., Zhou, G., and Zhao, J. (2014). Relation classification via convolutional deep neural network. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), 2335-2344.
  36. Zhang, X., Zhao, J., and LeCun, Y. (2015). Character-level convolutional networks for text classification. In Proceedings of Neural Information Processing Systems (NIPS).
  37. Zhang, Y., and Wallace, B. (2016). A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820.