• Title/Summary/Keyword: informal document (비격식 문서)

Feature Expansion based on LDA Word Distribution for Performance Improvement of Informal Document Classification (비격식 문서 분류 성능 개선을 위한 LDA 단어 분포 기반의 자질 확장)

  • Lee, Hokyung;Yang, Seon;Ko, Youngjoong
    • Journal of KIISE / v.43 no.9 / pp.1008-1014 / 2016
  • Data such as Twitter posts, Facebook posts, and customer reviews belong to the informal document group, whereas newspapers, which go through a grammar-correction step, belong to the formal document group. Finding consistent rules or patterns is more difficult in informal documents than in formal documents. Hence, additional approaches are needed to improve informal document analysis. In this study, we classified Twitter data, a representative type of informal document, into ten categories. To improve performance, we revised and expanded the features based on the LDA (Latent Dirichlet Allocation) word distribution. Using the top-ranked LDA words, the other words were separated or bundled, and the feature set was thus expanded repeatedly. Finally, we conducted document classification with the expanded features. Experimental results indicated that the proposed method improved the micro-averaged F1-score by 7.11 percentage points compared to the results before the feature expansion step.
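
The abstract reports its gain as a difference in micro-averaged F1-score, measured in percentage points. The following is a minimal sketch of how such a score and gain could be computed for single-label classification. The function name `micro_f1`, the category labels, and the toy predictions are all hypothetical and are not taken from the paper; they only illustrate the metric.

```python
from collections import Counter


def micro_f1(gold, pred):
    """Micro-averaged F1 for single-label, multi-class predictions.

    True positives, false positives, and false negatives are pooled
    over all classes before precision and recall are computed.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    tp_sum = sum(tp.values())
    fp_sum = sum(fp.values())
    fn_sum = sum(fn.values())
    if tp_sum == 0:
        return 0.0
    precision = tp_sum / (tp_sum + fp_sum)
    recall = tp_sum / (tp_sum + fn_sum)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical, NOT results from the paper).
gold = ["sports", "politics", "music", "sports", "music"]
baseline = ["sports", "music", "music", "politics", "music"]
expanded = ["sports", "politics", "music", "politics", "music"]

f1_before = micro_f1(gold, baseline)
f1_after = micro_f1(gold, expanded)

# A percentage-point gain is the plain difference of the two scores
# expressed in percent, not a relative improvement.
gain_pp = (f1_after - f1_before) * 100
print(f"before={f1_before:.2f} after={f1_after:.2f} gain={gain_pp:.2f}%p")
```

When every document receives exactly one predicted label, as in this sketch, each error counts once as a false positive and once as a false negative, so micro-averaged precision, recall, and F1 all equal plain accuracy.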