A Study of Efficiency Information Filtering System using One-Hot Long Short-Term Memory

Kim, Hee sook; Lee, Min Hi

doi:10.17703/IJACT.2017.5.1.83

International Journal of Advanced Culture Technology

Vol. 5, No. 1 / pp. 83-89 / 2017 / 2288-7202 (pISSN) / 2288-7318 (eISSN)

The International Promotion Agency of Culture Technology

Kim, Hee sook (Department of Computer Information, Inchon Campus of Korea Polytechnic);
Lee, Min Hi (Department of Architecture, Howon University)

Received: 2017.02.03
Reviewed: 2017.03.04
Published: 2017.03.31

Abstract

In this paper, we propose an extension of one-hot Long Short-Term Memory (LSTM) and evaluate its performance on a spam filtering task. Most traditional methods proposed for spam filtering use word occurrences to represent spam and non-spam messages and ignore all syntactic and semantic information. A major issue arises when spam and non-spam messages share many common words and noise words, which makes it challenging for the system to assign the correct label. Unlike previous studies on information filtering, which rely only on word occurrence and word context as in probabilistic models, we apply a neural network-based approach to train the filter for better performance. In addition to the one-hot representation, using term weights with an attention mechanism allows the classifier to focus on words that are most likely to appear in the spam and non-spam collections. As a result, we obtained some improvement over the performance of previous methods. We find that using region embedding and pooling features on top of the LSTM, together with the attention mechanism, allows the system to learn a better document representation for filtering tasks in general.
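To make the idea of combining a one-hot representation with term-weight attention more concrete, the following is a minimal sketch in plain Python. It is not the authors' model: the LSTM and region-embedding layers are omitted, the term weight is assumed to be a smoothed inverse document frequency, and the toy messages and all function names (`build_vocab`, `idf_weights`, `attention_pool`) are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Map each distinct token to a column index, in first-seen order."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(index, size):
    """Return a vector of length `size` with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def idf_weights(docs, vocab):
    """Smoothed inverse document frequency per token.

    Assumption: the abstract does not define its term weight, so a
    common IDF variant stands in for it here.
    """
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(doc, vocab, weights):
    """Attention-weighted sum of the one-hot vectors of a document's tokens."""
    attn = softmax([weights[tok] for tok in doc])
    pooled = [0.0] * len(vocab)
    for a, tok in zip(attn, doc):
        vec = one_hot(vocab[tok], len(vocab))
        pooled = [p + a * v for p, v in zip(pooled, vec)]
    return attn, pooled


# Hypothetical toy collection: three tokenized messages.
docs = [
    ["win", "free", "prize", "now"],
    ["meeting", "notes", "for", "now"],
    ["free", "lunch", "meeting"],
]
vocab = build_vocab(docs)
weights = idf_weights(docs, vocab)
attn, pooled = attention_pool(docs[0], vocab, weights)
```

In this sketch, tokens that occur in fewer messages (such as "win") receive a larger attention share than tokens shared across messages (such as "now"), which loosely mirrors the abstract's point about focusing the classifier on discriminative words. In a full model, the attention would presumably be applied to LSTM hidden states rather than directly to one-hot vectors.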
