References
- Bache, K. and Lichman, M. (2013), UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Science, http://archive.ics.uci.edu/ml.
- Brown, P. F., deSouza, P. V., Mercer, R. L., Pietra, V. J. D., and Lai, J. C. (1992), Class-Based N-gram Models of Natural Language, Computational linguistics, 18(4), 467-479.
- Burger, J. D., Henderson, J., Kim, G., and Zarrella, G. (2011), Discriminating Gender on Twitter, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 1301-1309, Association for Computational Linguistics.
- Chemudugunta, C. and Steyvers, P. S. M. (2007), Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model, Advances in Neural Information Processing Systems 19 : Proceedings of the 2006 Conference, 19, MIT Press.
- Cho, S. G. and Kim, S. B. (2012), Finding Meaningful Pattern of Key Words in IIE Transactions Using Text Mining, Journal of the Korean Institute of Industrial Engineers, 38(1), 67-73. https://doi.org/10.7232/JKIIE.2012.38.1.067
- da Silva, J. F. and Lopes, G. P. (1999), A Local Maxima Method and a Fair Dispersion Normalization for Extracting Multiword Units, Sixth meeting on the Mathematics of Language, 369-381.
- Feldman, R. and Sanger, J. (2007), The Text Mining Handbook : Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press.
- Ganesan, K., Zhai, C., and Han, J. (2010), Opinosis : A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions, Proceedings of the 23rd International Conference on Computational Linguistics, Beijing, China.
- Houvardas, J. and Stamatatos, E. (2006), N-Gram Feature Selection for Authorship Identification, Artificial Intelligence : Methodology, Systems, and Applications, 77-86, Springer Berlin Heidelburg.
- Jing, L., Huang, H., and Shi, H. (2002), Improved Feature Selection Approach TFIDF in Text Mining, Proceedings of the First International Conference on Machine Learning and Cybernetics, 944-946, Beijing, China.
- Li, Y. H. and Jain, A. K. (1998), Classification of Text Documents, The Computer Journal, 41(8), 537-546. https://doi.org/10.1093/comjnl/41.8.537
- Mukherjee, A. and Liu, B. (2010), Improving Gender Classification of Blog Authors, Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 207-217, Association for Computational Linguistics.
- Nigam, K., McCallum, A. K., Thrun, S., and Mitchell, T. (2000), Text Classification from Labeled and Unlabeled Documents using EM, Machine Learning, 39(2-3), 103-134. https://doi.org/10.1023/A:1007692713085
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay. E. (2011), Scikit-learn : Machine Learning in Python, Journal of Machine Learning Research, 12, 2825-2830.
- Phan, X., Nquyen, L., and Horiguchi. S. (2008), Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections, Proceedings of the 17th International Conference on World Wide Web, 91-100, ACM.
- Python Software Foundation (2010), Python Language Reference, Version 2.7, http://www.python.org/.
- Ramos, J. (2003), Using TF-IDF to Determine Word Relevance in Document Queries, Proceedings of the First Instructional Conference on Machine Learning.
- Salton, G. and McGill, M. J. (1983), Introduction to Modern Information Retrieval, McGraw-Hill Book Company.
- Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., and Chanona-Hernandez, L. (2014), Syntactic N-grams as Machine Learning Features for Natural Language Processing, Expert Systems with Applications, 41, 853-860. https://doi.org/10.1016/j.eswa.2013.08.015
- Silva, J. and Lopes, G. (2010), Towards Automatic Building of Document Keywords, Proceedings of the 23rd International Conference on Computational Linguistics : Posters, 1149-1157, Association for Computational Linguistics.
- Smadja, F., McKeown, K. R., and Hatzivassiloglou, V. (1996), Translating Collocations for Bilingual Lexicons : A Statistical Approach, Computation Linguistics, 22(1), 1-38.
- Tang, B., Shepherd, M., Milios, E., and Heywood, M. I. (2005), Comparing and Combining Dimension Reduction Techniques for Efficient Text Clustering, Proceeding of SIAM International Workshop on Feature Selection for Data Mining, 17-26.
- Ting, S. L., Ip, W. H., and Tsang, A. H. C. (2011), Is Naive Bayes a Good Classifier for Document Classification?, International Journal of Software Engineering and Its Applications, 5(3), 37-46.
- Zaki, T., Es-saady, Y., Mammass, D., Ennaji, A., and Nicolas, S. (2014), A Hybrid Method N-Grams-TFIDF with Radial Basis for Indexing and Classification of Arabic Documents, International Journal of Software Engineering and Its Applications, 8(2), 127-144.