http://dx.doi.org/10.9708/jksci.2014.19.3.017

Word Cluster-based Mobile Application Categorization  

Heo, Jeongman (Dept. of Game Design & Development, SangMyung University)
Park, So-Young (Dept. of Game Design & Development, SangMyung University)
Abstract
In this paper, we propose a mobile application categorization method that uses word cluster information. Because mobile application descriptions are often short, the proposed method uses word cluster seeds, in addition to the words in the description, as categorization features. To cope with the fragmented (fine-grained) categories of mobile applications, the method generates word clusters by applying the K-means clustering algorithm to the frequency of word occurrence per category. Since a description can include paragraphs unrelated to categorization, such as installation specifications, the method uses only the word clusters that are useful for categorization. Experiments show that using the word cluster information improves recall by 5.65%.
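The abstract does not give the exact feature weighting, the K-means initialisation, or the rule for deciding which clusters are "useful", so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that each word is represented by its normalised occurrence distribution over categories, that the vocabulary is clustered with plain K-means (the farthest-first initialisation is a choice made here), and that a cluster's ID stands in for its "seed" feature. The threshold rule in the hypothetical `useful_clusters` helper, the category names, and the toy documents are all invented for illustration.

```python
from collections import Counter, defaultdict


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def category_frequency_vectors(docs, categories):
    """Map each word to its occurrence distribution over categories.

    docs is a list of (category, tokens) pairs. Counts are normalised to sum
    to 1 per word (an assumption of this sketch), so words are grouped by
    *where* they occur rather than by how often.
    """
    counts = defaultdict(Counter)
    for category, tokens in docs:
        for token in tokens:
            counts[token][category] += 1
    return {
        word: [by_cat[c] / sum(by_cat.values()) for c in categories]
        for word, by_cat in counts.items()
    }


def kmeans(points, k, max_iter=20):
    """Lloyd's K-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next initial centroid: the point farthest from all chosen ones.
        far = max(range(len(points)),
                  key=lambda i: min(sqdist(points[i], c) for c in centroids))
        centroids.append(list(points[far]))
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda j: sqdist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def build_word_clusters(docs, categories, k):
    """Cluster the vocabulary; returns ({word: cluster_id}, centroids)."""
    vectors = category_frequency_vectors(docs, categories)
    words = sorted(vectors)  # fixed order keeps the result deterministic
    assign, centroids = kmeans([vectors[w] for w in words], k)
    return dict(zip(words, assign)), centroids


def useful_clusters(centroids, threshold=0.6):
    """HYPOTHETICAL filter: keep clusters concentrated on one category.

    Clusters whose centroid is spread evenly across categories (e.g. words
    like "install") are treated as unrelated to categorisation and dropped.
    """
    return {j for j, c in enumerate(centroids) if max(c) >= threshold}


def features(tokens, word_cluster, keep):
    """Bag of words plus one cluster feature per token in a kept cluster."""
    feats = Counter(f"w={t}" for t in tokens)
    feats.update(f"c={word_cluster[t]}" for t in tokens
                 if word_cluster.get(t) in keep)
    return feats


# Toy usage with invented data.
categories = ["game", "finance"]
docs = [
    ("game", ["puzzle", "level", "install", "android"]),
    ("game", ["puzzle", "score", "install"]),
    ("finance", ["bank", "stock", "install", "android"]),
    ("finance", ["bank", "budget", "install"]),
]
word_cluster, centroids = build_word_clusters(docs, categories, k=3)
keep = useful_clusters(centroids)
# "score" gains the cluster feature it shares with "puzzle" and "level";
# "install" sits in an evenly spread cluster and gains nothing.
print(features(["score", "install"], word_cluster, keep))
```

The cluster feature is what lets a very short description such as `["score", "install"]` share evidence with longer game descriptions that mention "puzzle" or "level", which is one plausible reading of why cluster information would raise recall.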
Keywords
Mobile Application; Categorization; Word Clustering