http://dx.doi.org/10.9708/jksci.2015.20.4.017

Two-Phase Clustering Method Considering Mobile App Trends  

Heo, Jeong-Man (Dept. of Game Design & Development, SangMyung University)
Park, So-Young (Dept. of Game Design & Development, SangMyung University)
Abstract
In this paper, we propose a mobile app clustering method that uses word clusters. Because mobile app trends change quickly, the proposed method does not rely on a predefined category system; instead, it applies a clustering algorithm to the mobile app set and divides it into groups of semantically similar apps. To alleviate the data sparseness problem in short mobile app description texts, the proposed method additionally uses the unigram, the bigram, the trigram, and the cluster of each word. To cluster mobile apps accurately, the proposed method uses the word clusters to avoid exceedingly small or large mobile app clusters. Experimental results show that using the word clusters improves overall accuracy by 22.18 percentage points, from 57.48% to 79.66%.
Keywords
Mobile App Clustering; Word Clustering; Clustering Algorithm; Text Analysis
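
The feature set named in the abstract (unigram, bigram, trigram, and the cluster of each word) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `extract_features`, the feature-key format, and the `word_to_cluster` mapping with its cluster IDs are hypothetical, and the word clusters are assumed to have been computed beforehand. The sketch only shows why word-cluster features may help with sparse, short descriptions: two descriptions that share almost no surface words can still share cluster features.

```python
from collections import Counter


def extract_features(tokens, word_to_cluster):
    """Count unigram, bigram, trigram, and word-cluster features
    for one tokenized app description."""
    features = Counter()
    # n-gram features for n = 1, 2, 3
    for n in (1, 2, 3):
        for i in range(len(tokens) - n + 1):
            features[f"{n}gram:" + " ".join(tokens[i:i + n])] += 1
    # one feature per word that belongs to a known word cluster
    for tok in tokens:
        cluster = word_to_cluster.get(tok)
        if cluster is not None:
            features[f"cluster:{cluster}"] += 1
    return features


# Hypothetical word clusters (assumed to be precomputed elsewhere).
word_to_cluster = {"puzzle": "C1", "quiz": "C1", "game": "C2"}

a = extract_features(["fun", "puzzle", "game"], word_to_cluster)
b = extract_features(["free", "quiz", "game"], word_to_cluster)

# Features that both descriptions have in common.
shared = sorted(set(a) & set(b))
print(shared)  # ['1gram:game', 'cluster:C1', 'cluster:C2']
```

Here the two descriptions share only one n-gram ("game"), but "puzzle" and "quiz" map to the same hypothetical cluster, so the cluster feature adds a second point of overlap that a clustering algorithm could exploit.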