Fine-Grained Mobile Application Clustering Model Using Retrofitted Document Embedding

  • Yoon, Yeo-Chan (SW & Content Research Laboratory, ETRI)
  • Lee, Junwoo (SW & Content Research Laboratory, ETRI)
  • Park, So-Young (Department of Game Design and Development, Sangmyung University)
  • Lee, Changki (Department of Computer Science, Kangwon National University)
  • Received : 2016.12.21
  • Accepted : 2017.05.08
  • Published : 2017.08.01

Abstract

In this paper, we propose a fine-grained mobile application clustering model using retrofitted document embedding. To determine the clusters and their number automatically, without predefined categories, the proposed model initializes the clusters from title keywords and then merges similar clusters. To improve clustering performance, the model separates an accurate clustering step that uses titles from an expansive clustering step that uses descriptions. The accurate clustering step produces an automatically tagged set, which is used to learn a high-performance document vector. In the expansive clustering step, additional applications are classified using this document vector. Experimental results show that, compared with the K-means algorithm, the purity of the proposed model increased by 0.19 and the entropy decreased by 1.18. In addition, the mean average precision improved by more than 0.09 compared with a support vector machine classifier.
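
The abstract reports its clustering results as purity and entropy but does not define them. As a rough illustration only, the sketch below computes both using their common textbook definitions: purity as the fraction of items that belong to the majority class of their cluster, and entropy as the size-weighted average of the base-2 Shannon entropy of the class distribution in each cluster. The function name `purity_and_entropy`, the choice of base-2 logarithm, and the toy labels are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against gold class labels.

    Assumed definitions (not taken from the paper):
      purity  = sum over clusters of the majority-class count, divided by N
      entropy = size-weighted mean of per-cluster Shannon entropy (base 2)
    """
    if len(cluster_ids) != len(class_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group the gold labels by the cluster each item was assigned to.
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)

    total = len(cluster_ids)
    majority_sum = 0
    weighted_entropy = 0.0
    for labels in members.values():
        counts = Counter(labels)
        size = len(labels)
        # Items that agree with the most frequent class in this cluster.
        majority_sum += max(counts.values())
        # Shannon entropy of the class distribution inside this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        weighted_entropy += (size / total) * h

    return majority_sum / total, weighted_entropy


if __name__ == "__main__":
    # Toy data: six apps, two clusters, hypothetical gold categories.
    clusters = [0, 0, 0, 1, 1, 1]
    gold = ["game", "game", "photo", "photo", "photo", "photo"]
    purity, entropy = purity_and_entropy(clusters, gold)
    print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Under these definitions, higher purity and lower entropy both indicate clusters that are more homogeneous with respect to the gold categories, which matches the direction of the improvements reported above.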
