Fine-Grained Mobile Application Clustering Model Using Retrofitted Document Embedding

  • Yoon, Yeo-Chan (SW & Content Research Laboratory, ETRI) ;
  • Lee, Junwoo (SW & Content Research Laboratory, ETRI) ;
  • Park, So-Young (Department of Game Design and Development, Sangmyung University) ;
  • Lee, Changki (Department of Computer Science, Kangwon National University)
  • Received : 2016.12.21
  • Accepted : 2017.05.08
  • Published : 2017.08.01

Abstract

In this paper, we propose a fine-grained mobile application clustering model using retrofitted document embedding. To automatically determine the clusters and their number without predefined categories, the proposed model initializes the clusters based on title keywords and then merges similar clusters. For improved clustering performance, the proposed model distinguishes between an accurate clustering step using titles and an expansive clustering step using descriptions. The accurate clustering step produces an automatically tagged set, which is used to learn a high-performance document vector. In the expansive clustering step, additional applications are then classified using this document vector. Experimental results showed that, compared with the K-means algorithm, the purity of the proposed model increased by 0.19 and the entropy decreased by 1.18. In addition, the mean average precision improved by more than 0.09 compared with a support vector machine classifier.
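The two-step pipeline described above can be pictured with a short sketch. The Python code below is illustrative only and is not the authors' implementation: the toy app catalogue, the stopword list, the keyword-seeding rule, the Jaccard merge threshold, and the use of plain gensim Doc2Vec for description embeddings are all assumptions made for the example; the paper's retrofitted document embedding is not reproduced here.

```python
# Illustrative sketch (assumed details, not the authors' code).
# Step 1 ("accurate"): seed clusters from shared title keywords, drop
#   singletons, and merge clusters with high member overlap, yielding an
#   automatically tagged set.
# Step 2 ("expansive"): embed descriptions (plain Doc2Vec here, standing in
#   for the retrofitted embedding) and attach the remaining applications to
#   the most similar cluster centroid.
from collections import defaultdict

import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

apps = {  # hypothetical toy catalogue: app id -> (title, description)
    "a1": ("pocket photo editor", "edit and crop photos with filters"),
    "a2": ("photo collage maker", "combine photos into one collage"),
    "a3": ("daily step counter", "track your walking steps every day"),
    "a4": ("fitness step tracker", "record walks runs and daily steps"),
    "a5": ("simple note pad", "write and organize short notes"),
}
STOPWORDS = {"pocket", "simple", "daily", "maker", "fitness"}  # hypothetical

# Step 1: accurate clustering on titles (keyword seeding + merging).
seeds = defaultdict(set)  # title keyword -> ids of apps sharing it
for app_id, (title, _) in apps.items():
    for kw in title.split():
        if kw not in STOPWORDS:
            seeds[kw].add(app_id)
clusters = [ids for ids in seeds.values() if len(ids) >= 2]  # drop singletons

def jaccard(a, b):
    return len(a & b) / len(a | b)

merged = True
while merged:  # repeatedly merge any pair of clusters that overlap strongly
    merged = False
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if jaccard(clusters[i], clusters[j]) >= 0.5:  # hypothetical threshold
                clusters[i] |= clusters.pop(j)
                merged = True
                break
        if merged:
            break

# Step 2: expansive clustering on descriptions with document vectors.
tagged = [TaggedDocument(desc.split(), [app_id])
          for app_id, (_, desc) in apps.items()]
model = Doc2Vec(vector_size=20, min_count=1, epochs=50, seed=1)
model.build_vocab(tagged)
model.train(tagged, total_examples=model.corpus_count, epochs=model.epochs)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

already_tagged = set().union(*clusters) if clusters else set()
for app_id, (_, desc) in apps.items():
    if app_id in already_tagged:
        continue  # this app was already placed during the accurate step
    vec = model.infer_vector(desc.split())
    centroids = [np.mean([model.dv[a] for a in c], axis=0) for c in clusters]
    best = max(range(len(clusters)), key=lambda k: cosine(vec, centroids[k]))
    clusters[best].add(app_id)

# Two keyword-seeded clusters remain ({a1, a2} and {a3, a4}); a5 is attached
# to whichever centroid its inferred description vector is closest to.
print(clusters)
```

In the paper, the automatically tagged set from the accurate step is what drives the learning of the high-performance (retrofitted) document vector; the plain Doc2Vec model above merely stands in for that component.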

Keywords

Cited by

  1. Image classification and captioning model considering a CAM-based disagreement loss, ETRI Journal, vol. 42, no. 1, 2020, https://doi.org/10.4218/etrij.2018-0621