http://dx.doi.org/10.4218/etrij.2019-0443

Predicting numeric ratings for Google apps using text features and ensemble learning

Umer, Muhammad (Department of Computer Science, Khawaja Freed University)
Ashraf, Imran (Department of Information and Communication Engineering, Yeungnam University)
Mehmood, Arif (Department of Computer Science and Information Technology, The Islamia University of Bahawalpur)
Ullah, Saleem (Department of Computer Science, Khawaja Freed University)
Choi, Gyu Sang (Department of Information and Communication Engineering, Yeungnam University)

Publication Information

ETRI Journal / v.43, no.1, 2021, pp. 95-108

Abstract

Application (app) ratings are feedback provided voluntarily by users and serve as important evaluation criteria for apps. However, these ratings can often be biased owing to insufficient or missing votes. Additionally, significant differences have been observed between numeric ratings and user reviews. This study aims to predict the numeric ratings of Google apps using machine learning classifiers. It exploits numeric app ratings provided by users as training data and returns authentic mobile app ratings by analyzing user reviews. An ensemble learning model is proposed for this purpose that considers term frequency/inverse document frequency (TF/IDF) features. Three TF/IDF features, including unigrams, bigrams, and trigrams, were used. The dataset was scraped from the Google Play store, extracting data from 14 different app categories. Biased and unbiased user ratings were discriminated using TextBlob analysis to formulate the ground truth, from which the classifier prediction accuracy was then evaluated. The results demonstrate the high potential for machine learning-based classifiers to predict authentic numeric ratings based on actual user reviews.
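The n-gram TF/IDF features and the ensemble idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the length-normalized term frequency, the hard majority vote, and the helper names `ngrams`, `tfidf`, and `majority_vote` are all assumptions chosen for illustration, and the three sample reviews are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Compute one {n-gram: weight} dict per document.

    Assumed weighting (not taken from the paper):
      tf  = count / number of n-grams in the document
      idf = ln((1 + N) / (1 + df)) + 1, a common smoothed variant
    """
    grams_per_doc = [ngrams(d.lower().split(), n) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    n_docs = len(docs)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # guard against documents shorter than n
        weights.append({
            g: (c / total) * (math.log((1 + n_docs) / (1 + df[g])) + 1)
            for g, c in counts.items()
        })
    return weights


def majority_vote(predictions):
    """Hard voting: return the most common label among classifier outputs."""
    return Counter(predictions).most_common(1)[0][0]


# Invented sample reviews, for illustration only.
reviews = [
    "great app works well",
    "app crashes all the time",
    "great app love it",
]

unigram_w = tfidf(reviews, 1)  # unigram features
bigram_w = tfidf(reviews, 2)   # bigram features
trigram_w = tfidf(reviews, 3)  # trigram features

# Hypothetical star ratings predicted by three separate classifiers.
final_rating = majority_vote([4, 5, 4])
```

In this toy corpus the word "app" occurs in every review, so it receives a lower weight than the rarer "crashes" within the second review. In practice a library vectorizer and trained classifiers would replace these hand-written helpers.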

Keywords

data mining; ensemble learning; Google app rating; opinion mining; text features; text mining
