• Title/Summary/Keyword: Predicting Popularity

Movie Popularity Classification Based on Support Vector Machine Combined with Social Network Analysis

  • Dorjmaa, Tserendulam;Shin, Taeksoo
    • Journal of Information Technology Services / v.16 no.3 / pp.167-183 / 2017
  • The rapid growth of information technology and mobile service platforms such as the internet, Google, and Facebook has led to an abundance of data. As a result, the world is now facing a revolution in the way data is searched, collected, stored, and shared. This abundance of data creates opportunities for knowledge discovery and data mining techniques. In recent years, data mining methods for discovering and extracting knowledge from databases have become more popular in e-commerce services, particularly movie recommendation. However, most classification approaches for predicting movie popularity have used only a few types of movie information, such as actor, director, rating score, language, and country. In this study, we propose a support vector machine (SVM) classification model for predicting movie popularity based on movies' genre data and social network data. Social network analysis (SNA) is used to improve the classification accuracy. This study builds a movie network (a one-mode network) from the initial data, which form a two-mode user-to-movie network. For the proposed method, we computed degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality as centrality measures in the movie network. These four centrality values and the movies' genre data were used to classify movie popularity. Logistic regression, a neural network, a naïve Bayes classifier, and a decision tree were also used as benchmark models for comparison with the performance of the proposed model. To assess classification accuracy, this study used MovieLens data, an open database. Our empirical results indicate that the proposed model, using movies' genre and centrality data, has approximately 0% higher accuracy than other classification models using only movies' genre data. The implications of our results show that the proposed model can be used to improve movie popularity classification accuracy.
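
The network construction described in this abstract can be illustrated with a minimal sketch. It assumes the two-mode data arrive as (user, movie) pairs and that two movies are linked when at least one user is connected to both; the abstract does not state the linking rule, so that rule, the function names, and the toy data are all hypothetical. Only degree centrality is shown; the other three measures and the SVM step are omitted.

```python
from collections import defaultdict
from itertools import combinations


def project_to_movie_network(ratings):
    """Build a movie-to-movie adjacency map from (user, movie) pairs.

    Assumption: two movies are linked when at least one user is
    connected to both of them.
    """
    movies_by_user = defaultdict(set)
    for user, movie in ratings:
        movies_by_user[user].add(movie)

    adjacency = defaultdict(set)
    for movies in movies_by_user.values():
        for movie in movies:
            adjacency[movie]  # register the movie even if it has no links
        for a, b in combinations(sorted(movies), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Normalized degree centrality: degree / (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1)
            for node, neighbors in adjacency.items()}


# Hypothetical toy data, not taken from MovieLens.
ratings = [
    ("u1", "Alien"), ("u1", "Brazil"),
    ("u2", "Brazil"), ("u2", "Casablanca"),
    ("u3", "Alien"), ("u3", "Brazil"), ("u3", "Dune"),
]
movie_network = project_to_movie_network(ratings)
centrality = degree_centrality(movie_network)
```

In this toy network "Brazil" is linked to every other movie, so it receives the highest degree centrality; values like these would be appended to the genre features before training a classifier.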

A Machine Learning-based Popularity Prediction Model for YouTube Mukbang Content (머신러닝 기반의 유튜브 먹방 콘텐츠 인기 예측 모델)

  • Beomgeun Seo;Hanjun Lee
    • Journal of Internet Computing and Services / v.24 no.6 / pp.49-55 / 2023
  • In this study, models for predicting the popularity of mukbang content on YouTube were proposed, and factors influencing that popularity were identified through post-analysis. To accomplish this, information on 22,223 pieces of content was collected from the top mukbang channels by subscriber count using APIs and Pretty Scale. Machine learning algorithms such as Random Forest, XGBoost, and LGBM were used to build models for predicting views and likes. The results of SHAP analysis showed that subscriber count had the most significant impact in the view prediction models, while the attractiveness of the creator emerged as the most important variable in the likes prediction model. This confirmed that the antecedent factors for views and for likes differ. This study holds academic significance in analyzing a large amount of online content and conducting empirical analysis. It also has practical significance, as it informs mukbang creators about viewer content consumption trends and provides guidance for producing high-quality, marketable content.

Predicting the Lifespan and Retweet Times of Tweets Based on Multiple Feature Analysis

  • Bae, Yongjin;Ryu, Pum-Mo;Kim, Hyunki
    • ETRI Journal / v.36 no.3 / pp.418-428 / 2014
  • In social network services such as Facebook, Google+, and Twitter, certain postings attract more people than others. In this paper, we propose a novel method for predicting the lifespan and retweet times of tweets, the latter being a proxy for the popularity of a tweet. We extract information from retweet graphs, such as posting times, as well as social, local, and content features, to construct prediction knowledge bases. Tweets with a similar topic, retweet pattern, and properties are sequentially extracted from the knowledge base and then used to make a prediction. To evaluate the performance of our model, we collected tweets on Twitter from June 2012 to October 2012. We compared our model with conventional models according to the prediction goal. For the lifespan prediction of a tweet, our model can reduce the time tolerance of a tweet lifespan by about four hours compared with conventional models. For the prediction of retweet times, our model achieved a precision of about 50%, much higher than two conventional models, which showed precisions of around 30% and 20%, respectively.

A Model to Predict Popularity of Internet Posts on Internet Forum Sites (인터넷 토론 게시판의 게시물 인기도 예측 모델)

  • Lee, Yun-Jung;Jung, In-Jun;Woo, Gyun
    • The KIPS Transactions:PartD / v.19D no.1 / pp.113-120 / 2012
  • Today, Internet users can easily create and share digital content with others through online content-sharing services such as YouTube. As a result, many portal sites are flooded with user-created content (UCC) in various media such as text and video. Estimating the popularity of UCC is a crucial concern for both users and site administrators. This paper proposes a method to predict the popularity of Internet articles, a kind of UCC, using the dynamics of the online content itself. To analyze the dynamics, we treated the access count of an Internet post as its popularity and analyzed how the access counts vary over time. We derived a model, based on an exponential function, to predict the popularity of a post represented by the time series of its access counts. According to the experimental results, the difference between the actual and predicted access counts is no more than 10 for 20,532 posts, which cover about 90.7% of the test set.
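
The abstract says only that the model is based on an exponential function, so the exact form is not known here. As a rough illustration, the sketch below assumes a saturating curve N(t) = A * (1 - exp(-lam * t)) for the cumulative access count and recovers A and lam in closed form from two early observations at times t1 and 2 * t1. The functional form, the parameter names, and the sample numbers are assumptions, not the authors' model.

```python
import math


def fit_saturating_exponential(t1, n1, n2):
    """Fit N(t) = A * (1 - exp(-lam * t)) from counts at t1 and 2 * t1.

    With x = exp(-lam * t1): n1 = A * (1 - x) and n2 = A * (1 - x**2),
    so n2 / n1 = 1 + x, which gives x directly.
    """
    x = n2 / n1 - 1.0
    if not 0.0 < x < 1.0:
        raise ValueError("counts are inconsistent with a saturating curve")
    lam = -math.log(x) / t1
    a = n1 / (1.0 - x)
    return a, lam


def predict_access_count(a, lam, t):
    """Predicted cumulative access count at time t."""
    return a * (1.0 - math.exp(-lam * t))


# Hypothetical post whose true curve has A = 100 and lam = 0.5 per hour.
observed_1h = 100.0 * (1.0 - math.exp(-0.5))
observed_2h = 100.0 * (1.0 - math.exp(-1.0))
a_hat, lam_hat = fit_saturating_exponential(1.0, observed_1h, observed_2h)
forecast_10h = predict_access_count(a_hat, lam_hat, 10.0)
```

With noisy real counts, a least-squares fit over many time points would be more robust than this two-point solution.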

Predicting Relative Superiority of TV Drama First Episodes based on the Quantitative Competency Index of the Cast and Crew (TV드라마 참여 인물의 계량 능력지표에 기반한 첫 회 시청률 상대적 우위 예측)

  • Ju, Sang Phil;Hong, June Seok;Kim, Wooju
    • The Journal of the Korea Contents Association / v.19 no.6 / pp.179-191 / 2019
  • It is not easy to predict the return on investment in the content business, and there is no index for evaluating cast and crew. The absolute level of TV ratings is steadily declining, but there is no substitute index yet. In this study, we tried to predict the relative popularity of a drama by defining the relative superiority of an individual drama's viewership as the response variable and the relative superiority of the drama's participants as the explanatory variables. We used various machine learning algorithms and added explanatory variables that were found to be useful in previous studies. As a result, with properly combined explanatory variables, a high prediction accuracy of 84% was obtained. With this study, we intend to promote the investment efficiency of the entire content industry by predicting the relative popularity of content.

Text Mining and Sentiment Analysis for Predicting Box Office Success

  • Kim, Yoosin;Kang, Mingon;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.8 / pp.4090-4102 / 2018
  • Since the emergence of online communication, text mining and sentiment analysis have been frequently applied to analyzing electronic word-of-mouth. This study aims to develop a domain-specific sentiment lexicon to predict box office success in the Korean film market and to validate the feasibility of the lexicon. Natural language processing, a machine learning algorithm, and a lexicon-based sentiment classification method are employed. To create a movie-domain sentiment lexicon, 233,631 reviews of 147 movies with popularity ratings were collected using an XML crawling package in R. We achieved 81.69% accuracy in sentiment classification with the Korean sentiment dictionary, which includes 706 negative words and 617 positive words. The results showed a stronger positive relationship between box office success and consumers' sentiment, as well as a significant positive effect in the linear regression for the prediction model. In addition, they reveal that emotion in user-generated content can be a more accurate clue for predicting business success.
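
Lexicon-based sentiment classification, as mentioned in this abstract, can be sketched in a few lines. The version below simply counts matches against a positive and a negative word set and compares the totals. The tiny English word sets, the tokenization, and the tie-breaking rule are hypothetical stand-ins; the study used a Korean movie-domain dictionary whose contents and scoring rule are not given here.

```python
# Hypothetical miniature lexicon, for illustration only.
POSITIVE_WORDS = {"moving", "brilliant", "great"}
NEGATIVE_WORDS = {"boring", "dull", "awful"}


def classify_sentiment(review, positive_words, negative_words):
    """Label a review by comparing counts of positive and negative words."""
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    tokens = [token.strip(".,!?;:\"'") for token in review.lower().split()]
    positive_hits = sum(token in positive_words for token in tokens)
    negative_hits = sum(token in negative_words for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


label_a = classify_sentiment("A moving and brilliant film",
                             POSITIVE_WORDS, NEGATIVE_WORDS)
label_b = classify_sentiment("Boring plot, dull acting.",
                             POSITIVE_WORDS, NEGATIVE_WORDS)
label_c = classify_sentiment("Great cast but awful script",
                             POSITIVE_WORDS, NEGATIVE_WORDS)
```

Per-movie aggregates of such labels (for example, the share of positive reviews) are the kind of feature that could then enter a regression on box office results.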

A Study on Sentiment Pattern Analysis of Video Viewers and Predicting Interest in Video using Facial Emotion Recognition (얼굴 감정을 이용한 시청자 감정 패턴 분석 및 흥미도 예측 연구)

  • Jo, In Gu;Kong, Younwoo;Jeon, Soyi;Cho, Seoyeong;Lee, DoHoon
    • Journal of Korea Multimedia Society / v.25 no.2 / pp.215-220 / 2022
  • Emotion recognition is one of the most important and challenging areas of computer vision. Many studies on emotion recognition have been conducted recently, and the performance of models is improving. However, more research is needed on emotion recognition and sentiment analysis of video viewers. In this paper, we propose an emotion analysis system that includes a sentiment analysis model and an interest prediction model. We analyzed the emotional patterns of people watching popular and unpopular videos and predicted the level of interest using the emotion analysis system. Experimental results showed that certain emotions were strongly related to the popularity of videos and that the interest prediction model had high accuracy in predicting the level of interest.

Korean and English Sentiment Analysis Using the Deep Learning

  • Ramadhani, Adyan Marendra;Choi, Hyung Rim;Lim, Seong Bae
    • Journal of Korea Society of Industrial Information Systems / v.23 no.3 / pp.59-71 / 2018
  • Social media has immense popularity among all services today. Data from social network services (SNSs) can be used for various objectives, such as text prediction or sentiment analysis. There is a great deal of Korean and English data on social media that can be used for sentiment analysis, but handling such huge amounts of unstructured data is a difficult task, and machine learning is needed to handle it. This research focuses on predicting Korean and English sentiment using a deep forward neural network with a deep learning architecture and compares it with other methods, such as LDA MLP and GENSIM, using logistic regression. The research findings indicate an approximately 75% accuracy rate when predicting sentiments using the DNN, with a latent Dirichlet allocation (LDA) prediction accuracy rate of approximately 81%, with the corpus being approximately 64% accurate between English and Korean.

A Strategy of Assessing Climate Factors' Influence for Agriculture Output

  • Kuan, Chin-Hung;Leu, Yungho;Lee, Chien-Pang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.5 / pp.1414-1430 / 2022
  • Owing to the popularity of the Internet of Things, much agricultural data is now collected automatically by sensors. The abundance of agricultural data makes precise prediction of rice yield possible. Because climate factors have an essential effect on rice yield, we included them in the prediction model. Accordingly, this paper proposes a machine learning model for rice yield prediction in Taiwan that combines a genetic algorithm with a support vector regression model. The dataset of this study includes meteorological data from the Central Weather Bureau and the rice yield of Taiwan from 2003 to 2019. The experimental results show that the performance of the proposed model is nearly 30% better than that of the MARS, RF, ANN, and SVR models. The most important climate factors affecting rice yield are the total sunshine hours, the number of rainfall days, and the temperature. The proposed model also offers three advantages: (a) it can be used in different geographical regions with high prediction accuracy; (b) it has high explanatory ability because it can select the important climate factors that affect rice yield; (c) it is more suitable for predicting rice yield because it provides higher reliability and stability. The proposed model can assist the government in making sustainable agricultural policies.

Predicting the Popularity of Post Articles with Virtual Temperature in Web Bulletin (웹게시판에서 가상온도를 이용한 게시글의 인기 예측)

  • Kim, Su-Do;Kim, So-Ra;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association / v.11 no.10 / pp.19-29 / 2011
  • A blog provides commentary, news, or content on a particular subject. An important part of many blogs is their interactive format. Sometimes there is a heated debate on a topic, and an article becomes a political or sociological issue. In this paper, we propose a method to predict the popularity of an article in advance. First, we used hit count as a factor to predict the popularity of an article. We defined the saturation point and derived a model to predict the hit count at the saturation point from the correlation coefficient between the early hit count and the hit count at the saturation point. Finally, we predicted the virtual temperature of an article using four types (explosive, hot, warm, cold). We can predict the virtual temperature of Internet discussion articles using the hit count at the saturation point with more than 70% accuracy, exploiting only the first 30 minutes' hit count. In the hot, warm, and cold categories, we achieve more than 86% accuracy from 30 minutes' hit count and more than 90% accuracy from 70 minutes' hit count.
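
A rough sketch of the two-stage idea in this abstract: estimate the saturation-point hit count from the early hit count, then map that estimate to a temperature class. The abstract does not give the fitted relation or the class boundaries, so the ordinary least-squares line, the thresholds, and the training numbers below are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical class boundaries on the predicted saturation hit count.
TEMPERATURE_THRESHOLDS = [(1000, "explosive"), (500, "hot"), (100, "warm")]


def virtual_temperature(saturation_hits):
    """Map a saturation hit count to one of the four temperature types."""
    for minimum, label in TEMPERATURE_THRESHOLDS:
        if saturation_hits >= minimum:
            return label
    return "cold"


# Hypothetical history: hits in the first 30 minutes vs. hits at saturation.
early_hits = [10, 20, 40, 80]
saturation_hits = [105, 205, 405, 805]
slope, intercept = fit_line(early_hits, saturation_hits)

# Predict for a new article that received 50 hits in its first 30 minutes.
predicted_saturation = slope * 50 + intercept
predicted_class = virtual_temperature(predicted_saturation)
```

Here the toy history lies exactly on a line, so the new article's estimate falls in the "hot" band; real data would scatter around the fit, which is where the reported accuracy figures come in.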