• Title/Summary/Keyword: Word-Prediction

Search Result 114

Product Community Analysis Using Opinion Mining and Network Analysis: Movie Performance Prediction Case (오피니언 마이닝과 네트워크 분석을 활용한 상품 커뮤니티 분석: 영화 흥행성과 예측 사례)

  • Jin, Yu; Kim, Jungsoo; Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.49-65 / 2014
  • Word of Mouth (WOM) is the behavior by which consumers pass on or communicate their experience of a product or service to other consumers. With the popularity of social media such as Facebook, Twitter, blogs, and online communities, electronic WOM (e-WOM) has become important to the success of products and services, so most enterprises pay close attention to the e-WOM around their products or services. This is especially true for movies, which are experiential products. This paper uses social network analysis to identify the network factors of an online movie community that affect box office revenue. In addition to the traditional WOM factors (WOM volume and valence), network centrality measures of the online community are included as potential influences on box office revenue. Based on previous research, we develop five hypotheses on the relationships between the potential influential factors (WOM volume, WOM valence, degree centrality, betweenness centrality, and closeness centrality) and box office revenue. The first hypothesis is that the accumulated volume of WOM in online product communities is positively related to the total revenue of movies. The second hypothesis is that the accumulated valence of WOM in online product communities is positively related to the total revenue of movies. The third hypothesis is that the average degree centrality of reviewers in online product communities is positively related to the total revenue of movies. The fourth hypothesis is that the average betweenness centrality of reviewers in online product communities is positively related to the total revenue of movies. The fifth hypothesis is that the average closeness centrality of reviewers in online product communities is positively related to the total revenue of movies. 
To verify our research model, we collect movie review data from the Internet Movie Database (IMDb), a representative online movie community, and movie revenue data from the Box-Office-Mojo website. The movies in this analysis are the weekly top-10 movies from September 1, 2012, to September 1, 2013, with … movies in total. We collect movie metadata, such as screening periods and user ratings, and IMDb community data, including reviewer identification, review content, review times, responder identification, reply content, reply times, and reply relationships. For the same period, revenue data from Box-Office-Mojo are collected on a weekly basis. Movie community networks are constructed from the reply relationships between reviewers. Using the social network analysis tool NodeXL, we calculate the average degree, betweenness, and closeness centrality for each movie. Correlation analysis of the focal variables and the dependent variable (final revenue) shows that the three centrality measures are highly correlated, so we run a separate multiple regression for each centrality measure. Consistent with previous research, our regression results show that the volume and valence of WOM are positively related to the final box office revenue of movies. Moreover, the average betweenness centrality of the initial community networks affects final movie revenue, whereas neither the average degree centrality nor the average closeness centrality influences final movie performance. Based on the regression results, hypotheses 1, 2, and 4 are accepted, and hypotheses 3 and 5 are rejected. This study links the network structure of e-WOM in online product communities with product performance. The analysis of a real online movie community shows that online community network structures can serve as a predictor of movie performance. 
The results show that the betweenness centrality of the reviewer community is critical for predicting movie performance, whereas degree centrality and closeness centrality do not influence it. As future research, similar analyses of other product categories, such as electronic goods and online content, are required to generalize the results.
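The per-movie network variables described above (the average degree, betweenness, and closeness centrality of a reviewer reply network) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' NodeXL workflow: it assumes an undirected, unweighted reply graph, computes betweenness with Brandes' algorithm, and uses a closeness normalization ((reachable nodes minus 1) divided by the sum of distances) that may differ from NodeXL's. The function names and sample reply pairs are hypothetical.

```python
from collections import deque

def build_reply_graph(replies):
    """Undirected adjacency sets from hypothetical (reviewer, responder) reply pairs."""
    graph = {}
    for a, b in replies:
        if a == b:
            continue  # ignore self-replies
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def bfs(graph, source):
    """Shortest-path distances, path counts (sigma), predecessors, and visit order."""
    dist = {source: 0}
    sigma = {v: 0 for v in graph}
    sigma[source] = 1
    preds = {v: [] for v in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order

def centralities(graph):
    n = len(graph)
    degree, closeness = {}, {}
    betweenness = {v: 0.0 for v in graph}
    for s in graph:
        degree[s] = len(graph[s]) / (n - 1) if n > 1 else 0.0
        dist, sigma, preds, order = bfs(graph, s)
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total > 0 else 0.0
        # Brandes: accumulate pair dependencies in reverse BFS order
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    for v in betweenness:
        betweenness[v] /= 2  # each undirected pair was counted from both ends
    return degree, betweenness, closeness

def average_centralities(replies):
    """Average of each centrality over all reviewers in one movie's network."""
    graph = build_reply_graph(replies)
    d, b, c = centralities(graph)
    n = len(graph)
    return {name: sum(vals.values()) / n
            for name, vals in (("degree", d), ("betweenness", b), ("closeness", c))}
```

For a star-shaped thread in which reviewer A replies to B, C, and D, the averages are 0.5 (degree), 0.75 (betweenness), and about 0.7 (closeness). One such dictionary per movie would then enter the separate regressions alongside WOM volume and valence.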

A study on trends and predictions through analysis of linkage analysis based on big data between autonomous driving and spatial information (자율주행과 공간정보의 빅데이터 기반 연계성 분석을 통한 동향 및 예측에 관한 연구)

  • Cho, Kuk; Lee, Jong-Min; Kim, Jong Seo; Min, Guy Sik
    • Journal of Cadastre & Land InformatiX / v.50 no.2 / pp.101-115 / 2020
  • In this paper, a big data analysis method was used to identify global trends in autonomous driving and to derive ways to activate spatial information services. The big data combined news articles and patent documents in order to analyze trends in the spatial information field. Big data were built from major news on autonomous driving, and keywords were extracted using LDA (Latent Dirichlet Allocation) topic modeling. In addition, connectivity analysis with spatial information, global technology trend analysis, and trend analysis and prediction in the spatial information field were conducted by applying WordNet to keywords from patent information. This paper proposes a big data analysis method for predicting trends and the future through the analysis of the connection between the autonomous driving field and spatial information. Through the big data analysis, platform alliances, business partnerships, mergers and acquisitions, joint venture establishment, standardization, and technology development were derived as future global trends of spatial information in autonomous driving.
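The LDA keyword-extraction step can be illustrated with a compact collapsed Gibbs sampler. This is a generic textbook formulation rather than the authors' pipeline: the hyperparameters (`alpha`, `beta`), the iteration count, and the toy token lists are assumptions, and a real study would normally use a tested library on a much larger news corpus.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists.
    Returns per-topic word counts and per-topic token totals."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * num_topics for _ in docs]        # doc-topic counts
    nkw = [Counter() for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                         # tokens per topic
    z = []                                        # topic assigned to each token

    def update(d, k, w, step):
        ndk[d][k] += step
        nkw[k][w] += step
        nk[k] += step

    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)         # random initial topic
            z[d].append(k)
            update(d, k, w, +1)
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, z[d][i], w, -1)         # remove the current token
                # unnormalized full conditional p(z_i = t | all other assignments)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                update(d, k, w, +1)               # add it back under the new topic
    return nkw, nk

def top_keywords(nkw, n=3):
    """Most frequent words per topic (words with zero count are skipped)."""
    return [[w for w, c in counter.most_common() if c > 0][:n] for counter in nkw]
```

For example, `top_keywords(lda_gibbs(docs, 2)[0])` on two hypothetical token lists such as `["lidar", "map", "sensor"]` and `["patent", "standard", "alliance"]` returns up to three keywords per topic; the WordNet-based patent analysis is not covered by this sketch.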

A Study on the Extraction of Psychological Distance Embedded in Company's SNS Messages Using Machine Learning (머신 러닝을 활용한 회사 SNS 메시지에 내포된 심리적 거리 추출 연구)

  • Seongwon Lee; Jin Hyuk Kim
    • Information Systems Review / v.21 no.1 / pp.23-38 / 2019
  • Social network services (SNSs) are an important marketing channel, so many companies actively use them by posting messages with content and style appropriate for their customers. In this paper, we focus on the psychological distance embedded in SNS messages and develop a method to measure it by combining traditional content analysis, natural language processing (NLP), and machine learning. Through traditional content analysis by human coding, the psychological distance was extracted from SNS messages, and the coding results were used as input data for NLP and machine learning. In the NLP step, word embedding was performed and a bag of words was created. A support vector machine (SVM), a machine learning technique, was then trained and tested to classify the psychological distance in SNS messages. Initially, the sensitivity and precision of the SVM predictions were very low because of the extreme skewness of the dataset. We improved SVM performance by balancing the class ratio through upsampling and by using only the data coded with the same value in the first content analysis. All performance indices then exceeded 70%, showing that psychological distance can be measured well.
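The imbalance fix and the two metrics mentioned above can be sketched as follows. This shows plain random oversampling (duplicating minority-class samples) and sensitivity/precision for one positive class; the exact upsampling variant the authors used is not stated, the SVM itself is omitted, and the labels `"near"`/`"far"` are hypothetical stand-ins for psychological-distance codes.

```python
import random
from collections import Counter

def upsample(samples, labels, seed=0):
    """Random oversampling: duplicate minority-class samples until every
    class has as many samples as the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, cnt in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - cnt):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

def sensitivity_precision(y_true, y_pred, positive):
    """Sensitivity (recall) and precision for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision
```

Oversampling is usually applied to the training split only, so that duplicated messages do not leak into the test set and inflate the reported sensitivity and precision.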

A Study on the Prediction of Referral Intention based on Customer Satisfaction in Construction Management (CM에서 고객만족도에 기반한 추천의향 예측에 관한 연구)

  • Jeong, Min; Lee, Ghang
    • Korean Journal of Construction Engineering and Management / v.11 no.6 / pp.100-110 / 2010
  • CM service contracts come mainly from repurchases by existing customers and from new customers referred by existing ones, so studying customer satisfaction and loyalty can help strengthen CM's competitiveness. However, there have been few attempts to study customer satisfaction and customer loyalty in CM. Construction Management (CM), an advanced construction management method, was introduced to the domestic market in the mid-1990s, about 15 years ago. The aim of this research is to build a model that predicts customer loyalty based on how satisfied customers are with CM services. To measure customer satisfaction and loyalty, this research surveyed 135 decision-makers who have experienced CM services. Customer satisfaction was tested and analyzed by phase: planning, design, procurement, construction, and post-construction. Referral intention was measured based on Net Promoter Score (NPS) theory. Customers were classified as detractors, passively satisfied, and promoters according to the measured referral intention, and a multinomial logistic regression was performed between satisfaction by construction phase and customer type. The result is a model that predicts customer type (detractor, passively satisfied, or promoter) from satisfaction level. The analysis also identified the initial planning phase as the most influential factor in whether a customer becomes a promoter. These results can be used to build customer loyalty by managing customer satisfaction throughout a project in an Internet-based environment, and can provide the information needed to quickly explore positive and negative word-of-mouth feedback.
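The NPS customer types referred to above follow a standard convention on a 0-10 "how likely are you to recommend" scale: 9-10 are promoters, 7-8 passives (passively satisfied), and 0-6 detractors, and NPS is the percentage of promoters minus the percentage of detractors. The sketch below assumes that convention; the survey scale actually used in the study is not given in this abstract, and the multinomial logistic regression is not shown.

```python
def nps_category(score):
    """Map a 0-10 referral-intention score to an NPS customer type."""
    if not 0 <= score <= 10:
        raise ValueError("NPS scores range from 0 to 10")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"

def net_promoter_score(scores):
    """Percentage of promoters minus percentage of detractors (-100 to 100)."""
    cats = [nps_category(s) for s in scores]
    return 100 * (cats.count("promoter") - cats.count("detractor")) / len(cats)
```

The categories returned by `nps_category` are the kind of three-level outcome that a multinomial logistic regression on phase-by-phase satisfaction would then predict.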

An Artificial Neural Network Based Phrase Network Construction Method for Structuring Facility Error Types (설비 오류 유형 구조화를 위한 인공신경망 기반 구절 네트워크 구축 방법)

  • Roh, Younghoon; Choi, Eunyoung; Choi, Yerim
    • Journal of Internet Computing and Services / v.19 no.6 / pp.21-29 / 2018
  • In the era of the fourth industrial revolution, the concept of the smart factory is emerging, and there are efforts to use data analysis to predict facility errors, which reduce utilization and productivity. Such prediction requires data describing the situation of each facility error and its type, called the facility error log. However, in many manufacturing companies, facility error types are not precisely defined or categorized. Workers who operate the facilities record the error type as unstructured text based on their own empirical judgment, which makes the data impossible to analyze. Therefore, this paper proposes a framework for constructing a phrase network that supports the identification and classification of facility error types from the error logs written by operators. Specifically, phrases indicating error types are extracted from the text data using a dictionary that classifies terms by their usage. A phrase network is then constructed by calculating the similarity between the extracted phrases. The proposed method was evaluated on real-world facility error logs, and it is expected to contribute to the accurate identification of error types and the prediction of facility errors.
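The network-construction step (linking extracted phrases by similarity) can be sketched with cosine similarity and a threshold. The phrase vectors are assumed to come from some neural embedding model, the 0.8 threshold is an arbitrary illustrative choice, and the phrases and vectors below are hypothetical; the paper's actual similarity measure and cut-off are not stated here.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_phrase_network(phrase_vectors, threshold=0.8):
    """Edge list (phrase_a, phrase_b, similarity) for every phrase pair whose
    cosine similarity reaches the (assumed) threshold."""
    edges = []
    for (p, u), (q, v) in combinations(phrase_vectors.items(), 2):
        sim = cosine(u, v)
        if sim >= threshold:
            edges.append((p, q, round(sim, 3)))
    return edges

# Hypothetical embedding vectors for three extracted phrases
vectors = {
    "motor overheat": [0.9, 0.1, 0.0],
    "motor overheating": [0.85, 0.15, 0.0],
    "sensor misread": [0.0, 0.2, 0.95],
}
edges = build_phrase_network(vectors)
```

Here only the two "motor" phrases are linked, so they would fall into one candidate error type; clusters (for example, connected components) of such a network are what an analyst would review when defining error categories.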

A Method for Prediction of Quality Defects in Manufacturing Using Natural Language Processing and Machine Learning (자연어 처리 및 기계학습을 활용한 제조업 현장의 품질 불량 예측 방법론)

  • Roh, Jeong-Min; Kim, Yongsung
    • Journal of Platform Technology / v.9 no.3 / pp.52-62 / 2021
  • Quality control is critical at manufacturing sites and is key to predicting the risk of quality defects before manufacturing. However, because manufacturing processes vary across industries, the reliability of manual quality control methods is limited by human and physical constraints. These limitations are particularly obvious in domains with numerous manufacturing processes, such as the manufacture of major nuclear equipment. This study proposes a novel method for predicting the risk of quality defects using natural language processing and machine learning. The study used production data collected over 6 years at a factory that manufactures major equipment installed in nuclear power plants. In the text preprocessing stage, a mapping method was applied to the word dictionary so that domain knowledge could be reflected appropriately, and a hybrid algorithm combining n-grams, Term Frequency-Inverse Document Frequency (TF-IDF), and Singular Value Decomposition (SVD) was constructed for sentence vectorization. Next, in the experiment to classify risky processes that result in poor quality, k-fold cross-validation was applied to cases ranging from unigrams to cumulative trigrams. To obtain objective experimental results, Naive Bayes and Support Vector Machine were used as classification algorithms, achieving a maximum accuracy of 0.7685 and a maximum F1-score of 0.8641, which indicates that the proposed method is effective. The performance of the proposed method was also compared with the votes of field engineers, and the proposed method outperformed them. Thus, the method can be implemented for quality control at manufacturing sites.
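The hybrid sentence-vectorization idea (cumulative n-grams, TF-IDF weighting, then a truncated SVD) can be sketched with NumPy. This is one plausible arrangement, not the authors' exact algorithm: whitespace tokenization, the "+1" IDF variant, and the choice of `k` are assumptions, the domain dictionary mapping is omitted, and the example sentences are hypothetical.

```python
import numpy as np

def ngrams(tokens, n_max):
    """Cumulative n-grams: unigrams, bigrams, ..., up to n_max-grams."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def tfidf_svd(sentences, n_max=3, k=2):
    """One dense k-dimensional vector per sentence (n-gram TF-IDF + truncated SVD)."""
    grams_per_doc = [ngrams(s.split(), n_max) for s in sentences]
    vocab = sorted({g for grams in grams_per_doc for g in grams})
    col = {g: j for j, g in enumerate(vocab)}
    tf = np.zeros((len(sentences), len(vocab)))
    for i, grams in enumerate(grams_per_doc):
        for g in grams:
            tf[i, col[g]] += 1.0                  # raw n-gram counts
    df = (tf > 0).sum(axis=0)                     # document frequency per n-gram
    idf = np.log(len(sentences) / df) + 1.0       # +1 keeps n-grams present in every sentence
    weighted = tf * idf
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    return u[:, :k] * s[:k]                       # coordinates on the top-k singular directions

vecs = tfidf_svd(["weld crack found", "weld crack repaired", "paint defect found"], n_max=3, k=2)
```

Setting `n_max` to 1, 2, or 3 reproduces the "unigram to cumulative trigram" cases; the resulting vectors would then be fed to classifiers such as Naive Bayes or SVM under k-fold cross-validation.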

A Study on Development of a Hearing Impairment Simulator considering Frequency Selectivity and Asymmetrical Auditory Filter of the Hearing Impaired (난청인의 주파수 선택도와 비대칭적 청각 필터를 고려한 난청 시뮬레이터 개발에 관한 연구)

  • Joo, Sang-Ick; Kang, Hyun-Deok; Song, Young-Rok; Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.4 / pp.831-840 / 2010
  • In this paper, we propose a hearing impairment simulator that accounts for the reduced frequency selectivity and asymmetrical auditory filters of the hearing impaired, and we verify through experiments how reduced frequency selectivity and the asymmetrical auditory filter affect speech perception. Reduced frequency selectivity was implemented by spectral smearing using LPC (linear predictive coding). The auditory filter shapes are asymmetrical and differ with each center frequency. In hearing-impaired listeners, the auditory filter changes differently from that of normal-hearing listeners, so speech passed through the filter has a different quality. The experiments consisted of subjective and objective tests. The subjective experiments comprised four tests: a pure-tone test, a speech reception threshold (SRT) test, a word recognition score (WRS) test without spectral smearing, and a WRS test with spectral smearing. The hearing impairment simulator experiment was performed with 9 subjects with normal hearing. The amount of spectral smearing was controlled by the LPC order. The asymmetrical auditory filter of the proposed simulator was simulated, and tests were then performed to evaluate the filter's performance objectively. The objective evaluation of the simulated auditory filter used PESQ (perceptual evaluation of speech quality) and LLR (log-likelihood ratio) on speech passed through the filter, measuring the objective speech quality and distortion of the processed speech. When hearing loss was simulated, the PESQ and LLR values differed substantially depending on the asymmetrical auditory filter in the hearing impairment simulator.
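Because the amount of spectral smearing is controlled by the LPC order, it may help to see how LPC coefficients are obtained. The sketch below uses the standard autocorrelation method with the Levinson-Durbin recursion on one frame; framing, windowing, and the smearing/resynthesis stage of the simulator are not shown, and the function name is hypothetical.

```python
def lpc(signal, order):
    """LPC predictor coefficients a[1..order], with x[n] ~ sum_k a[k] * x[n-k],
    via the autocorrelation method and the Levinson-Durbin recursion.
    Returns (coefficients, final prediction-error energy)."""
    n = len(signal)
    r = [sum(signal[t] * signal[t + lag] for t in range(n - lag))
         for lag in range(order + 1)]
    if r[0] == 0:
        return [0.0] * order, 0.0                 # silent frame: nothing to model
    a = [0.0] * (order + 1)                       # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                             # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]        # update lower-order coefficients
        a = new_a
        err *= 1.0 - k * k                        # residual energy shrinks each step
    return a[1:], err
```

A low order yields a smooth spectral envelope (more smearing), while a high order follows spectral detail more closely, which is consistent with using the LPC order as the smearing control described above.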

Analysis of Adverse Drug Reaction Reports using Text Mining (텍스트마이닝을 이용한 약물유해반응 보고자료 분석)

  • Kim, Hyon Hee; Rhew, Kiyon
    • Korean Journal of Clinical Pharmacy / v.27 no.4 / pp.221-227 / 2017
  • Background: As the personalized healthcare industry attracts attention, big data analysis of healthcare data has become essential. Much healthcare data, such as product labeling, biomedical literature, and social media data, is unstructured, so extracting meaningful information from unstructured text is becoming important. In particular, text mining of adverse drug reaction (ADR) reports can provide signal information for predicting and detecting ADRs. No study has yet analyzed the text of expert opinions in the Korea Adverse Event Reporting System (KAERS) database in Korea. Methods: Expert opinion text from the KAERS database provided by the Korea Institute of Drug Safety & Risk Management (KIDS-KD) is analyzed. Word frequency analysis is performed to understand the whole text, and TF-IDF weight analysis is performed to find important keywords. Keywords related to the important keywords are then identified by calculating correlation coefficients. Results: Of 90,522 reports in total, 120 insulin ADR reports and 858 tramadol ADR reports were analyzed. The most frequent keywords were, in order, the ADRs dizziness, headache, vomiting, dyspepsia, and shock in the insulin data, and the ADR symptoms vomiting, 어지러움 (Korean for "dizziness"), dizziness, dyspepsia, and constipation in the tramadol data. Conclusion: Text mining of the expert opinions in KIDS-KD easily recovers frequently mentioned ADRs and medications, and can play an important role in ADR research for detecting signal information and predicting ADRs.
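The Methods steps (TF-IDF keyword ranking, then finding related keywords by correlation) can be sketched as below. This assumes already-tokenized reports, a plain `tf * log(N/df)` weighting, and Pearson correlation of per-report term counts; the study's exact weighting and correlation setup are not specified, and the toy reports are hypothetical.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Sum of TF-IDF weights per term across a corpus of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            scores[term] += (count / len(doc)) * math.log(n / df[term])
    return scores

def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def related_keywords(docs, keyword, top=3):
    """Rank other terms by correlation of their per-report counts with keyword's."""
    counts = [Counter(doc) for doc in docs]
    target = [c[keyword] for c in counts]
    vocab = sorted({t for doc in docs for t in doc} - {keyword})
    corr = {t: pearson(target, [c[t] for c in counts]) for t in vocab}
    return sorted(corr.items(), key=lambda kv: kv[1], reverse=True)[:top]

reports = [["dizziness", "vomiting"], ["dizziness", "vomiting", "headache"],
           ["constipation"], ["constipation", "headache"]]
```

With this weighting, a term that appears in every report gets an IDF of zero, which is why TF-IDF highlights distinctive ADR terms rather than ubiquitous ones.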

Development of a Model to Predict the Number of Visitors to Local Festivals Using Machine Learning (머신러닝을 활용한 지역축제 방문객 수 예측모형 개발)

  • Lee, In-Ji; Yoon, Hyun Shik
    • The Journal of Information Systems / v.29 no.3 / pp.35-52 / 2020
  • Purpose: Local governments actively hold local festivals to promote their regions and revitalize local economies. Existing studies on local festivals have been conducted mainly in tourism and related academic fields, and a large proportion of them are empirical studies of the effects of latent variables on local festivals or analyses of the regional economic impact of festivals. Despite the practical need, few studies have tried to predict the number of visitors, one of the criteria for evaluating the performance of local festivals. This study therefore developed a model that predicts the number of visitors from various observed variables using machine learning algorithms and derived its implications. Design/methodology/approach: For a total of 593 festivals held in 2018, 6 region-related variables (covering population size, administrative division, and accessibility) and 15 festival-related variables (such as the degree of publicity and word of mouth, invited singers, weather, and budget) were used as training data for the machine learning algorithms. Because the number of visitors is continuous numerical data, random forest, AdaBoost, and linear regression, which are machine learning algorithms capable of regression analysis, were used. Findings: This study confirmed that the number of visitors to local festivals can be predicted using machine learning algorithms, demonstrating the potential of machine learning for research in tourism and related fields, including the study of local festivals. From a practical point of view, the model developed in this study can be used to predict the number of visitors to future festivals, so that festivals can be evaluated in advance and the predictions used to estimate demand for related facilities. In addition, the RReliefF ranking results can be used. 
Considering these, existing local festivals can be improved and new festivals can be planned with reference to the results.
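The RReliefF ranking mentioned in the Findings scores how relevant each input variable is to a numeric target such as visitor count. The sketch below is a simplified version under explicit assumptions: the original algorithm weights the k nearest neighbours by a rank-based decaying function, whereas here every neighbour gets an equal 1/k weight, and neighbours are found by Manhattan distance on range-normalized features. Parameter values and names are illustrative only.

```python
import random

def rrelieff(X, y, m=100, k=10, seed=0):
    """Simplified RReliefF feature weights for a numeric target.
    X: list of feature rows, y: list of targets. Higher weight = more relevant."""
    rng = random.Random(seed)
    n, n_feat = len(X), len(X[0])
    lo = [min(row[f] for row in X) for f in range(n_feat)]
    hi = [max(row[f] for row in X) for f in range(n_feat)]
    y_span = max(y) - min(y)

    def diff_feat(f, a, b):
        # normalized absolute difference of feature f between two rows
        span = hi[f] - lo[f]
        return abs(a[f] - b[f]) / span if span else 0.0

    def diff_target(i, j):
        return abs(y[i] - y[j]) / y_span if y_span else 0.0

    n_dc = 0.0               # weighted count: target values differ
    n_da = [0.0] * n_feat    # weighted count: feature values differ
    n_dcda = [0.0] * n_feat  # weighted count: both differ
    for _ in range(m):
        i = rng.randrange(n)
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: sum(diff_feat(f, X[i], X[j])
                                              for f in range(n_feat)))[:k]
        w = 1.0 / len(neighbours)  # equal neighbour weights (simplification)
        for j in neighbours:
            dc = diff_target(i, j)
            n_dc += dc * w
            for f in range(n_feat):
                da = diff_feat(f, X[i], X[j])
                n_da[f] += da * w
                n_dcda[f] += dc * da * w
    if n_dc == 0 or n_dc == m:
        return [0.0] * n_feat
    return [n_dcda[f] / n_dc - (n_da[f] - n_dcda[f]) / (m - n_dc)
            for f in range(n_feat)]
```

Sorting variables by these weights gives a ranking in the spirit of the RReliefF results the study reports; a variable whose differences track differences in visitor count scores higher than one that varies independently of it.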

An Analysis for the Characteristics of Digital TVs in CES in the View of Technology Growth and Substitution Curves (기술 성장 및 대체 곡선 관점에서의 CES 출품 Digital TV의 특성 분석)

  • Kim, Do-Goan; Shin, Seong-Yoon; Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.05a / pp.96-98 / 2013
  • By reviewing the characteristics of the digital TVs exhibited at CES since 2005 from the perspective of technology growth and substitution curves, this paper provides a prediction of next-generation multimedia in the smart environment. The results show that digital TV has developed along its technology growth curve, from the early versions of 2005 to the smart digital TV of 2013, which emphasizes the keyword "connected", and that it has already reached the market puberty stage. Moreover, with characteristics such as support for multifunctional and multimedia environments and the introduction of curved or flexible displays, the digital TVs at CES 2013 have reached the introductory stage of the technology substitution curve.
