• Title/Summary/Keyword: big data mining

Analysis of Research Trends of 'Word of Mouth (WoM)' through Main Path and Word Co-occurrence Network (주경로 분석과 연관어 네트워크 분석을 통한 '구전(WoM)' 관련 연구동향 분석)

  • Shin, Hyunbo;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.179-200 / 2019
  • Word-of-mouth (WoM) is defined as consumer activity that shares information about consumption. WoM activities have long been recognized as important in corporate marketing processes and have received much attention, especially in the marketing field. Recently, with the development of the Internet, the ways in which people exchange information through online news and online communities have expanded, and WoM has diversified into forms such as word of mouth, scores, ratings, and likes. Social media gives online users easy access to information, and online WoM is considered a key source of information. Although various studies on WoM have been conducted in response to this phenomenon, no meta-analysis has comprehensively analyzed them. This study proposes a method that applies text mining techniques to scholarly big data in order to extract the major studies, grasp their main issues, and identify the trend of WoM research. To this end, a total of 4,389 documents published from 1941 to 2018 were collected from Scopus (www.scopus.com), a citation database, using the keyword 'Word-of-mouth', and the data were refined through preprocessing such as English morphological analysis, stopword removal, and noun extraction. We adopted main path analysis (MPA) and word co-occurrence network analysis. MPA detects key studies, tracks the development trajectory of an academic field, and presents the research trend from a macro perspective. For this, we constructed a citation network from the collected data, in which a node represents a document and a link represents a citation relation. We then detected the key-route main path by applying SPC (Search Path Count) weights. As a result, a main path composed of 30 documents was extracted from the citation network. 
The main path confirmed that the academic field developed along with the times, reflecting industrial changes across various industry groups. The results of MPA revealed that WoM research can be divided into five periods: (1) establishment of aspects and critical elements of WoM, (2) relationship analysis between WoM variables, (3) beginning of research on online WoM, (4) relationship analysis between WoM and purchase, and (5) broadening of topics. Changes within the industry, such as online development and social media, were reflected in the results. The most recent studies showed that the topics and approaches related to WoM were diversifying in response to changing circumstances. However, even though WoM was used in diverse fields, the mainstream of WoM research, from start to end, was related to marketing and to identifying the influential factors that proliferate WoM. Word co-occurrence network analysis presents the research trend from a microscopic point of view. A word co-occurrence network was constructed to analyze the relationships between keywords, and social network analysis (SNA) was applied to it. We divided the data into three periods to investigate periodic changes and trends in the discussion of WoM. SNA showed that Period 1 (1941-2008) consisted of clusters regarding relationship, source, and consumers. Period 2 (2009-2013) contained clusters of satisfaction, community, social networks, review, and internet. The clusters of Period 3 (2014-2018) involved satisfaction, medium, review, and interview. The periodic changes of clusters showed a transition from offline to online WoM. The media of WoM have become an important factor in spreading the word. This study conducted a quantitative meta-analysis based on scholarly big data regarding WoM. 
The main contribution of this study is that it provides a micro perspective on the research trend of WoM as well as a macro perspective. A limitation is that the citation network constructed for MPA is based on the direct citation relations among the collected documents.
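
The SPC weighting mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the citation network is a directed acyclic graph whose arcs run from the cited (older) document to the citing (newer) one; the toy network and the function name `spc_weights` are hypothetical and are not taken from the paper. The sketch covers only the arc weights, not the key-route search that follows them.

```python
from collections import defaultdict

def spc_weights(edges):
    """Search Path Count per arc: the number of source-to-sink paths using that arc.

    `edges` is a list of (cited, citing) pairs forming a DAG (an assumption).
    """
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    indeg = {n: len(pred[n]) for n in nodes}
    queue = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.pop(0)
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)

    # Number of paths reaching each node from any source ...
    from_source = {}
    for n in order:
        from_source[n] = 1 if not pred[n] else sum(from_source[p] for p in pred[n])
    # ... and number of paths leaving each node toward any sink.
    to_sink = {}
    for n in reversed(order):
        to_sink[n] = 1 if not succ[n] else sum(to_sink[s] for s in succ[n])

    return {(u, v): from_source[u] * to_sink[v] for u, v in edges}

# Hypothetical five-arc citation network with one source (A) and one sink (D).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
weights = spc_weights(edges)
```

Arcs with high SPC lie on many source-to-sink paths, which is why main path methods favor them when tracing the development trajectory of a field.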

Research on the Movie Reviews Regarded as Unsuccessful in Box Office Outcomes in Korea: Based on Big Data Posted on Naver Movie Portal

  • Jeon, Ho-Seong
    • Asia-Pacific Journal of Business / v.12 no.3 / pp.51-69 / 2021
  • Purpose - Based on the literature on movie reviews and movie ratings, this study raises two research questions concerning the content of online word of mouth and the number of movie screens as mediator variables. Research question 1 asks which topic word groups have a positive or negative impact on movie ratings. Research question 2 asks what role the number of movie screens plays between movie ratings and box office outcomes. Design/methodology/approach - Using R, this study collected about 82,000 movie reviews and movie ratings posted on Naver's movie website to examine the role of online word of mouth and movie screen counts for 10 movies that were considered commercially unsuccessful, with fewer than 2 million viewers despite securing about 1,000 movie screens. To address research question 1, topic modeling, a text mining technique, was conducted on the movie reviews. To address research question 2, the movie ratings posted on Naver were linked by date with information from KOBIS. Findings - Topic modeling identified 5 topics, which fell largely into two groups: the content of the movie (topics 1, 2, 3) and the evaluation of the movie (topics 4, 5). When the relationship between movie reviews and movie ratings was analyzed with the 5 mediators identified in topic modeling to probe research question 1, the topic word groups related to topics 2, 3, and 5 appeared to have a negative effect on netizens' movie ratings. The analysis for research question 2 was carried out by linking the two secondary data sources by date. The outcomes showed that the causal relationship between movie ratings and audience numbers was mediated by the number of movie screens. 
Research implications or Originality - The results suggest that information presented in text format is harder to quantify than information provided as scores, but if content information can be digitized through text mining techniques, it can be turned into variables and analyzed to identify causality with other variables. The outcomes for research question 2 showed that movie ratings had a direct impact on the number of viewers and also had indirect effects through changes in the number of movie screens. An interesting point is that the direct effect of movie ratings on the number of viewers was found in most American films released in Korea.
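
The direct and indirect effects described in the abstract above can be written out under a standard linear mediation model. This is only an illustrative assumption, since the abstract does not state the model specification. Here $X$ stands for movie ratings, $M$ for the number of movie screens, and $Y$ for the number of viewers; the coefficient names are the conventional ones, not the paper's.

```latex
\begin{align}
M &= i_1 + aX + \varepsilon_1 \\
Y &= i_2 + c'X + bM + \varepsilon_2 \\
\text{total effect of } X \text{ on } Y &= \underbrace{c'}_{\text{direct}} + \underbrace{ab}_{\text{indirect}}
\end{align}
```

Under this reading, the finding corresponds to both $c'$ and the product $ab$ being nonzero: ratings affect viewers directly and also through screen counts.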

Outdoor Healing Places Perception Analysis Using Named Entity Recognition of Social Media Big Data (소셜미디어 빅데이터의 개체명 인식을 활용한 옥외 힐링 장소 인식 분석)

  • Sung, Junghan;Lee, Kyungjin
    • Journal of the Korean Institute of Landscape Architecture / v.50 no.5 / pp.90-102 / 2022
  • In recent years, as interest in healing has increased, outdoor spaces built around the concept of healing have been created. To support more professional and in-depth planning and design, this study analyzed the perception and characteristics of outdoor healing places in social media posts using named entity recognition (NER). Text mining was conducted on 88,155 blog posts, followed by frequency analysis and clique cohesion analysis. Six elements were derived through a literature review, and two elements were added to analyze the perception and characteristics of healing places. The results show that visitors considered place elements, date and time, social elements, and activity elements more important than personnel, psychological elements, plants and color, and form and shape when visiting healing places. The analysis derived perceptions and characteristics of healing places from keywords. In the clique analysis, keywords such as places, date and time, and relationship clustered together, making it possible to identify where, when, at what time, and with whom people visited places for healing. By analyzing large-scale data written by visitors, the study derived the perception and characteristics of healing places and confirmed that specific elements could be used in planning and marketing.

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led researchers to apply text-mining techniques to analyze big social media data in a more rigorous manner. Although social media text analysis algorithms have improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have added grammatical factors to the feature sets used for training the classification model. The other approach applies semantic analysis methods to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from the Word2Vec algorithm is compared with the result from co-occurrence analysis to identify the difference between the two approaches. The results show that Word2Vec extracts three times as many related words that express some emotion about the keyword as co-occurrence analysis does. The difference between the two results comes from Word2Vec's vectorization of semantic features. Therefore, it is possible to say that the Word2Vec algorithm is able to catch hidden related words that have not been found by traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjectives as "emotional words" in Korean. The emotional words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence. 
The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords: professor, prosecutor, and doctor, because these keywords carry rich public emotion and opinion. Data collection was conducted in advance to select secondary keywords for data gathering. The secondary keywords used with each keyword to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate), and Prosecutor (lewd behavior, sponsor). The size of the text data is about 100,000 (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225), and the data were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text data analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that a word extracted by Emotion Trigger processing has a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger are Twitter data, so the data have a number of distinct features that we did not deal with in this study. 
These features will be considered in further study.
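
The final step of the "Emotion Trigger" process, ranking nouns by their closeness to an emotional word in vector space, can be sketched as follows. This is an illustration only: the vectors below are made-up three-dimensional toy values rather than Word2Vec output, the function names are hypothetical, cosine similarity is assumed as the closeness measure, and the authors' own programs were written in Java.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def emotion_triggers(emotion_word, vectors, nouns, top_n=2):
    """Return the top_n nouns whose vectors are closest to the emotion word's vector."""
    target = vectors[emotion_word]
    scored = [
        (cosine(target, vectors[n]), n)
        for n in nouns
        if n in vectors and n != emotion_word
    ]
    scored.sort(reverse=True)
    return [n for _, n in scored[:top_n]]

# Toy vectors (hypothetical); real ones would come from a trained Word2Vec model.
vectors = {
    "angry":   [0.9, 0.1, 0.0],
    "scandal": [0.8, 0.2, 0.1],
    "lecture": [0.1, 0.9, 0.2],
    "bribe":   [0.7, 0.0, 0.3],
}
triggers = emotion_triggers("angry", vectors, ["scandal", "lecture", "bribe"])
```

As the abstract itself cautions, proximity in vector space does not establish that a noun causes the emotion; it only nominates candidates.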

A Case Study of e-Business Implementation in Part Manufacturing Industry(B2B in PCB Industry) (부품 제조 산업에서의 e-Business 구축 사례(PCB 산업의 B2B))

  • Bae, Joon-Soo;Bae, Eun-Hae;Cheong, Min-Chang;Shin, In-Ki;Park, Young-Chul
    • IE interfaces / v.13 no.3 / pp.503-511 / 2000
  • The main theme of this research is a case of e-Business implementation in the part manufacturing industry, specifically at a PCB manufacturing company. The characteristics of the part manufacturing industry are as follows. First, an ERP system runs as a legacy system that is ready to be combined with an e-Business system. Second, the number of customers is very small. The customers are not many individuals but only a few big electronics enterprises that are strategically affiliated with the part manufacturing company. This means that e-Business in the part manufacturing industry needs to focus on sharing pertinent information throughout transactions with the customers, not on data warehousing or on data mining customers' potential needs or requests. In this paper, we extracted e-Business opportunity domains from a PCB manufacturing company, a typical part manufacturer. We intend to enhance information sharing between customers and the company and to provide the transaction functions needed across the whole value chain from order to shipment. Implementing the e-Business system on the Web can increase the visibility of customers, and further, the company can be transformed into an extended enterprise in which the relationship with the customers becomes very close and interleaved. Also, the Cyber Office functionality of the e-Business system can support salespersons effectively, so that they can spend more time on customer satisfaction. Such efforts can, in the future, be a basis for active adaptation to industry transformations such as forming an e-community and participating in the marketplace.

A Research on stock price prediction based on Deep Learning and Economic Indicators (거시지표와 딥러닝 알고리즘을 이용한 자동화된 주식 매매 연구)

  • Hong, Sunghyuck
    • Journal of Digital Convergence / v.18 no.11 / pp.267-272 / 2020
  • Macroeconomic indicators are among the first things analyzed when evaluating stocks because they show the movement of a country's economy as a whole. The overall economic situation at the national level, such as national income, inflation, unemployment, exchange rates, currency, interest rates, and balance of payments, has a great effect on the stock market, and economic indicators are actually correlated with stock prices. They are the main source of data that analysts watch with interest when deciding whether to buy or sell, considering the impact on individual stock prices. Therefore, economic indicators that affect stock prices are analyzed as leading indicators, stock prices are predicted with a deep learning-based model, and the predictions are then compared with the actual stock prices. If buy and sell decisions are based on the analysis of stock predictions, stocks can be investments, not gambling. This research was therefore conducted to enable automated stock trading using macro-indicators and deep learning algorithms.

Active Senior Contents Trend Analysis using LDA Topic Modeling (LDA 토픽 모델링을 이용한 액티브 시니어 콘텐츠 트렌드 분석)

  • Lee, Dongwoo;Kim, Yoosin;Shin, Eunjung
    • Journal of Internet Computing and Services / v.22 no.5 / pp.35-45 / 2021
  • The purpose of this study is to understand the characteristics and trends of active seniors. As the baby boom generation reaches old age, its members are more active than conventional seniors. These seniors are called active seniors, a new consumer group. Many countries and companies are interested in providing relevant policies and services, but research on active senior trends is lacking. This study collected 8,740 posts related to active seniors on social media from January 1st, 2018 to June 31st, 2021, and conducted keyword frequency analysis, TF-IDF analysis, and LDA topic modeling. Through LDA topic modeling, topics were classified into 10 categories: lifestyle, benefits, shopping, government business, government education, health, society and economy, care industry, silver housing, and leisure. The results of this study can be used as fundamental data to help understand the academic and industrial aspects of active seniors.
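
The TF-IDF analysis named in the abstract above can be sketched in a few lines. The abstract does not say which TF-IDF variant was used, so this assumes the plain form, relative term frequency times log(N / document frequency); the tokenized toy posts and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores

# Hypothetical, already-tokenized posts.
docs = [
    ["senior", "travel", "health"],
    ["senior", "housing"],
    ["senior", "health", "care"],
]
scores = tf_idf(docs)
```

A term that appears in every post, such as "senior" here, scores zero, which is why TF-IDF complements raw keyword frequency when the whole collection shares the same search keyword.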

Sensitivity of abacus and Chasdaq in the Chinese stock market through analysis of Weibo sentiment related to Corona-19 (코로나-19관련 웨이보 정서 분석을 통한 중국 주식시장의 주판 및 차스닥의 민감도 예측 기법)

  • Li, Jiaqi;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.1-7 / 2021
  • Investor mood on social media is gaining increasing attention as a leading signal of price movements in the stock market. Based on behavioral finance theory, this study argues that sentiment extracted from social media using big data techniques can predict real-time (short-run) price momentum in the Chinese stock market. Sina Weibo posts related to COVID-19 were collected using a keyword method, and a daily influence-weighted sentiment factor was extracted from sizable raw data of over 2 million posts. We examine one supervised and four unsupervised sentiment analysis models and use the best-performing word-frequency and BiLSTM models. The test results show similar movement between stock price changes and the sentiment factor. This indicates that public mood extracted from social media can to some extent represent investors' sentiment and make a difference in stock market fluctuations when people are concentrating on a particular event that can affect the stock market.
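
One plausible reading of a daily influence-weighted sentiment factor is a per-day weighted average of post sentiment. The sketch below assumes that reading; the abstract does not define the weighting, so the influence values (for example, repost counts), the sentiment scale, and the function name `daily_sentiment` are all hypothetical.

```python
from collections import defaultdict

def daily_sentiment(posts):
    """Influence-weighted mean sentiment per day.

    `posts` is an iterable of (date, sentiment, influence) tuples, where the
    sentiment score and the influence weight are assumed inputs.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for date, sentiment, influence in posts:
        weighted_sum[date] += influence * sentiment
        weight_total[date] += influence
    return {d: weighted_sum[d] / weight_total[d] for d in weighted_sum if weight_total[d] > 0}

# Hypothetical posts: sentiment in [-1, 1], influence as a non-negative weight.
posts = [
    ("2020-02-01", 1.0, 3.0),
    ("2020-02-01", -1.0, 1.0),
    ("2020-02-02", -1.0, 2.0),
]
factor = daily_sentiment(posts)
```

The resulting daily series is the kind of quantity that could then be compared against daily stock price changes.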

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative method for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industries because it can be used to extract various issues from text such as news and social network services (SNS) and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking. 
Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, written time, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future works.
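
Linking issues across two consecutive periods by similarity, as the abstract above describes, can be sketched as follows. The abstract does not name the similarity measure or a cutoff, so cosine similarity over keyword weights and the 0.5 threshold are assumptions; the issue names, keyword weights, and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {keyword: weight} dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def issue_links(prev_issues, next_issues, threshold=0.5):
    """Return (previous issue, next issue, similarity) for pairs at or above the threshold."""
    links = []
    for prev_name, prev_kw in prev_issues.items():
        for next_name, next_kw in next_issues.items():
            sim = cosine(prev_kw, next_kw)
            if sim >= threshold:
                links.append((prev_name, next_name, round(sim, 3)))
    return links

# Hypothetical issues for two consecutive periods, each as keyword weights.
period_1 = {
    "P1-nuclear": {"nuclear": 0.6, "test": 0.4},
    "P1-family": {"separated": 0.5, "families": 0.5},
}
period_2 = {
    "P2-sanctions": {"nuclear": 0.5, "sanctions": 0.5},
    "P2-reunion": {"families": 0.7, "reunion": 0.3},
}
links = issue_links(period_1, period_2)
```

In such a diagram, an issue with no outgoing link disappears after its period, and one issue linked to several successors or predecessors corresponds to a division or a merge.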

The Correlation between Social Media and the Behaviors of the Supreme Court in Korea (소셜미디어와 대법원 판결의 상관 관계에 대한 분석)

  • Heo, Junhong;Seo, Yeeun;Lee, Seoyeong;Lee, Sang-Yong Tom
    • Knowledge Management Research / v.22 no.3 / pp.31-53 / 2021
  • As a communication channel for individuals, social media is affecting various areas such as business, economy, politics, and society. One of the less-studied areas is the law. Therefore, this study collected various information from social media and analyzed its impact on legal decisions, especially Supreme Court decisions in Korea. The study was conducted by compiling information from Internet news articles and public responses. We found that when negative reactions from the public increased, the trial duration until the Supreme Court made its final decision became shorter. However, we found no significant relationship between social media reactions and dismissal of appeal or annulment. Our study contributes to information systems and knowledge management research in the sense that social analytics is applied to the area of legal decisions, instead of conventional qualitative study methodology. Our study is also meaningful to practitioners because big data analytics businesses can be applied to the field of law by creating a new database for emerging legal technology. Finally, lawmakers can think of a better way to standardize the legal decision process to minimize adverse effects from social media.