• Title/Summary/Keyword: Text-mining Analysis

A Comparative Analysis of OTT Service Reviews Before and After the Onset of the Pandemic Using Text Mining Technique: Focusing on the Emotion-Focused Coping and Nostalgia (텍스트 마이닝을 활용한 코로나 19 전후 온라인 동영상 서비스(OTT) 리뷰 비교분석 연구 - 정서 중심 대처와 노스탤지어를 중심으로)

  • Ko, Minjeong;Lee, Sangwon
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.375-388 / 2021
  • This study aims to contribute to the understanding of consumer behavior during COVID-19 by comparing blog reviews of an over-the-top (OTT) online video service from before and during the pandemic. We anticipate that the COVID-19 outbreak prompts the use of the OTT service as part of an emotion-focused coping strategy derived from the loss of personal control and the subsequent avoidance motivation. We also posit that a strong yearning for life before COVID-19 will increase interest in content that fulfills a need for nostalgia. Our analysis of Netflix reviews provides empirical evidence of the effects of an emotion-focused coping strategy and nostalgia on OTT service usage. First, the titles of the reviews posted during COVID-19 indicate that consumers were less likely to mention OTT services other than Netflix, more interested in domestic content, and used OTT services as an avoidance-denial strategy. Second, the blog content demonstrates that while pre-COVID reviews tend to focus on the practical benefits of OTT services, those posted during the pandemic focus on mood, emotions, and dialogue. In addition, interest in the comedy and romance genres increased during COVID-19. Third, we identified a greater preference for realistic or everyday content that depicted the pre-pandemic era. This is the first empirical study to investigate the effects of COVID-19 on video streaming usage in Korea. In addition, this research contributes to the field of marketing by expanding our understanding of online video service users during COVID-19 and by identifying practical implications for OTT services in the midst of a pandemic.

Development of Scaffolding Strategies Model by Information Search Process (ISP) (정보탐색과정(ISP)에 의한 스캐폴딩 전략 모형 개발)

  • Jeong-Hoon Lim
    • Journal of Korean Library and Information Science Society / v.54 no.1 / pp.143-165 / 2023
  • This study aims to propose a scaffolding strategy that can be applied to the information search process by using Kuhlthau's ISP model, which presented a design and implementation strategy for the mediation role in the learning process. To this end, the relevant literature was reviewed to categorize scaffolding strategies, and impressions were collected through student surveys after 150 middle school students in the Daejeon area took a project class to which the scaffolding strategy based on the ISP model was applied. The collected data were preprocessed into a form suitable for analysis so that word frequencies could be extracted, and topic analysis was performed using STM (Structural Topic Modeling). First, after determining the optimal number of topics and extracting topics for each stage of the ISP model, the extracted topics were classified into three types: cognitive domain-macro perspective, cognitive domain-micro perspective, and emotional domain perspective. In this process, we focused on cognitive verbs and emotional verbs among the words extracted through text mining, and presented a scaffolding strategy model related to each topic by reviewing representative document cases. The results of this study suggest that if an appropriate scaffolding strategy is provided at the ISP model stage, a positive effect on learners' self-directed task solving can be expected.

Study on Research Trends (2001~2020) of the Baekdudaegan Mountains with Big Data Analyses of Academic Journals (학술논문 빅데이터 분석을 활용한 백두대간에 관한 연구동향(2001~2020) 분석)

  • Lee, Jinkyu;Sim, Hyung Seok;Lee, Chang-Bae
    • Journal of Korean Society of Forest Science / v.111 no.1 / pp.36-49 / 2022
  • The purpose of this study was to analyze domestic research trends related to the Baekdudaegan Mountains over the last two decades. In total, 551 academic papers and keyword data related to the Baekdudaegan Mountains were collected using the "Research and Information Service Section" and analyzed using "big data" analysis programs, such as Textom and UCINET. Papers related to the Baekdudaegan Mountains were published in 177 academic journals, and 229 papers (41.6% of all published papers) were published between 2011 and 2015. According to word frequency data (N-gram analyses), the major research topic over the past 20 years was "species diversity." According to CONCOR analysis results, the main research could be divided into 15 areas, the most important of which was "species diversity," followed by "vegetation restoration and management," and "culture." Ecological research comprised 12 groups with a frequency of 78.8%; humanities and social research comprised 2 groups with a frequency of 15.6%. Overall, our study of research areas and quantitative data analyses provides valuable information that could support policy formulation.

A Study on Tourism Behavior in the New normal Era Using Big Data (빅데이터를 활용한 뉴노멀(New normal)시대의 관광행태 변화에 관한 연구)

  • Kyoung-mi Yoo;Jong-cheon Kang;Youn-hee Choi
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.167-181 / 2023
  • This study used TEXTOM, a social network analysis program, to analyze changes in tourism behavior after the travel restrictions imposed following the outbreak of COVID-19 were eased. Data on the keywords 'domestic travel' and 'overseas travel' were collected from blogs, cafes, and news provided by Naver, Google, and Daum. The collection period was set from April to December 2022, when social distancing was lifted; 2019 and 2020 were each treated as one-year periods and compared with 2022. A total of 80 keywords were extracted through text mining, and centrality analysis was performed using NetDraw. Finally, CONCOR analysis grouped the correlated keywords into four clusters. The results show that tourism behavior in 2022 was characterized by a recovery of tourism toward its state before the outbreak of COVID-19, segmentation of travel based on each person's preferred theme, and selection of tourist destinations with priority given to each country's COVID-19 mitigation policy. The study is expected to provide basic data for the development of tourism marketing strategies and tourism products for the newly emerging tourism ecosystem after COVID-19.

Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

  • Choi, Seongi;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.101-116 / 2015
  • Recently, with the development of smart devices and social media, vast amounts of information in various forms have accumulated. In particular, considerable research effort is being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is moving from structured data analysis to unstructured data analysis. In the field of recommendation systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also steadily increased. Approaches to improving the performance of recommendation systems fall into two categories: improving algorithms and acquiring useful, high-quality data. Traditionally, most efforts have taken the former approach, while the latter has attracted relatively little attention. In this sense, efforts to utilize unstructured data from various sources are timely and necessary. In particular, because users' interests are directly connected with their needs, identifying those interests through unstructured big data analysis can be a clue for improving the performance of recommendation systems. This study therefore proposes a methodology for improving recommendation systems by measuring users' interests. Specifically, it proposes a method to quantify users' interests by analyzing their internet usage patterns, and to predict users' repurchases based on the discovered preferences. There are two important modules in this study. The first module predicts the repurchase probability of each category by analyzing users' purchase history. We include the first module in our research scope in order to compare the accuracy of the traditional purchase-based prediction model with that of the new model presented in the second module. This procedure extracts the purchase history of users.
The core part of our methodology is the second module. This module extracts users' interests by analyzing the news articles the users have read. It first constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. It then analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging the results of these two steps, we obtain a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model for predicting the repurchase probability of each category. In this paper, we also provide experimental results from our performance evaluation. The data used in our experiments are outlined as follows. We acquired web transaction data for 5,000 panel members from a company that specializes in analyzing the rankings of internet sites. We first extracted from the original data 15,000 URLs of news articles published from July 2012 to June 2013, and crawled the main content of those articles. We then selected 2,615 users who had read at least one of the extracted news articles. Among the 2,615 users, 359 target users had purchased at least one item from our target shopping mall 'G'. In the experiments, we analyzed the purchase history and news access records of these 359 internet users. The performance evaluation showed that our prediction model, which uses both users' interests and purchase history, outperforms a prediction model that uses only users' purchase history in terms of misclassification ratio. In detail, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used.
Similarly, our model outperformed the traditional one in beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used although the improvement is very small.
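
The merging step described in this abstract (an article-topic matrix combined with a user-article matrix to give a user-topic matrix) can be read as a matrix product. The following is a minimal sketch under that assumption; the function name, the toy matrices, and the row normalisation are all hypothetical illustrations and are not taken from the paper.

```python
def user_topic_matrix(user_article, article_topic):
    """Compose user x article and article x topic matrices into user x topic.

    Assumption: merging is a plain matrix product followed by row
    normalisation, so each user's row sums to 1 when the user read anything.
    """
    n_topics = len(article_topic[0])
    profiles = []
    for reads in user_article:
        row = [0.0] * n_topics
        for article_idx, read_count in enumerate(reads):
            for topic_idx, weight in enumerate(article_topic[article_idx]):
                row[topic_idx] += read_count * weight
        total = sum(row)
        profiles.append([value / total for value in row] if total else row)
    return profiles


# Toy data (made up): 3 articles x 2 topics, e.g. output of a topic model.
article_topic = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]
# Toy data (made up): 2 users x 3 articles, entries are read counts.
user_article = [
    [1, 0, 0],
    [0, 2, 1],
]

profiles = user_topic_matrix(user_article, article_topic)
```

Each row of `profiles` could then serve as a feature vector for a per-category repurchase classifier, which is the role the abstract assigns to the user-topic matrix.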

Influence analysis of Internet buzz to corporate performance : Individual stock price prediction using sentiment analysis of online news (온라인 언급이 기업 성과에 미치는 영향 분석 : 뉴스 감성분석을 통한 기업별 주가 예측)

  • Jeong, Ji Seon;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.37-51 / 2015
  • Due to the development of internet technology and the rapid increase of internet data, various studies are actively being conducted on how to use and analyze internet data for various purposes. In particular, in recent years, a number of studies have applied text mining techniques to overcome the limitations of relying on structured data alone. Notably, there are various studies on sentiment analysis, which scores opinions based on the distribution of polarity, such as the positivity or negativity of the vocabulary or sentences in documents. As part of this line of work, this study tries to predict the ups and downs of companies' stock prices by performing sentiment analysis on online news about those companies. A variety of news on companies is produced online by different economic agents, and it is diffused quickly and accessed easily on the Internet. So, based on the inefficient market hypothesis, we can expect that news about an individual company can be used to predict fluctuations in that company's stock price if proper data analysis techniques are applied. However, because the areas of corporate management activity differ, machine-learning-based analysis of text data needs to consider the characteristics of each company. In addition, since news containing positive or negative information on certain companies has various impacts on other companies or industry fields, stock price prediction needs to be analyzed for each company individually. Therefore, this study attempted to predict changes in the stock prices of individual companies by applying sentiment analysis to online news data. Accordingly, this study chose top companies in the KOSPI 200 as the subjects of the analysis, and collected and analyzed two years of online news data for each company from Naver, a representative domestic search portal service.
In addition, considering that the meaning of vocabulary differs from one economic subject to another, the study aims to improve performance by building a lexicon for each individual company and applying it to the analysis. The analysis shows that prediction accuracy differs by company and that the average prediction accuracy was 56%. Comparing the accuracy of stock price prediction across industry sectors, 'energy/chemical', 'consumer goods for living' and 'consumer discretionary' showed relatively higher accuracy than other industries, while sectors such as 'information technology' and 'shipbuilding/transportation' showed lower accuracy. Only five representative companies were collected for each industry, so it is somewhat difficult to generalize, but the results confirm that the accuracy of stock price prediction differs by industry sector. In addition, at the individual company level, companies such as 'Kangwon Land', 'KT & G' and 'SK Innovation' showed relatively higher prediction accuracy than other companies, while companies such as 'Young Poong', 'LG', 'Samsung Life Insurance', and 'Doosan' had a low prediction accuracy of less than 50%. In this paper, we analyzed stock price prediction for individual companies by applying pre-built, company-specific vocabularies to online news information, with the aim of improving the performance of stock price prediction. Based on this, in the future, it will be possible to find ways to increase stock price prediction accuracy by addressing the problem of unnecessary words being added to the sentiment dictionary.

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.71-89 / 2016
  • Sentiment analysis is used for identifying emotions or sentiments embedded in user-generated data such as customer reviews from blogs, social network services, and so on. Various research fields such as computer science and business management can take advantage of this feature to analyze customer-generated opinions. In previous studies, the star rating of a review is regarded as equivalent to the sentiment embedded in the text. However, the star rating does not always correspond to the sentiment polarity. Due to this supposition, previous studies have some limitations in their accuracy. To solve this issue, the present study uses a supervised sentiment classification model to measure sentiment polarity more accurately. This study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, Support Vector Machines (SVM) and the Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier, and statistical correlations between movie reviews and box-office success are analyzed. Movie reviews are collected along with their star ratings. The dataset used in this study consists of 1,258,538 reviews of 175 films gathered from the Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms the Naive Bayes (NB) classifier, with accuracy about 6% higher than NB. Furthermore, the results indicate a positive correlation between the star rating and the size of the audience, which can be regarded as the box-office success of a movie. The study also shows a mild, positive correlation between the sentiment scores estimated by the classifier and the size of the audience. To verify the applicability of the sentiment scores, an independent sample t-test was conducted.
For this, the movies were divided into two groups using the average of sentiment scores. The two groups are significantly different in terms of the star-rated scores.
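
As a rough illustration of the two statistical steps mentioned in this abstract (correlating per-movie sentiment scores with audience size, and splitting movies into two groups at the mean sentiment score), here is a minimal sketch. The helper names and all numbers are made up for illustration; the study's actual data and test procedure are not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_by_mean(scores):
    """Return (high, low) index lists, split at the mean of the scores."""
    mean = sum(scores) / len(scores)
    high = [i for i, s in enumerate(scores) if s >= mean]
    low = [i for i, s in enumerate(scores) if s < mean]
    return high, low


# Hypothetical per-movie values, not taken from the paper.
sentiment_scores = [0.2, 0.8, 0.5, 0.9]
audience_millions = [1.0, 4.0, 2.5, 3.0]

r = pearson(sentiment_scores, audience_millions)
high_group, low_group = split_by_mean(sentiment_scores)
```

The star ratings of the movies in `high_group` and `low_group` would then be compared with an independent-samples t-test, for which a statistics library is the usual choice.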

Comparative Analysis of Korean and Japanese Textbooks on World Geography: Focused on the Contents of Global Education (한.일 고등학교 세계지리 교과서 내용 비교 분석 -국제이해교육의 관련 내용을 중심으로-)

  • Yang, Won-Taek
    • Journal of the Korean association of regional geographers / v.2 no.2 / pp.75-92 / 1996
  • Geography education is one of the best ways to improve the understanding of other countries. By analyzing Korean and Japanese textbooks on world geography, I tried to find out how well they explain the other country and to set forth guiding principles for geography education. To achieve these aims, weight analysis is used. The major findings of this study can be summarised as follows. The contents of the Korean and Japanese geography textbooks were analyzed by dividing them into 2 major topics, 6 minor topics, and 20 key concepts. (1) Analysis of the Korean geography textbook of the 5th curriculum gave the following weight percentages for each minor topic: resource problem (57.7%), human rights problem (21.4%), population problem (9.0%), mutual dependence (6.0%), environmental problem (3.3%), international competition (2.6%). (2) Analysis of the Korean geography textbook of the 6th curriculum gave the following weight percentages for each minor topic: resource problem (42.7%), human rights problem (21.7%), mutual dependence (20.9%), environmental problem (7.7%), population problem (4.6%), international competition (2.4%). (3) Analysis of the Japanese geography textbook of the 5th curriculum amendment gave the following weight percentages for each minor topic: resource problem (49.9%), human rights problem (21.7%), mutual dependence (15.5%), population problem (7.1%), international competition (6.2%), environmental problem (3.8%). (4) Analysis of the Japanese geography textbook of the 6th curriculum amendment gave the following weight percentages for each minor topic: human rights problem (31.6%), mutual dependence (22.8%), resource problem (20.7%), population problem (12.7%), environmental problem (8.6%), international competition (3.6%).
We can see that in the field of mutual dependence Korea and Japan assign similar weights, but in the field of common problems they assign fairly different weights. This can be viewed as a difference in curriculum. That is to say, Korea used both the systematic and the topical method on a unit basis, whereas Japan used only the topical method on a unit basis. Therefore the Korean geography textbook introduces agriculture, forestry, fishery, the mining industry, and the manufacturing industry. The Japanese textbook, however, gives a detailed account of residents' lives in specific areas. For that reason, resources were stressed in the Korean textbook, while culture was stressed in the Japanese textbook.

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.57-78 / 2020
  • With the development of the Internet, consumers have gained the opportunity to check product information easily through e-commerce. Product reviews used in the process of purchasing goods are based on user experience, allowing consumers to act as producers of information as well as to consult it. From the consumer's perspective, this can increase the efficiency of purchasing decisions; from the seller's point of view, it can help develop products and strengthen their competitiveness. However, when consumers read the vast number of reviews that e-commerce sites offer for the products they want to compare, it takes a lot of time and effort to grasp the overall assessment and the assessment dimensions they consider important. This is because product reviews are unstructured information, and it is difficult to read the sentiment and assessment dimension of a review immediately. For example, consumers who want to purchase a laptop would like to check how the products being compared are assessed on each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we propose a method to automatically generate multi-dimensional product assessment scores from the reviews of the products to be compared. The method presented in this study consists largely of two phases: the pre-preparation phase and the individual product scoring phase. In the pre-preparation phase, a dimension classification model and a sentiment analysis model are created based on reviews of the large category product group. By combining word embedding and association analysis, the dimension classification model addresses a limitation of existing studies, in which word embedding methods for finding the relevance between dimensions and words consider only the distance between words in sentences.
The sentiment analysis model is a CNN model trained on learning data tagged as positive or negative at the phrase level for accurate polarity detection. The individual product scoring phase then applies these pre-prepared models to reviews split into phrases. Multi-dimensional assessment scores are obtained by grouping the phrases judged to describe a specific dimension and aggregating them by assessment dimension according to the proportion of reviews in each group. In the experiment of this paper, approximately 260,000 reviews of the large category product group are collected to build the dimension classification model and the sentiment analysis model. In addition, reviews of laptops from companies S and L sold through e-commerce are collected and used as experimental data. The dimension classification model classified individual product reviews, broken down into phrases, into six assessment dimensions, and combined the existing word embedding method with an association analysis indicating the frequency between words and dimensions. As a result of combining word embedding and association analysis, the accuracy of the model increased by 13.7%. The sentiment analysis model analyzed assessments more closely when it was trained on phrase units rather than on sentences; its accuracy was confirmed to be 29.4% higher than that of the sentence-based model. Through this study, both sellers and consumers can expect efficient decision making in purchasing and product development, given that they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme, and were analysed together using word embedding and association analysis to improve the objectivity of a more precise multi-dimensional analysis.
This will be an attractive analysis model, not only enabling more effective service deployment amid the evolving e-commerce market and fierce competition, but also satisfying both sellers and customers.
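
The individual product scoring phase described in this abstract can be sketched roughly as follows, assuming each review phrase has already been labelled with an assessment dimension and a polarity by the two pre-prepared models. The scoring rule used here (share of positive phrases per dimension) is an assumption made for illustration, as are the function name and the toy phrases; the paper's exact aggregation formula is not given in the abstract.

```python
from collections import defaultdict


def dimension_scores(labelled_phrases):
    """Aggregate (dimension, is_positive) pairs into per-dimension scores.

    Assumption: the score of a dimension is the fraction of its phrases
    that were classified as positive.
    """
    counts = defaultdict(lambda: [0, 0])  # dimension -> [positive, total]
    for dimension, is_positive in labelled_phrases:
        counts[dimension][1] += 1
        if is_positive:
            counts[dimension][0] += 1
    return {dim: pos / total for dim, (pos, total) in counts.items()}


# Hypothetical classifier outputs for the phrases of one laptop's reviews.
labelled_phrases = [
    ("performance", True),
    ("performance", False),
    ("design", True),
    ("weight", False),
]

scores = dimension_scores(labelled_phrases)
```

Computing `scores` separately for each product gives the per-dimension values that a consumer or seller could compare side by side.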

Mapping Categories of Heterogeneous Sources Using Text Analytics (텍스트 분석을 통한 이종 매체 카테고리 다중 매핑 방법론)

  • Kim, Dasom;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.193-215 / 2016
  • In recent years, the proliferation of diverse social networking services has led users to use many mediums simultaneously depending on their individual purpose and taste. Besides, while collecting information about particular themes, they usually employ various mediums such as social networking services, Internet news, and blogs. However, in terms of management, each document circulated through diverse mediums is placed in different categories on the basis of each source's policy and standards, hindering any attempt to conduct research on a specific category across different kinds of sources. For example, documents containing content on "Application for a foreign travel" can be classified into "Information Technology," "Travel," or "Life and Culture" according to the peculiar standard of each source. Likewise, with different viewpoints of definition and levels of specification for each source, similar categories can be named and structured differently in accordance with each source. To overcome these limitations, this study proposes a plan for conducting category mapping between different sources with various mediums while maintaining the existing category system of the medium as it is. Specifically, by re-classifying individual documents from the viewpoint of diverse sources and storing the result of such a classification as extra attributes, this study proposes a logical layer by which users can search for a specific document from multiple heterogeneous sources with different category names as if they belong to the same source. Besides, by collecting 6,000 articles of news from two Internet news portals, experiments were conducted to compare accuracy among sources, supervised learning and semi-supervised learning, and homogeneous and heterogeneous learning data. 
It is particularly interesting that in some categories, the classification accuracy of semi-supervised learning using heterogeneous learning data proved to be higher than that of supervised learning and semi-supervised learning using homogeneous learning data. This study is significant in the following respects. First, it proposes a logical plan for establishing a system to integrate and manage heterogeneous mediums with different classification systems while maintaining the existing physical classification system as it is. In particular, the results show very different classification accuracies depending on the heterogeneity of the learning data; this is expected to spur further studies on enhancing the performance of the proposed methodology through analysis of the characteristics of each category. In addition, with an increasing demand for search, collection, and analysis of documents from diverse mediums, the scope of Internet search is no longer restricted to one medium. However, since each medium has a different categorical structure and naming, it is very difficult to search for a specific category across heterogeneous mediums. The proposed methodology is also significant in that, when users select a desired site, it lets them query all documents according to that site's category classification standards, while maintaining the existing site's characteristics and structure as they are. The proposed methodology needs to be further complemented in the following respects. First, only an indirect comparison and evaluation of the performance of the proposed methodology was made, so future studies need to conduct more direct tests of its accuracy. That is, after re-classifying documents of the object source on the basis of the categorical system of the existing source, the accuracy of the classification needs to be verified through evaluation by actual users.
In addition, the accuracy in classification needs to be increased by making the methodology more sophisticated. Furthermore, an understanding is required that the characteristics of some categories that showed a rather higher classifying accuracy of heterogeneous semi-supervised learning than that of supervised learning might assist in obtaining heterogeneous documents from diverse mediums and seeking plans that enhance the accuracy of document classification through its usage.