• Title/Summary/Keyword: 지능 시스템

Search Result 12,320, Processing Time 0.038 seconds

Sentiment analysis on movie review through building modified sentiment dictionary by movie genre (영역별 맞춤형 감성사전 구축을 통한 영화리뷰 감성분석)

  • Lee, Sang Hoon;Cui, Jing;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.97-113
    • /
    • 2016
  • Due to the growth of internet data and the rapid development of internet technology, "big data" analysis is actively conducted to analyze enormous data for various purposes. Especially in recent years, a number of studies have been performed on the applications of text mining techniques in order to overcome the limitations of existing structured data analysis. Various studies on sentiment analysis, the part of text mining techniques, are actively studied to score opinions based on the distribution of polarity of words in documents. Usually, the sentiment analysis uses sentiment dictionary contains positivity and negativity of vocabularies. As a part of such studies, this study tries to construct sentiment dictionary which is customized to specific data domain. Using a common sentiment dictionary for sentiment analysis without considering data domain characteristic cannot reflect contextual expression only used in the specific data domain. So, we can expect using a modified sentiment dictionary customized to data domain can lead the improvement of sentiment analysis efficiency. Therefore, this study aims to suggest a way to construct customized dictionary to reflect characteristics of data domain. Especially, in this study, movie review data are divided by genre and construct genre-customized dictionaries. The performance of customized dictionary in sentiment analysis is compared with a common sentiment dictionary. In this study, IMDb data are chosen as the subject of analysis, and movie reviews are categorized by genre. Six genres in IMDb, 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi' are selected. Five highest ranking movies and five lowest ranking movies per genre are selected as training data set and two years' movie data from 2012 September 2012 to June 2014 are collected as test data set. Using SO-PMI (Semantic Orientation from Point-wise Mutual Information) technique, we build customized sentiment dictionary per genre and compare prediction accuracy on review rating. As a result of the analysis, the prediction using customized dictionaries improves prediction accuracy. The performance improvement is 2.82% in overall and is statistical significant. Especially, the customized dictionary on 'sci-fi' leads the highest accuracy improvement among six genres. Even though this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. In this study, we only consider adjectives as additional terms in customized sentiment dictionary. Other part of text such as verb and adverb can be considered to improve sentiment analysis performance. Also, we need to apply customized sentiment dictionary to other domain such as product reviews.

Personal Information Overload and User Resistance in the Big Data Age (빅데이터 시대의 개인정보 과잉이 사용자 저항에 미치는 영향)

  • Lee, Hwansoo;Lim, Dongwon;Zo, Hangjung
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.125-139
    • /
    • 2013
  • Big data refers to the data that cannot be processes with conventional contemporary data technologies. As smart devices and social network services produces vast amount of data, big data attracts much attention from researchers. There are strong demands form governments and industries for bib data as it can create new values by drawing business insights from data. Since various new technologies to process big data introduced, academic communities also show much interest to the big data domain. A notable advance related to the big data technology has been in various fields. Big data technology makes it possible to access, collect, and save individual's personal data. These technologies enable the analysis of huge amounts of data with lower cost and less time, which is impossible to achieve with traditional methods. It even detects personal information that people do not want to open. Therefore, people using information technology such as the Internet or online services have some level of privacy concerns, and such feelings can hinder continued use of information systems. For example, SNS offers various benefits, but users are sometimes highly exposed to privacy intrusions because they write too much personal information on it. Even though users post their personal information on the Internet by themselves, the data sometimes is not under control of the users. Once the private data is posed on the Internet, it can be transferred to anywhere by a few clicks, and can be abused to create fake identity. In this way, privacy intrusion happens. This study aims to investigate how perceived personal information overload in SNS affects user's risk perception and information privacy concerns. Also, it examines the relationship between the concerns and user resistance behavior. A survey approach and structural equation modeling method are employed for data collection and analysis. This study contributes meaningful insights for academic researchers and policy makers who are planning to develop guidelines for privacy protection. The study shows that information overload on the social network services can bring the significant increase of users' perceived level of privacy risks. In turn, the perceived privacy risks leads to the increased level of privacy concerns. IF privacy concerns increase, it can affect users to from a negative or resistant attitude toward system use. The resistance attitude may lead users to discontinue the use of social network services. Furthermore, information overload is mediated by perceived risks to affect privacy concerns rather than has direct influence on perceived risk. It implies that resistance to the system use can be diminished by reducing perceived risks of users. Given that users' resistant behavior become salient when they have high privacy concerns, the measures to alleviate users' privacy concerns should be conceived. This study makes academic contribution of integrating traditional information overload theory and user resistance theory to investigate perceived privacy concerns in current IS contexts. There is little big data research which examined the technology with empirical and behavioral approach, as the research topic has just emerged. It also makes practical contributions. Information overload connects to the increased level of perceived privacy risks, and discontinued use of the information system. To keep users from departing the system, organizations should develop a system in which private data is controlled and managed with ease. This study suggests that actions to lower the level of perceived risks and privacy concerns should be taken for information systems continuance.

Impact of impulsiveness on mobile banking usage: Moderating effect of credit card use and mediating effect of SNS addiction (충동성이 모바일뱅킹 사용률에 미치는 영향: 신용카드 사용 여부의 조절효과와 SNS 중독의 매개효과)

  • Lee, Youmi;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.113-137
    • /
    • 2021
  • According to the clear potential of mobile banking growth, many studies related to this are being conducted, but in Korea, it is concentrated on the analysis of technical factors or consumers' intentions, behaviors, and satisfaction. In addition, even though it has a strong customer base of 20s, there are few studies that have been conducted specifically for this customer group. In order for mobile banking to take a leap forward, a strategy to secure various perspectives is needed not only through research on itself but also through research on external factors affecting mobile banking. Therefore, this study analyzes impulsiveness, credit card use, and SNS addiction among various external factors that can significantly affect mobile banking in their 20s. This study examines whether the relationship between impulsiveness and mobile banking usage depends on whether or not a credit card is used, and checks whether a customer's impulsiveness is possible by examining whether a credit card is used. Based on this, it is possible to establish new standards for classification of marketing target groups of mobile banking. After finding out the static or unsuitable relationship between whether to use a credit card and impulsiveness, we want to indirectly predict the customer's impulsiveness through whether to use a credit card or not to use a credit card. It also verifies the mediating effect of SNS addiction in the relationship between impulsiveness and mobile banking usage. For this analysis, the collected data were conducted according to research problems using the SPSS Statistics 25 program. The findings are as follows. First, positive urgency has been shown to have a significant static effect on mobile banking usage. Second, whether to use credit cards has shown moderating effects in the relationship between fraudulent urgency and mobile banking usage. Third, it has been shown that all subfactors of impulsiveness have significant static relationships with subfactors of SNS addiction. Fourth, it has been confirmed that the relationship between positive urgency, SNS addiction, and mobile banking usage has total effect and direct effect. The first result means that mobile banking usage may be high if positive urgency is measured relatively high, even if the multi-dimensional impulsiveness scale is low. The second result indicates that mobile banking usage rates were not affected by the independent variable, negative urgency, but were found to have a significant static relationship with negative urgency when using credit cards. The third result means that SNS is likely to become addictive if lack of premeditation or lack of perseverance is high because it provides instant enjoyment and satisfaction as a mobile-based service. This also means that SNS can be used as an avoidance space for those with negative urgency, and as an emotional expression space for those with high positive urgency.

A Study on Enhancing Personalization Recommendation Service Performance with CNN-based Review Helpfulness Score Prediction (CNN 기반 리뷰 유용성 점수 예측을 통한 개인화 추천 서비스 성능 향상에 관한 연구)

  • Li, Qinglong;Lee, Byunghyun;Li, Xinzhe;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.29-56
    • /
    • 2021
  • Recently, various types of products have been launched with the rapid growth of the e-commerce market. As a result, many users face information overload problems, which is time-consuming in the purchasing decision-making process. Therefore, the importance of a personalized recommendation service that can provide customized products and services to users is emerging. For example, global companies such as Netflix, Amazon, and Google have introduced personalized recommendation services to support users' purchasing decisions. Accordingly, the user's information search cost can reduce which can positively affect the company's sales increase. The existing personalized recommendation service research applied Collaborative Filtering (CF) technique predicts user preference mainly use quantified information. However, the recommendation performance may have decreased if only use quantitative information. To improve the problems of such existing studies, many studies using reviews to enhance recommendation performance. However, reviews contain factors that hinder purchasing decisions, such as advertising content, false comments, meaningless or irrelevant content. When providing recommendation service uses a review that includes these factors can lead to decrease recommendation performance. Therefore, we proposed a novel recommendation methodology through CNN-based review usefulness score prediction to improve these problems. The results show that the proposed methodology has better prediction performance than the recommendation method considering all existing preference ratings. In addition, the results suggest that can enhance the performance of traditional CF when the information on review usefulness reflects in the personalized recommendation service.

Simulation and Post-representation: a study of Algorithmic Art (시뮬라시옹과 포스트-재현 - 알고리즘 아트를 중심으로)

  • Lee, Soojin
    • 기호학연구
    • /
    • no.56
    • /
    • pp.45-70
    • /
    • 2018
  • Criticism of the postmodern philosophy of the system of representation, which has continued since the Renaissance, is based on a critique of the dichotomy that separates the subjects and objects and the environment from the human being. Interactivity, highlighted in a series of works emerging as postmodern trends in the 1960s, was transmitted to an interactive aspect of digital art in the late 1990s. The key feature of digital art is the possibility of infinite variations reflecting unpredictable changes based on public participation on the spot. In this process, the importance of computer programs is highlighted. Instead of using the existing program as it is, more and more artists are creating and programming their own algorithms or creating unique algorithms through collaborations with programmers. We live in an era of paradigm shift in which programming itself must be considered as a creative act. Simulation technology and VR technology draw attention as a technique to represent the meaning of reality. Simulation technology helps artists create experimental works. In fact, Baudrillard's concept of Simulation defines the other reality that has nothing to do with our reality, rather than a reality that is extremely representative of our reality. His book Simulacra and Simulation refers to the existence of a reality entirely different from the traditional concept of reality. His argument does not concern the problems of right and wrong. There is no metaphysical meaning. Applying the concept of simulation to algorithmic art, the artist models the complex attributes of reality in the digital system. And it aims to build and integrate internal laws that structure and activate the world (specific or individual), that is to say, simulate the world. If the images of the traditional order correspond to the reproduction of the real world, the synthesized images of algorithmic art and simulated space-time are the forms of art that facilitate the experience. The moment of seeing and listening to the work of Ian Cheng presented in this article is a moment of personal experience and the perception is made at that time. It is not a complete and closed process, but a continuous and changing process. It is this active and situational awareness that is required to the audience for the comprehension of post-representation's forms.

A Study on the Strategy of IoT Industry Development in the 4th Industrial Revolution: Focusing on the direction of business model innovation (4차 산업혁명 시대의 사물인터넷 산업 발전전략에 관한 연구: 기업측면의 비즈니스 모델혁신 방향을 중심으로)

  • Joeng, Min Eui;Yu, Song-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.57-75
    • /
    • 2019
  • In this paper, we conducted a study focusing on the innovation direction of the documentary model on the Internet of Things industry, which is the most actively industrialized among the core technologies of the 4th Industrial Revolution. Policy, economic, social, and technical issues were derived using PEST analysis for global trend analysis. It also presented future prospects for the Internet of Things industry of ICT-related global research institutes such as Gartner and International Data Corporation. Global research institutes predicted that competition in network technologies will be an issue for industrial Internet (IIoST) and IoT (Internet of Things) based on infrastructure and platforms. As a result of the PEST analysis, developed countries are pushing policies to respond to the fourth industrial revolution through cooperation of private (business/ research institutes) led by the government. It was also in the process of expanding related R&D budgets and establishing related policies in South Korea. On the economic side, the growth tax of the related industries (based on the aggregate value of the market) and the performance of the entity were reviewed. The growth of industries related to the fourth industrial revolution in advanced countries overseas was found to be faster than other industries, while in Korea, the growth of the "technical hardware and equipment" and "communication service" sectors was relatively low among industries related to the fourth industrial revolution. On the social side, it is expected to cause enormous ripple effects across society, largely due to changes in technology and industrial structure, changes in employment structure, changes in job volume, etc. On the technical side, changes were taking place in each industry, representing the health and medical sectors and manufacturing sectors, which were rapidly changing as they merged with the technology of the Fourth Industrial Revolution. In this paper, various management methodologies for innovation of existing business model were reviewed to cope with rapidly changing industrial environment due to the fourth industrial revolution. In addition, four criteria were established to select a management model to cope with the new business environment: 'Applicability', 'Agility', 'Diversity' and 'Connectivity'. The expert survey results in an AHP analysis showing that Business Model Canvas is best suited for business model innovation methodology. The results showed very high importance, 42.5 percent in terms of "Applicability", 48.1 percent in terms of "Agility", 47.6 percent in terms of "diversity" and 42.9 percent in terms of "connectivity." Thus, it was selected as a model that could be diversely applied according to the industrial ecology and paradigm shift. Business Model Canvas is a relatively recent management strategy that identifies the value of a business model through a nine-block approach as a methodology for business model innovation. It identifies the value of a business model through nine block approaches and covers the four key areas of business: customer, order, infrastructure, and business feasibility analysis. In the paper, the expansion and application direction of the nine blocks were presented from the perspective of the IoT company (ICT). In conclusion, the discussion of which Business Model Canvas models will be applied in the ICT convergence industry is described. Based on the nine blocks, if appropriate applications are carried out to suit the characteristics of the target company, various applications are possible, such as integration and removal of five blocks, seven blocks and so on, and segmentation of blocks that fit the characteristics. Future research needs to develop customized business innovation methodologies for Internet of Things companies, or those that are performing Internet-based services. In addition, in this study, the Business Model Canvas model was derived from expert opinion as a useful tool for innovation. For the expansion and demonstration of the research, a study on the usability of presenting detailed implementation strategies, such as various model application cases and application models for actual companies, is needed.

Prediction of a hit drama with a pattern analysis on early viewing ratings (초기 시청시간 패턴 분석을 통한 대흥행 드라마 예측)

  • Nam, Kihwan;Seong, Nohyoon
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.33-49
    • /
    • 2018
  • The impact of TV Drama success on TV Rating and the channel promotion effectiveness is very high. The cultural and business impact has been also demonstrated through the Korean Wave. Therefore, the early prediction of the blockbuster success of TV Drama is very important from the strategic perspective of the media industry. Previous studies have tried to predict the audience ratings and success of drama based on various methods. However, most of the studies have made simple predictions using intuitive methods such as the main actor and time zone. These studies have limitations in predicting. In this study, we propose a model for predicting the popularity of drama by analyzing the customer's viewing pattern based on various theories. This is not only a theoretical contribution but also has a contribution from the practical point of view that can be used in actual broadcasting companies. In this study, we collected data of 280 TV mini-series dramas, broadcasted over the terrestrial channels for 10 years from 2003 to 2012. From the data, we selected the most highly ranked and the least highly ranked 45 TV drama and analyzed the viewing patterns of them by 11-step. The various assumptions and conditions for modeling are based on existing studies, or by the opinions of actual broadcasters and by data mining techniques. Then, we developed a prediction model by measuring the viewing-time distance (difference) using Euclidean and Correlation method, which is termed in our study similarity (the sum of distance). Through the similarity measure, we predicted the success of dramas from the viewer's initial viewing-time pattern distribution using 1~5 episodes. In order to confirm that the model is shaken according to the measurement method, various distance measurement methods were applied and the model was checked for its dryness. And when the model was established, we could make a more predictive model using a grid search. Furthermore, we classified the viewers who had watched TV drama more than 70% of the total airtime as the "passionate viewer" when a new drama is broadcasted. Then we compared the drama's passionate viewer percentage the most highly ranked and the least highly ranked dramas. So that we can determine the possibility of blockbuster TV mini-series. We find that the initial viewing-time pattern is the key factor for the prediction of blockbuster dramas. From our model, block-buster dramas were correctly classified with the 75.47% accuracy with the initial viewing-time pattern analysis. This paper shows high prediction rate while suggesting audience rating method different from existing ones. Currently, broadcasters rely heavily on some famous actors called so-called star systems, so they are in more severe competition than ever due to rising production costs of broadcasting programs, long-term recession, aggressive investment in comprehensive programming channels and large corporations. Everyone is in a financially difficult situation. The basic revenue model of these broadcasters is advertising, and the execution of advertising is based on audience rating as a basic index. In the drama, there is uncertainty in the drama market that it is difficult to forecast the demand due to the nature of the commodity, while the drama market has a high financial contribution in the success of various contents of the broadcasting company. Therefore, to minimize the risk of failure. Thus, by analyzing the distribution of the first-time viewing time, it can be a practical help to establish a response strategy (organization/ marketing/story change, etc.) of the related company. Also, in this paper, we found that the behavior of the audience is crucial to the success of the program. In this paper, we define TV viewing as a measure of how enthusiastically watching TV is watched. We can predict the success of the program successfully by calculating the loyalty of the customer with the hot blood. This way of calculating loyalty can also be used to calculate loyalty to various platforms. It can also be used for marketing programs such as highlights, script previews, making movies, characters, games, and other marketing projects.

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.239-251
    • /
    • 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market predictions. However, these statistical methods have not produced superior performances. In recent years, machine learning techniques have been widely used in stock market predictions, including artificial neural network, SVM, and genetic algorithm. In particular, a case-based reasoning method, known as k-nearest neighbor is also widely used for stock price prediction. Case based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of similar cases to create a classification for the new problem. However, case based reasoning has some problems. First, case based reasoning has a tendency to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case. So, case based reasoning may have to take into account more cases even when there are fewer cases applicable depending on the subject. Second, case based reasoning may select neighbors that are far away from the target case. Thus, case based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and the predictability can be degraded due to a deviation from the desired similar neighbor. This paper examines how the size of learning data affects stock price predictability through k-nearest neighbor and compares the predictability of k-nearest neighbor with the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung electronics stock prices were predicted by dividing the learning dataset into two types. For the prediction of next day's closing price, we used four variables: opening value, daily high, daily low, and daily close. In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data is from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning dataset. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for the k-NN for the first experiment when the learning data was small. However, the mean absolute percentage error (MAPE) for the random walk model was 1.3497 and the k-NN was 1.2928 for the second experiment when the learning data was large. These results show that the prediction power when more learning data are used is higher than when less learning data are used. Also, this paper shows that k-NN generally produces a better predictive power than random walk model for larger learning datasets and does not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting including opening price, low price, high price, and closing price. Also, to produce better results, it is recommended that the k-nearest neighbor needs to find nearest neighbors using the second step filtering method considering fundamental economic variables as well as a sufficient amount of learning data.

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.43-62
    • /
    • 2019
  • At one time, the anomaly detection sector dominated the method of determining whether there was an abnormality based on the statistics derived from specific data. This methodology was possible because the dimension of the data was simple in the past, so the classical statistical method could work effectively. However, as the characteristics of data have changed complexly in the era of big data, it has become more difficult to accurately analyze and predict the data that occurs throughout the industry in the conventional way. Therefore, SVM and Decision Tree based supervised learning algorithms were used. However, there is peculiarity that supervised learning based model can only accurately predict the test data, when the number of classes is equal to the number of normal classes and most of the data generated in the industry has unbalanced data class. Therefore, the predicted results are not always valid when supervised learning model is applied. In order to overcome these drawbacks, many studies now use the unsupervised learning-based model that is not influenced by class distribution, such as autoencoder or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced in the study of Thomas et al (2017), is a classification model that performs abnormal detection of medical images. It was composed of a Convolution Neural Net and was used in the field of detection. On the other hand, sequencing data abnormality detection using generative adversarial network is a lack of research papers compared to image data. Of course, in Li et al (2018), a study by Li et al (LSTM), a type of recurrent neural network, has proposed a model to classify the abnormities of numerical sequence data, but it has not been used for categorical sequence data, as well as feature matching method applied by salans et al.(2016). So it suggests that there are a number of studies to be tried on in the ideal classification of sequence data through a generative adversarial Network. In order to learn the sequence data, the structure of the generative adversarial networks is composed of LSTM, and the 2 stacked-LSTM of the generator is composed of 32-dim hidden unit layers and 64-dim hidden unit layers. The LSTM of the discriminator consists of 64-dim hidden unit layer were used. In the process of deriving abnormal scores from existing paper of Anomaly Detection for Sequence data, entropy values of probability of actual data are used in the process of deriving abnormal scores. but in this paper, as mentioned earlier, abnormal scores have been derived by using feature matching techniques. In addition, the process of optimizing latent variables was designed with LSTM to improve model performance. The modified form of generative adversarial model was more accurate in all experiments than the autoencoder in terms of precision and was approximately 7% higher in accuracy. In terms of Robustness, Generative adversarial networks also performed better than autoencoder. Because generative adversarial networks can learn data distribution from real categorical sequence data, Unaffected by a single normal data. But autoencoder is not. Result of Robustness test showed that he accuracy of the autocoder was 92%, the accuracy of the hostile neural network was 96%, and in terms of sensitivity, the autocoder was 40% and the hostile neural network was 51%. In this paper, experiments have also been conducted to show how much performance changes due to differences in the optimization structure of potential variables. As a result, the level of 1% was improved in terms of sensitivity. These results suggest that it presented a new perspective on optimizing latent variable that were relatively insignificant.

An Investigation on the Periodical Transition of News related to North Korea using Text Mining (텍스트마이닝을 활용한 북한 관련 뉴스의 기간별 변화과정 고찰)

  • Park, Chul-Soo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.63-88
    • /
    • 2019
  • The goal of this paper is to investigate changes in North Korea's domestic and foreign policies through automated text analysis over North Korea represented in South Korean mass media. Based on that data, we then analyze the status of text mining research, using a text mining technique to find the topics, methods, and trends of text mining research. We also investigate the characteristics and method of analysis of the text mining techniques, confirmed by analysis of the data. In this study, R program was used to apply the text mining technique. R program is free software for statistical computing and graphics. Also, Text mining methods allow to highlight the most frequently used keywords in a paragraph of texts. One can create a word cloud, also referred as text cloud or tag cloud. This study proposes a procedure to find meaningful tendencies based on a combination of word cloud, and co-occurrence networks. This study aims to more objectively explore the images of North Korea represented in South Korean newspapers by quantitatively reviewing the patterns of language use related to North Korea from 2016. 11. 1 to 2019. 5. 23 newspaper big data. In this study, we divided into three periods considering recent inter - Korean relations. Before January 1, 2018, it was set as a Before Phase of Peace Building. From January 1, 2018 to February 24, 2019, we have set up a Peace Building Phase. The New Year's message of Kim Jong-un and the Olympics of Pyeong Chang formed an atmosphere of peace on the Korean peninsula. After the Hanoi Pease summit, the third period was the silence of the relationship between North Korea and the United States. Therefore, it was called Depression Phase of Peace Building. This study analyzes news articles related to North Korea of the Korea Press Foundation database(www.bigkinds.or.kr) through text mining, to investigate characteristics of the Kim Jong-un regime's South Korea policy and unification discourse. The main results of this study show that trends in the North Korean national policy agenda can be discovered based on clustering and visualization algorithms. In particular, it examines the changes in the international circumstances, domestic conflicts, the living conditions of North Korea, the South's Aid project for the North, the conflicts of the two Koreas, North Korean nuclear issue, and the North Korean refugee problem through the co-occurrence word analysis. It also offers an analysis of South Korean mentality toward North Korea in terms of the semantic prosody. In the Before Phase of Peace Building, the results of the analysis showed the order of 'Missiles', 'North Korea Nuclear', 'Diplomacy', 'Unification', and ' South-North Korean'. The results of Peace Building Phase are extracted the order of 'Panmunjom', 'Unification', 'North Korea Nuclear', 'Diplomacy', and 'Military'. The results of Depression Phase of Peace Building derived the order of 'North Korea Nuclear', 'North and South Korea', 'Missile', 'State Department', and 'International'. There are 16 words adopted in all three periods. The order is as follows: 'missile', 'North Korea Nuclear', 'Diplomacy', 'Unification', 'North and South Korea', 'Military', 'Kaesong Industrial Complex', 'Defense', 'Sanctions', 'Denuclearization', 'Peace', 'Exchange and Cooperation', and 'South Korea'. We expect that the results of this study will contribute to analyze the trends of news content of North Korea associated with North Korea's provocations. And future research on North Korean trends will be conducted based on the results of this study. We will continue to study the model development for North Korea risk measurement that can anticipate and respond to North Korea's behavior in advance. We expect that the text mining analysis method and the scientific data analysis technique will be applied to North Korea and unification research field. Through these academic studies, I hope to see a lot of studies that make important contributions to the nation.