A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis

  • Kam, Miah (Graduate School of Library and Information Science, Yonsei University)
  • Song, Min (Department of Library and Information Science, Yonsei University)
  • Received: 2012.08.10
  • Reviewed: 2012.09.11
  • Published: 2012.09.30


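This study set out to show, using objective data, how articles from three newspapers (the Kyunghyang Shinmun, the Hankyoreh, and the Dong-A Ilbo) differ in content and tone. Using text mining techniques, it presents an analysis of simple keyword frequencies in the articles together with clustering and classification results, and examines differences among the newspapers in the economy, culture, international, society, politics, and editorial sections. Taking the paragraph of a news article as the unit of analysis, we identified the characteristics of each newspaper and visualized the relationships among keywords as keyword networks so that each newspaper's characteristics could be seen objectively. A total of 3,026 articles published from 2008 to 2012 were collected from KINDS, a news article database system, by searching for subject keywords on the relevant topics. For stop-word removal and morphological analysis, the collected articles were processed with the Lucene Korean module, which is implemented in Java. To identify content and tone, we presented and analyzed the ten most frequent words that the Kyunghyang Shinmun, the Hankyoreh, and the Dong-A Ilbo used when covering specific events within a fixed period, built network maps from the cosine similarity between keywords, and analyzed clustering results through these word networks. To identify each newspaper's tone, paragraphs were classified by tone using a supervised learning technique, and classification performance was evaluated with precision, recall, and F-value. The results show that the newspapers differed in overall content and tone in articles on culture in general, the economy in general, and the Unified Progressive Party (Tonghap-Jinbo Dang) issue in politics, and that they differed in positive versus negative tone on the 4-major-rivers project in the society section. The study is significant in that, departing from the coding and discourse analysis methods used so far in research on Korean-language newspaper articles, it analyzes a large volume of data with text mining techniques. If further research improves classification performance, we expect it will be possible to build a reliable tool that lets readers first recognize the tone a news item leans toward, so that they can access information while remaining objective.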
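The supervised Positive/Negative classification and its evaluation can be illustrated with a minimal Python sketch. The study used the Naïve Bayes classifier in the MALLET package (a Java toolkit); the code below is a from-scratch multinomial Naive Bayes with add-one smoothing, not MALLET's implementation. The tokens and labels are hypothetical toy data standing in for morphologically analyzed paragraphs, and "F-value" is assumed here to be F1, the harmonic mean of precision and recall.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over word counts."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)  # per-class word frequencies
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        # log prior = log(share of training paragraphs carrying this label)
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        return self

    def predict(self, tokens):
        v = len(self.vocab)

        def log_posterior(c):
            score = self.log_prior[c]
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            return score

        return max(self.classes, key=log_posterior)


def precision_recall_f(gold, pred, target):
    """Precision, recall and F1 for one target class."""
    tp = sum(g == target and p == target for g, p in zip(gold, pred))
    fp = sum(g != target and p == target for g, p in zip(gold, pred))
    fn = sum(g == target and p != target for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical tokenized paragraphs (after morphological analysis) with tone labels.
train = [
    (["project", "success", "growth", "support"], "Positive"),
    (["project", "benefit", "growth"], "Positive"),
    (["project", "damage", "protest", "concern"], "Negative"),
    (["damage", "environment", "concern"], "Negative"),
]
test = [
    (["growth", "benefit", "support"], "Positive"),
    (["protest", "environment", "damage"], "Negative"),
    (["project", "concern", "growth"], "Negative"),
]

model = NaiveBayes().fit([d for d, _ in train], [label for _, label in train])
gold = [label for _, label in test]
pred = [model.predict(d) for d, _ in test]
print(pred)  # ['Positive', 'Negative', 'Positive']
print(precision_recall_f(gold, pred, "Positive"))  # precision 0.5, recall 1.0, F1 about 0.667
```

The third test paragraph mixes words typical of both classes and is labeled Positive, which is why precision for the Positive class falls to 0.5 while recall stays at 1.0.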

This study analyses differences in content and tone of argument among three major Korean newspapers: the Kyunghyang Shinmun, the Hankyoreh, and the Dong-A Ilbo. It is commonly accepted that Korean newspapers explicitly convey their own tone of argument when covering sensitive issues and topics. This can be problematic when readers consume news without being aware of its tone, because content and tone can easily influence readers. A tool that informs readers of a newspaper's tone of argument would therefore be valuable. This study presents the results of clustering and classification techniques as part of a text mining analysis. We focus on six main subjects, namely Culture, Politics, International, Editorial-opinion, Eco-business, and National issues, and attempt to identify differences and similarities among the newspapers. The basic unit of analysis is a paragraph of a news article. This study uses a keyword-network analysis tool and visualizes the relationships among keywords to make the differences easier to see. Newspaper articles were gathered from KINDS, the Korean integrated news database system, which preserves articles from the Kyunghyang Shinmun, the Hankyoreh, and the Dong-A Ilbo and makes them open to the public. A total of 3,026 articles published from 2008 to 2012 were used. Articles in the International, National issues, and Politics sections were collected by searching for specific issues: the International section with the keyword 'Nuclear weapon of North Korea', the National issues section with '4-major-river', and the Politics section with 'Tonghap-Jinbo Dang'. In addition, all articles in the Eco-business, Culture, and Editorial-opinion sections from April 2012 to May 2012 were collected.
All collected articles were split into paragraphs, and stop-words were removed using the Lucene Korean module. We then listed the keyword pairs co-occurring within each paragraph, counted their co-occurrences, and built a keyword co-occurrence matrix from the list. From this matrix we derived a cosine coefficient matrix, which served as input to PFNet (Pathfinder Network). To identify the significant keywords in each newspaper, we analyzed the ten highest-frequency keywords and built keyword networks of the 20 highest-frequency keywords to examine the relationships among them in detail. NodeXL was used to visualize the PFNets, and after drawing all the networks we compared them with the classification results. Classification was used to identify how each newspaper's tone of argument differs from the others'. To analyze tone, all paragraphs were divided into two classes, Positive and Negative, and a supervised learning technique was used to classify the tone of the collected paragraphs and articles. Specifically, the Naïve Bayes classifier provided in the MALLET package was used to classify all paragraphs, and Precision, Recall, and F-value were used to evaluate the classification results. The results show that the three newspapers differed in content and tone of argument in three subjects: Culture, Eco-business, and Politics. In the National issues section, their tones of argument on the 4-major-rivers project also differed. This suggests that each newspaper has its own tone of argument in those sections. Moreover, for the same section and period, the keyword networks of the three newspapers took different shapes.
This indicates that the most frequent keywords differ among the newspapers and that their articles are composed of different keywords. The Positive/Negative classification also showed that a newspaper's tone of argument can potentially be distinguished from those of the others. These results suggest that the approach in this study is promising as the basis for a new tool to identify the differing tones of argument of newspapers.
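The keyword-network steps (selecting the highest-frequency keywords, counting paragraph-level co-occurrences, converting them to cosine coefficients, and pruning with PFNet) can be sketched in Python as below. Several details are assumptions not stated in the abstract: each keyword is represented by its binary paragraph-occurrence vector, so the cosine between keywords i and j is c_ij / sqrt(f_i * f_j), where c_ij counts paragraphs containing both and f_i counts paragraphs containing i; PFNet uses the commonly chosen parameters r = infinity and q = n - 1; and the paragraphs are hypothetical toy data. The study used 20 keywords and visualized the result in NodeXL; this sketch uses 4 and only prints the surviving links.

```python
import math
from collections import Counter
from itertools import combinations


def top_keywords(paragraphs, n):
    """The n most frequent keywords overall (ties broken alphabetically)."""
    freq = Counter(t for p in paragraphs for t in p)
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def cosine_matrix(paragraphs, keywords):
    """Cosine between binary paragraph-occurrence vectors: c_ij / sqrt(f_i * f_j)."""
    index = {w: i for i, w in enumerate(keywords)}
    n = len(keywords)
    occ = [0] * n                      # f_i: paragraphs containing keyword i
    co = [[0] * n for _ in range(n)]   # c_ij: paragraphs containing both i and j
    for p in paragraphs:
        present = sorted({index[t] for t in p if t in index})
        for i in present:
            occ[i] += 1
        for i, j in combinations(present, 2):  # each pair counted once per paragraph
            co[i][j] += 1
            co[j][i] += 1
    return [[co[i][j] / math.sqrt(occ[i] * occ[j]) if i != j and co[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def pfnet(sim):
    """PFNet(r = inf, q = n - 1) on a similarity matrix; 0 means no link."""
    n = len(sim)
    best = [row[:] for row in sim]  # strongest weakest-link path found so far
    for k in range(n):              # Floyd-Warshall in the (max, min) semiring
        for i in range(n):
            for j in range(n):
                via_k = min(best[i][k], best[k][j])
                if via_k > best[i][j]:
                    best[i][j] = via_k
    # keep a direct link only if no indirect path has a stronger weakest link
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sim[i][j] > 0 and sim[i][j] >= best[i][j]]


# Hypothetical tokenized paragraphs from one newspaper and one section.
paragraphs = [
    ["river", "project", "budget"],
    ["river", "project", "environment"],
    ["river", "environment", "damage"],
    ["project", "budget", "government"],
    ["river", "project"],
]
keywords = top_keywords(paragraphs, 4)  # the study used the top 20
sim = cosine_matrix(paragraphs, keywords)
edges = pfnet(sim)
print([(keywords[i], keywords[j]) for i, j in edges])
# [('project', 'river'), ('project', 'budget'), ('river', 'environment')]
```

With r = infinity, PFNet depends only on the rank order of link weights and keeps the union of the maximum spanning trees of the similarity graph, which is why the weak project-environment and river-budget links are pruned and four keywords end up joined by three links.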


