• Title/Summary/Keyword: News Big Data (뉴스빅데이터)

Search Result 207, Processing Time 0.029 seconds

Keyword Analysis of COVID-19 in News Big Data : Focused on 4 Major Daily Newspapers

  • Kwon, Seong-Wook
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.12
    • /
    • pp.101-107
    • /
    • 2020
  • This paper compares and analyzes the major keywords used by progressive and conservative newspapers, according to their political orientation, using big data from four major domestic daily newspapers on COVID-19, which has become a protracted crisis. To this end, 93,917 news reports from Jan. 20 to Sept. 15, 2020 were divided into four stages, and the major keywords of the four newspapers were visualized as word clouds and analyzed. According to the analysis, the conservative newspapers focused on the government's response, criticism, and China's responsibility, mentioning the keywords "government," "president," "state of affairs" and "mask" more often than the progressive newspapers, while the progressive newspapers used keywords that emphasize the seriousness of the disease and the occurrence of a dangerous situation. The Chosun Ilbo used a wide variety of keywords during the massive outbreak of cluster infections (Feb. 18 to May 15), and the JoongAng Ilbo used keywords criticizing government policies in its reporting on infectious diseases such as COVID-19, but also used the keywords favored by progressive newspapers that emphasize the seriousness of the disease and the occurrence of dangerous situations.

Design of news visualization system with Big Data analysis (빅데이터 분석을 이용한 뉴스 기사 시각화 시스템 설계)

  • Ko, Byungsoo;Jang, Hanbyeol;Choi, Hyeokjun;Kim, Kyungsup
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.10a
    • /
    • pp.242-244
    • /
    • 2015
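  • As information and communication technology advances, the number of online news articles and their readership continue to rise. However, because online articles consist mainly of text, they offer only a limited view of the overall state of current issues. This paper therefore proposes the design of a system that analyzes large volumes of news through machine learning in a distributed system environment, extracts the major issues and the associations among them, and visualizes the results.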

Chunking Annotation Corpus Construction for Keyword Extraction in News Domain (뉴스 기사 키워드 추출을 위한 구묶음 주석 말뭉치 구축)

  • Kim, Tae-Young;Kim, Jeong Ah;Kim, Bo Hui;Oh, Hyo Jung
    • Annual Conference on Human and Language Technology
    • /
    • 2020.10a
    • /
    • pp.595-597
    • /
    • 2020
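  • In the big data era, automatically grasping the meaning of large volumes of documents requires that the core words covering a document's topic and content be extracted as keywords. The units that can serve as keywords in a document may be single words, including compound nouns, or larger groupings. Because of its linguistic characteristics, Korean lends itself to the concept of chunking, which makes it possible to extract chunks that can serve as major keywords. Accordingly, this study refines a corpus construction methodology, including the definition of guidelines for tagging keyword groupings of various units as well as single words and the use of a tagging tool, and validates the methodology by applying it to the news domain to build an annotated corpus. The results of this study can serve as groundwork for any text mining technology that needs to understand and analyze the content of text documents.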

Comparison of Term-Weighting Schemes for Environmental Big Data Analysis (환경 빅데이터 이슈 분석을 위한 용어 가중치 기법 비교)

  • Kim, JungJin;Jeong, Hanseok
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2021.06a
    • /
    • pp.236-236
    • /
    • 2021
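  • As the rate at which unstructured data such as text is generated has risen sharply, the need for techniques to analyze it has grown. Text mining is one technique that uses natural language processing to structure unstructured text and obtain valuable information from documents. Text mining generally represents term importance with a document-term frequency matrix, which records how often specific terms are used in each document, and it is applied in a variety of research fields. However, the frequencies in a document-term frequency matrix do not readily capture the distinctiveness of documents or the resulting importance of terms, so applying term weights to characterize documents is essential. Various term-weighting methods have been developed and applied, but in the environmental field there has been little research evaluating the effectiveness of applying them. In addition, in environmental issue analysis it is relatively important not only to identify the characteristics of documents and classify them, but also to reflect the characteristics of each document according to its temporal distribution. This study therefore used text mining on environmental news data for the Seoul area from 2015 to 2020 to compare term-weighting schemes suitable for environmental issue analysis. The schemes applied were TF-IDF (term frequency-inverse document frequency), BM25, TF-IGM (TF-inverse gravity moment), and TF-IDF-ICSDF (TF-IDF-inverse class space density frequency). Through this study we aim to present an optimized term-weighting scheme for classifying environmental documents and entities and to provide keyword extraction information related to environmental issues in the Seoul area.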
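To make the contrast between raw term frequencies and weighted terms concrete, the following is a minimal sketch of two of the schemes named in the abstract, TF-IDF and BM25, in plain Python. It is not the authors' implementation. The toy documents, the function names `tf_idf` and `bm25`, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration, and the formulas are common textbook variants rather than the exact ones used in the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term by raw term frequency times log(N / document frequency)."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def bm25(docs, k1=1.5, b=0.75):
    """Weight each term with a common BM25 variant (k1 and b are assumed defaults)."""
    n = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        # Length normalization: long documents are penalized relative to the average.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        weights.append({
            t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) * f * (k1 + 1) / (f + norm)
            for t, f in Counter(doc).items()
        })
    return weights


# Hypothetical, already tokenized documents (not data from the study).
docs = [
    ["air", "pollution", "seoul", "air"],
    ["water", "quality", "seoul"],
    ["air", "quality", "dust"],
]

for name, scheme in (("TF-IDF", tf_idf), ("BM25", bm25)):
    top = max(scheme(docs)[0].items(), key=lambda kv: kv[1])
    print(name, "top term in first document:", top[0])
```

Both schemes give a term that appears in few documents more weight than one that appears in many; BM25 additionally dampens repeated occurrences and normalizes for document length.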

Forecasting Birthrate Change based on Big Data (빅데이터 기반의 출산율 변동 예측)

  • Joo, Se-Min;Ok, Seong-Hwan;Hwang, Kyung-Tae
    • Informatization Policy
    • /
    • v.26 no.4
    • /
    • pp.20-35
    • /
    • 2019
  • We empirically analyze the effects of psychological factors, such as the fear of parenting, on fertility rates. An index is calculated as the share of negative news articles on child care among all social articles from 2000 to 2018. The analysis shows that as the index increases, the fertility rate three years later falls. This result holds across correlation analysis, simple regression, and VAR analysis. Granger causality analysis indicates that the relation between the index and the fertility rate three years later is not just a simple correlation but a causal relationship. There are differences among age groups: the fertility rate of women in their 20s and 30s responds significantly to the index, but that of women in their 40s does not. The index affects the birthrate of first children but does not affect the birthrate of second or later children. These results are consistent with the intuition that younger women are more likely to be affected by negative articles about parenting, whereas those who have already experienced childbirth are not. This study is meaningful in that it extracts a significant index for predicting social phenomena, going beyond limited uses of news big data such as simple monitoring of keyword mention volume. In addition, this big data-based index is a three-year leading indicator for fertility, which offers the advantage of supporting early detection.
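The index construction and the three-year lag described above can be sketched roughly as follows. This is an illustrative sketch only: it covers the share-based index and a lagged Pearson correlation, not the regression, VAR, or Granger causality analyses, and the function names and all numbers are hypothetical rather than taken from the paper.

```python
from statistics import mean


def negative_share(negative_counts, total_counts):
    """Index per year: negative child-care articles divided by all social articles."""
    return [neg / total for neg, total in zip(negative_counts, total_counts)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def lagged_correlation(index, outcome, lag):
    """Correlate index[t] with outcome[t + lag]; lag is given in periods (years)."""
    if lag == 0:
        return pearson(index, outcome)
    return pearson(index[:-lag], outcome[lag:])


# Made-up yearly counts and fertility rates, for illustration only.
negative = [120, 150, 180, 240, 300, 330, 360, 420]
total = [3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000]
fertility = [1.30, 1.28, 1.27, 1.25, 1.22, 1.19, 1.12, 1.05]

index = negative_share(negative, total)
print("lag-3 correlation:", round(lagged_correlation(index, fertility, 3), 3))
```

A strongly negative value at a lag of three would be in line with the leading-indicator reading in the abstract, although correlation alone does not establish the causal claim.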

Forecasting the Future Korean Society: A Big Data Analysis on 'Future Society'-related Keywords in News Articles and Academic Papers (빅데이터를 통해 본 한국사회의 미래: 언론사 뉴스기사와 사회과학 학술논문의 '미래사회' 관련 키워드 분석)

  • Kim, Mun-Cho;Lee, Wang-Won;Lee, Hye-Soo;Suh, Byung-Jo
    • Informatization Policy
    • /
    • v.25 no.4
    • /
    • pp.37-64
    • /
    • 2018
  • This study aims to forecast the future of Korean society through big data analysis. From two databases, a collection of 46,000,000 news articles from 127 media outlets on the Naver portal operated by Naver Corporation and a collection of 70,000 social science papers registered in KCI (the Korea Citation Index of the National Research Foundation) between 2005 and 2017, the 40 most frequently occurring keywords were selected. Next, their temporal variations were traced and compared in terms of the number and pattern of frequencies. In addition, core issues of the future were identified through keyword network analysis. In the media news database, issues such as economy, polity, and technology were the top-ranked ones. In the academic paper database, however, the top-ranked issues were those of feeling, working, and living. In terms of the system and life-world conceptual framework suggested by Jürgen Habermas, public interest in the future inclines toward matters of 'system,' while professional interest in the future leans toward those of 'life-world.' Given this disparity in future interest, a 'mismatch paradigm' is proposed as an alternative approach to social forecasting, one that can replace the existing paradigms based on ideas of deficiency or deprivation.

Social perception of the Arduino lecture as seen in big data (빅데이터 분석을 통한 아두이노 강의에 대한 사회적 인식)

  • Lee, Eunsang
    • Journal of The Korean Association of Information Education
    • /
    • v.25 no.6
    • /
    • pp.935-945
    • /
    • 2021
  • The purpose of this study is to analyze the social perception of Arduino lectures using big data analysis. For this purpose, data from January 2012 to May 2021 were collected through the Textom website by searching for the keyword 'arduino + lecture' in the blogs, cafes, and news channels of the NAVER website. The collected data were refined with Textom, and text mining analysis and semantic network analysis were performed using Textom, Ucinet 6, and Netdraw. Text mining analyses such as frequency analysis, TF-IDF analysis, and degree centrality confirmed that 'education' and 'coding' were the top keywords. CONCOR analysis for the semantic network analysis identified four clusters: 'Arduino-related education', 'Physical computing-related lecture', 'Arduino special lecture', and 'GUI programming'. Through this study, it was possible to confirm various meaningful social perceptions held by the general public in relation to Arduino lectures on the Internet. The results are expected to provide meaningful implications for instructors preparing Arduino lectures, researchers studying the subject, and policy makers who establish software education, coding education, and related policies.
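As a rough illustration of the degree centrality measure mentioned above, the sketch below builds a keyword co-occurrence network from tokenized documents and computes normalized degree centrality in plain Python. It does not reproduce how Textom or Ucinet compute the measure; the co-occurrence rule (two keywords are linked if they appear in the same document), the function name, and the toy documents are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(docs):
    """Normalized degree centrality on a keyword co-occurrence network.

    Two keywords are linked when they occur in the same document. Keywords
    that never co-occur with another keyword are not part of the network.
    """
    neighbors = defaultdict(set)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    # Divide by the maximum possible number of neighbors (n - 1).
    return {term: len(linked) / (n - 1) for term, linked in neighbors.items()}


# Hypothetical tokenized posts, not data from the study.
docs = [
    ["arduino", "education", "coding"],
    ["arduino", "lecture"],
    ["coding", "education", "lecture"],
]

centrality = degree_centrality(docs)
for term in sorted(centrality, key=centrality.get, reverse=True):
    print(term, round(centrality[term], 2))
```

A keyword with centrality 1.0 co-occurs with every other keyword in the network, which is one simple sense in which a term can be called a top keyword.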

A Study on Corporate Reputation and Profitability Focus on Online News and Comments (기업평판과 수익성에 관한 연구 온라인 뉴스와 뉴스댓글을 중심으로)

  • Jin, Zhilong;Han, Eun-Kyoung
    • Journal of Digital Convergence
    • /
    • v.17 no.9
    • /
    • pp.399-406
    • /
    • 2019
  • The purpose of this study is to examine the relationship between corporate reputation and profitability. To address the research questions, big data analysis was conducted for Hyundai Motor, Shinsegae Department Store, SK Telecom, and Amorepacific. The results show that the effect of corporate reputation on profitability differs by company. For companies such as Hyundai Motor and Amorepacific, whose products are used directly by consumers, the corporate reputation formed by comments was more influential. In contrast, a distribution service company such as Shinsegae Department Store was influenced more by online news. For SK Telecom, no significant effect on profitability was found. Based on these results, this study emphasizes the importance of online news and comments in corporate reputation management, and aims to contribute to establishing an efficient reputation management strategy by examining the relationship between corporate reputation and profitability.

A Study on the Agenda Rank-Order Correlation between Twitter and Portal News about Sewol Ferry Catastrophe (세월호 참사에 대한 트위터와 포털뉴스의 의제 순위 상관관계 연구)

  • Kim, Shin-Ku;Choi, Eun-Kyoung
    • Journal of Internet Computing and Services
    • /
    • v.16 no.3
    • /
    • pp.105-116
    • /
    • 2015
  • The Sewol ferry catastrophe of April 16, 2014 was unprecedented in its sociopolitical implications, which reverberated throughout Korea. Mindful of these distinct characteristics, this paper examines the salience of the agendas portrayed in Twitter and Portal News coverage of the disaster and the correlation between the attribute-specific agendas of the two media, using the agenda rank-order correlation method. Extraction and analysis of big data revealed, first, that while the hypothesis that there was little difference in salience among the main agendas between Twitter and Portal News was rejected, the rank-order correlation between the main agendas on Twitter and Portal News proved to be high. This signifies that Twitter agendas exert influence over those on Portal News. Second, for the five main agendas on the incident, there were differences in salience between the attribute-specific agendas of the two media, with low figures for the corresponding rank-order correlations. These results signify that Twitter and Portal News have little influence over each other with respect to their attribute-specific agenda rank-order correlation.
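The rank-order correlation used in agenda comparisons of this kind is typically Spearman's coefficient, and the abstract does not state which variant the authors applied. The following minimal sketch therefore assumes Spearman's rho, computed as the Pearson correlation of the ranks with tied values sharing their average rank. The function names and the agenda counts are hypothetical.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) downward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up mention counts for five agendas on each medium (illustration only).
twitter_counts = [950, 720, 610, 400, 150]
portal_counts = [800, 830, 500, 310, 90]

print("rank-order correlation:", round(spearman(twitter_counts, portal_counts), 2))
```

A value near 1 means the two media order the agendas almost identically, even when the absolute salience of each agenda differs between them.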

An Analysis of Social Perception on Forest Using News Big Data (뉴스 빅데이터를 활용한 산림에 대한 사회적 인식 변화 분석)

  • Jang, Youn-Sun;Lee, Ju-Eun;Na, So-Yeon;Lee, Jeong-Hee;Seo, Jeong-Weon
    • Journal of Korean Society of Forest Science
    • /
    • v.110 no.3
    • /
    • pp.462-477
    • /
    • 2021
  • The purpose of this study was to understand changes in domestic forest policy and social perception of forests from a macro perspective using big data analysis of news articles and editorials. A total of 13,570 'forest'-related items were collected from metropolitan and economic journals for 1946-2017 and examined using keyword and CONCOR (CONvergence of iterated CORrelations) analysis. First, we found that the percentage of articles and editorials using the keyword 'forest' increased overall. Second, news data on 'forest' by field of reporting was concentrated in the "social" sector during the first period (1946-1966), followed by forest-related issues expanding to various fields from the second (1967-1972) to fifth (1988-1997) periods, then shifting toward the "culture" sector in the sixth period (1998-2007) and "politics" in the seventh period (2008-2017). Third, we found that changes in the policy paradigm over time significantly changed social awareness. In the first and second periods, people were concerned with livelihood issues rather than forest greening or forest protection policy; awareness then expanded from planned and scientific afforestation (third) to environmental protection (fourth) and ecological perspectives (sixth to seventh). The key outcome of our analysis was leveraging news big data that reflected policies on forests and public social perception. To further derive future social issues, more in-depth analysis of public discourse and perception will be possible using textual big data and GDP of various social network services (SNS), such as combining blogs and YouTube.