• Title/Summary/Keyword: News Big data


The Representation of Cancer Risk by Korean Health Journalism: Comparing the Crude Rates of 10 Cancers to the Amount of Cancer News in the Three Major Newspapers(1990-2010) (10대암 조발생률과 신문 보도량의 비교: 3대 일간지 보도(1990년~2010년)를 중심으로)

  • Ju, Youngkee;Jeong, Da-Eun;You, Myoungsoon
    • Korean Journal of Health Education and Promotion / v.30 no.5 / pp.201-210 / 2013
  • Objectives: The public relies on the news media to understand health risks. To examine the surveillance function of Korean health journalism, this study compared the rank order of the 10 most frequently diagnosed cancers with that of the 10 cancers most frequently covered by three major Korean newspapers. Methods: News stories published between 1999 and 2010 by the Chosun-Ilbo, Joong-Ang-Ilbo, and Dong-A-Ilbo were examined. Cancer incidence data were taken from epidemiological data published by a governmental public health institution. To compare crude rates with the amount of news coverage, rank-order correlation tests and regression analyses were employed. Results: The rank-order correlation coefficient declined despite an increase in the overall number of cancer news stories. The correlation was no longer significant after 2006. Large differences in rank order between the crude rate and the amount of news coverage were observed for cancers of the breast, uterus, thyroid, and gallbladder/biliary tract. Finally, the three newspapers did not follow the changes in stomach, lung, liver, and uterine cervix cancer. The crude-rate rank orders of these four cancers were falling, signifying a reduction in their comparative danger. Conclusions: The news media's customization of news content and the negative bias in journalism are suggested as possible influences on the news media's inaccurate representation of cancer risk.
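
The rank-order correlation test mentioned in this abstract can be illustrated with a minimal Spearman sketch. The numbers below are hypothetical and are not the study's data; the function names are illustrative, and the abstract does not state which software or tie-handling rule the authors used.

```python
def average_ranks(values):
    """Rank values from 1 (largest) downward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical crude rates and story counts for five cancers (illustration only).
crude_rate = [60.0, 45.0, 40.0, 30.0, 25.0]
news_count = [120, 300, 80, 200, 50]
rho = spearman_rho(crude_rate, news_count)
```

A value of rho near 1 would mean coverage follows incidence rank closely; a falling rho over successive years corresponds to the weakening agreement the abstract reports.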

Wrapper-based Economy Data Collection System Design And Implementation (래퍼 기반 경제 데이터 수집 시스템 설계 및 구현)

  • Piao, Zhegao;Gu, Yeong Hyeon;Yoo, Seong Joon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.227-230 / 2015
  • To analyze and predict economic trends, it is necessary to collect specific economic news and stock data. A typical web crawler analyzes page content, collects documents, and extracts URLs automatically. There are also crawlers that collect only documents on a particular topic. To collect economic news from a particular website, we need a crawler that directly analyzes the site's structure and gathers data from it; that is, a wrapper-based web crawler is required. In this paper, we design a crawler wrapper for a big-data-based economic news analysis system and implement it to collect data. With the wrapper-based crawler, we collect stock data and sales data from the US auto market since 2000, as well as economic news data from the USA and South Korea. The crawler determines how often each site updates its data and re-collects it periodically. We remove duplicate data and build a structured data set for subsequent analysis. A primary task is removing noise data such as advertising and public relations material.
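
The wrapper idea described above (site-specific extraction rules, noise removal, de-duplication) might look roughly like the following sketch. It is not the authors' system: the class names `headline`, `body`, and `ad` describe a hypothetical page layout, and fetching and scheduling are left out.

```python
import hashlib
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # elements with no end tag
NOISE = "__noise__"


class ArticleWrapper(HTMLParser):
    """Site-specific wrapper: maps CSS classes of one (hypothetical) site to fields."""

    def __init__(self, rules, noise_classes):
        super().__init__()
        self.rules = rules                # field name -> CSS class on that site
        self.noise_classes = noise_classes
        self.fields = {name: "" for name in rules}
        self._stack = []                  # one entry (field, NOISE or None) per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if any(c in self.noise_classes for c in classes):
            self._stack.append(NOISE)
        else:
            self._stack.append(
                next((n for n, c in self.rules.items() if c in classes), None)
            )

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if NOISE in self._stack:          # skip advertising and similar blocks
            return
        active = next((f for f in reversed(self._stack) if f), None)
        text = data.strip()
        if active and text:
            self.fields[active] = (self.fields[active] + " " + text).strip()


def collect(pages, rules, noise_classes):
    """Extract one record per page and drop exact duplicates by content hash."""
    seen, records = set(), []
    for html in pages:
        wrapper = ArticleWrapper(rules, noise_classes)
        wrapper.feed(html)
        wrapper.close()
        digest = hashlib.sha256(
            repr(sorted(wrapper.fields.items())).encode("utf-8")
        ).hexdigest()
        if digest not in seen:
            seen.add(digest)
            records.append(dict(wrapper.fields))
    return records


PAGE = (
    '<div class="headline">Auto sales rise</div>'
    '<div class="body"><p>Sales grew.</p>'
    '<div class="ad">Buy now</div><p>Stocks followed.</p></div>'
)
records = collect([PAGE, PAGE], {"title": "headline", "body": "body"}, {"ad"})
```

Each target site would get its own `rules` mapping, which is what distinguishes a wrapper from a general-purpose crawler.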


Analysis of Review Data of 'Tamna' Franchisees to Promote Sustainable Travel in Jeju City (제주시의 지속가능한 여행 활성화를 위한 지역화폐 '탐나는전' 가맹점의 리뷰 데이터 분석)

  • Sehui Baek;Sehyoung Kim;Miran Bae;Juyoung Kang
    • The Journal of Bigdata / v.7 no.2 / pp.113-128 / 2022
  • After COVID-19, interest in 'sustainable tourism' increased, as did the number of tourists who wanted to experience it. However, the programs and methods for 'sustainable tourism' are neither specific nor diverse. In addition, because most interest in 'sustainable tourism' focuses on the 'environment' and 'carbon neutrality', there are few programs or government policies that can contribute to the local community. Therefore, this study analyzed news data and review data to suggest a method for promoting 'sustainable tourism'. First, major themes of sustainable travel, including policy themes and events, were derived through news big data analysis. By analyzing news big data related to 'sustainable tourism', we also sought to identify the reasons why sustainable travel has not taken off in Korea. Finally, to promote sustainable travel on Jeju Island, we analyzed user review data for the Jeju local currency and propose an idea for coexisting with the local community.

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.1-19 / 2018
  • Because forecasting stock movements is an important issue both academically and practically, studies on stock price prediction have been actively conducted. Stock price forecasting research is classified by whether it uses structured or unstructured data, and more specifically into technical analysis, fundamental analysis, and media effect analysis. In the big data era, research on stock price prediction that incorporates big data is actively underway, and because it draws on large amounts of data, it mainly relies on machine learning techniques. In particular, methods that incorporate media effects have recently attracted attention, and among them studies that analyze online news and use it to forecast stock prices are becoming mainstream. Previous studies that predict stock prices from online news mostly apply sentiment analysis to the news, building a separate corpus for each company and a dictionary that predicts stock prices by recording responses to past stock prices. Existing studies have therefore examined the impact of online news on individual companies. For example, stock movements of Samsung Electronics are predicted using only online news about Samsung Electronics. In addition, methods that consider influences among highly related companies have also been studied recently. For example, stock movements of Samsung Electronics are predicted using news about Samsung Electronics and a highly related company such as LG Electronics. These previous studies examine the effects of news about a homogeneous industrial sector on an individual company. In these studies, homogeneous industries are classified according to the Global Industry Classification Standard. In other words, the existing studies assume that industries grouped under the Global Industry Classification Standard are homogeneous.
However, existing studies are limited in that they do not take into account influential companies with high relevance or reflect the heterogeneity that exists within the same Global Industry Classification Standard sector. Our examination of various sectors shows that some industrial sectors are not homogeneous groups. To overcome these limitations, our study proposes a methodology that reflects the heterogeneous effects of the industrial sector on stock prices by applying k-means clustering. Multiple Kernel Learning is mainly used to integrate data with various characteristics: it has several kernels, each of which receives different data and makes its own prediction. To incorporate the effects of the target firm and its relevant firms simultaneously, we used Multiple Kernel Learning. Each kernel was assigned to predict stock prices from financial news variables for the target firm and for the groups within its industrial sector obtained by k-means cluster analysis. To show that the proposed methodology is appropriate, experiments were conducted on three years of online news and stock prices. The results of this study are as follows. (1) We confirmed that information on the industrial sectors related to the target company is also meaningful for predicting the target company's stock movements, and that the machine learning algorithm has better predictive power when news about relevant companies is considered together with the target company's news. (2) It is important to predict stock movements with a number of clusters that varies according to the level of homogeneity in the industrial sector. In other words, when stock prices within an industrial sector are homogeneous, it is important to use the relational effect at the level of the industry group without clustering, or to use a small number of clusters.
When stock prices within an industry group are heterogeneous, it is important to cluster the firms into groups. This study contributes by showing that firms classified under the Global Industry Classification Standard are heterogeneous and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply by the Global Industry Classification Standard. It also contributes by demonstrating the efficiency of a prediction model that reflects heterogeneity.
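
As a rough illustration of the clustering step, the sketch below groups firms of one sector with plain k-means. It assumes, purely for illustration, that each firm is represented by a short vector of daily returns; the abstract does not specify the clustering features, and the firm names and numbers are hypothetical. The Multiple Kernel Learning stage is not shown.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start (the first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical three-day return vectors for four firms in one sector.
returns = {
    "firm_a": [0.010, 0.020, -0.010],
    "firm_b": [0.012, 0.018, -0.008],
    "firm_c": [-0.020, 0.000, 0.030],
    "firm_d": [-0.018, 0.002, 0.028],
}
names = list(returns)
labels = kmeans([returns[n] for n in names], k=2)
clusters = dict(zip(names, labels))
```

In a pipeline like the one the abstract describes, the news of each resulting group would then feed a separate kernel alongside the target firm's own news.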

Trends of South Korea's Informatization and Libraries' Role Based on Newspaper Big Data (신문 빅데이터를 바탕으로 본 국내 정보화의 경향과 도서관의 역할)

  • Na, Kyoungsik;Lee, Jisu
    • The Journal of the Korea Contents Association / v.18 no.9 / pp.14-33 / 2018
  • The purpose of this study is to analyze informatization trends in Korea using objective newspaper data on informatization and libraries from 1998 to 2017, drawn from four newspapers: KyoungHyang Newspaper, Kookmin Ilbo, Hankyoreh, and Hankookilbo. Based on the analysis of metadata and related words using BIGKinds, a news big data system, this study presents an analysis of simple frequency and classification for the keywords 'information', 'informatization', and 'library'. Based on the results, we compared the informatization trends in the media with the 'Information White Paper', a government agency publication, and with the research topics of four academic journals in the Library and Information Science field. This study interprets informatization trends as reflected in the media, and it is meaningful in that it analyzes newspaper article big data, which are long-term time series data. Based on the results, implications for the growth and development of libraries alongside domestic informatization are suggested. Further studies are expected to provide a basic framework for developing library informatization policy.

Forecasting the Future Korean Society: A Big Data Analysis on 'Future Society'-related Keywords in News Articles and Academic Papers (빅데이터를 통해 본 한국사회의 미래: 언론사 뉴스기사와 사회과학 학술논문의 '미래사회' 관련 키워드 분석)

  • Kim, Mun-Cho;Lee, Wang-Won;Lee, Hye-Soo;Suh, Byung-Jo
    • Informatization Policy / v.25 no.4 / pp.37-64 / 2018
  • This study aims to forecast the future of Korean society via big data analysis. From two databases, a collection of 46,000,000 news articles from 127 media outlets on the Naver portal operated by Naver Corporation and a collection of 70,000 social science academic papers registered in the KCI (Korea Citation Index of the National Research Foundation) between 2005 and 2017, the 40 most frequently occurring keywords were selected. Next, their temporal variations were traced and compared in terms of the number and pattern of frequencies. In addition, core issues of the future were identified through keyword network analysis. In the media news database, issues such as economy, polity, and technology turned out to be top-ranked. In the academic paper database, however, the top-ranking issues were those of feeling, working, and living. In terms of the system and life-world conceptual framework suggested by Jürgen Habermas, public interest in the future inclines toward matters of 'system' while professional interest in the future leans toward those of 'life-world.' Given this disparity in future interest, a 'mismatch paradigm' is proposed as an alternative approach to social forecasting that can substitute for the existing paradigms based on the ideas of deficiency or deprivation.
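
One common way to run a keyword network analysis of this kind is to count keyword co-occurrences per document and rank keywords by weighted degree. The sketch below assumes that approach; the abstract does not say which network measure the authors used, and the keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs):
    """For each unordered keyword pair, count the documents that contain both."""
    edges = Counter()
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights attached to each keyword (a simple centrality measure)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical keyword lists, one per article (illustration only).
docs = [
    ["economy", "technology", "jobs"],
    ["economy", "technology"],
    ["economy", "welfare"],
]
edges = cooccurrence_edges(docs)
degree = weighted_degree(edges)
core_keyword, core_weight = degree.most_common(1)[0]
```

Keywords with the highest weighted degree sit at the center of the network and are natural candidates for "core issues".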

Analysis of Real Estate Market Trend Using Text Mining and Big Data (빅데이터와 텍스트마이닝을 이용한 부동산시장 동향분석)

  • Chun, Hae-Jung
    • Journal of Digital Convergence / v.17 no.4 / pp.49-55 / 2019
  • This study examines real estate market trends using text mining and big data. The data were collected from internet news posted on Naver from August 2016 to August 2017. In the TF-IDF analysis, the highest-ranked terms were, in order, housing, sale, household, real estate market, and region. Many policy-related words such as loan, government, countermeasures, and regulations were extracted, and among region-related words Seoul appeared most frequently. Among combinations of region-related words, 'Seoul - Gangnam', 'Seoul - Metropolitan area', 'Gangnam - reconstruction', and 'Seoul - reconstruction' appeared frequently. This suggests that public interest in and expectations for the reconstruction of the Gangnam area are high.
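
For readers unfamiliar with TF-IDF, a minimal sketch follows. It assumes the textbook variant (term frequency normalized by document length, inverse document frequency as log(N/df)); the abstract does not state which weighting the author used, and the tokenized documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Hypothetical pre-tokenized news articles (illustration only).
docs = [
    ["housing", "sale", "seoul", "housing"],
    ["loan", "regulation", "seoul"],
    ["housing", "gangnam", "reconstruction", "seoul"],
]
scores = tf_idf(docs)
```

Under this variant a term that appears in every document (here `seoul`) scores zero, so libraries often apply a smoothed IDF instead; which form is appropriate depends on the analysis.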

Exploratory Study on the Media Coverage Trends of Personal Information Issues for Corporate Sustainable Management

  • Dabin Lee;Yeji Choi;Jaewook Byun;Hangbae Chang
    • Journal of Internet Computing and Services / v.25 no.4 / pp.87-96 / 2024
  • Information power has been a major criterion for wealth disparity in human history, and since the advent of the Fourth Industrial Revolution, referred to as the data economy era, personal information has also gained economic value. Additionally, companies collect and analyze customer information to use as a marketing tool, providing personalized services, making the collection of quality customer information crucial to a company's success. However, as the amount of data held by companies increases, crimes of stealing personal information for financial gain have surged, making corporate customer information a target for criminals. The leakage of personal information and its circumstances lead to a decline in corporate trust from the customer's perspective, threatening corporate sustainability with falling stock prices and decreased sales. Therefore, companies find themselves in a paradoxical situation where the utilization of personal information is increasing while the risk of personal information leakage is also growing. This study used the news big data analysis system, BIG KINDS, to analyze major keywords before and after media coverage on personal information leaks, examining domestic media coverage trends. Through this, we identified the impact of personal information leakage on corporate sustainability and analyzed the connection between personal information protection and sustainable corporate management. The results derived from this study are expected to serve as foundational data for companies seeking ways to enhance sustainable management while increasing the utilization of personal information.

A Comparative Study between Stock Price Prediction Models Using Sentiment Analysis and Machine Learning Based on SNS and News Articles (SNS와 뉴스기사의 감성분석과 기계학습을 이용한 주가예측 모형 비교 연구)

  • Kim, Dongyoung;Park, Jeawon;Choi, Jaehyun
    • Journal of Information Technology Services / v.13 no.3 / pp.221-233 / 2014
  • As public interest in the stock market has grown with economic development, many studies have sought to predict fluctuations in stock prices. Recently, many studies have applied scientific and technological forecasting methods, and the data used in such studies are becoming more diverse. In this paper, we propose stock price prediction models that use sentiment analysis and machine learning based on news articles and SNS data to improve the accuracy of stock price prediction. The proposed models are generated through a four-step process consisting of data collection, sentiment dictionary construction, sentiment analysis, and machine learning. News article data were collected from economy-related newspapers, and SNS data were collected from Twitter. A sentiment dictionary was built from the collected news articles and used for sentiment analysis. In the machine learning phase, we generated prediction models using various classification techniques and the data produced by sentiment analysis. After generating the prediction models, we conducted 10-fold cross-validation to measure their performance. The experimental results showed accuracy above 80% for a number of the methods and F1 scores close to 0.8. This represents a significant improvement over conventional research using opinion mining or data mining techniques.
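
The evaluation procedure named above (10-fold cross-validation reporting accuracy and F1) can be sketched as follows. The threshold "classifier" is a toy stand-in, not one of the paper's models, and the sentiment scores and labels are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(scores, labels):
    """Toy classifier: midpoint between the mean scores of the two classes."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validate(scores, labels, k=10):
    """Pool out-of-fold predictions, then report accuracy and F1."""
    y_true, y_pred = [], []
    for train, test in k_fold_indices(len(scores), k):
        threshold = fit_threshold(
            [scores[i] for i in train], [labels[i] for i in train]
        )
        for i in test:
            y_true.append(labels[i])
            y_pred.append(1 if scores[i] > threshold else 0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, f1_score(y_true, y_pred)


# Hypothetical daily sentiment scores; label 1 = price up, 0 = price down.
scores = [-0.9, 0.8, -0.7, 0.6, -0.5, 0.4, -0.3, 0.2, -0.1, 0.1,
          -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -0.05, 0.05]
labels = [1 if s > 0 else 0 for s in scores]
accuracy, f1 = cross_validate(scores, labels)
```

The toy data are perfectly separable, so the scores here are trivially perfect; real news and SNS sentiment is far noisier, which is why figures around 0.8 are notable.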

Analysis entrepreneurship trends using keyword analysis of news article Big Data :2013~2022 (뉴스기사 빅데이터의 키워드분석을 활용한 창업 트렌드 분석:2013~2022 )

  • Jaeeog Kim;Byunghoon Jeon
    • Journal of Platform Technology / v.11 no.3 / pp.83-97 / 2023
  • This research aims to identify startup trends by analyzing a large number of news articles through semantic network analysis. Using the BIGKinds article analysis service provided by the Korea Press Foundation, 330,628 news articles from 19 newspapers published between January 2013 and December 2022 were comprehensively analyzed. The study focused on exploring changes in key issues over the past decade, considering the impact of the social environment and global economic trends on entrepreneurship. We compared the number of news articles and changes in issues before and after the COVID-19 pandemic, and visualized entrepreneurship trends through frequency analysis, relationship analysis, and correlation analysis. The results showed that the top entrepreneurship-related keywords were startup activation and commercialization, and that the linear correlation between COVID-19 and entrepreneurship keywords was almost negligible, although the number of news articles decreased during the pandemic, indicating some impact. In particular, the most frequently mentioned keyword was the Ministry of SMEs and Startups, the most frequently mentioned place was the United States, and mentions of persons were limited. The most frequently mentioned agency was the SBA, and the entrepreneurship sector was found to be more affected by social issues than any other sector, with the important characteristic of an increased frequency of prompt access. This study supplies essential basic data for understanding and exploring issues and events related to entrepreneurship and suggests future research topics in the field.
