• Title/Summary/Keyword: 뉴스빅데이터

A Study on Automatic Classification of Newspaper Articles Based on Unsupervised Learning by Departments (비지도학습 기반의 행정부서별 신문기사 자동분류 연구)

  • Kim, Hyun-Jong;Ryu, Seung-Eui;Lee, Chul-Ho;Nam, Kwang Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.9 / pp.345-351 / 2020
  • Administrative agencies today are paying keen attention to big data analysis to improve their policy responsiveness. Among the many kinds of big data, news articles can be used to understand public opinion on policy and policy issues. News output has increased rapidly with the emergence of new online media outlets, which calls for automated bots or automatic document classification tools. There are, however, limits to automatically collecting news articles related to specific agencies or departments using only existing news article categories and keyword search queries. Thus, this paper proposes a method to process articles using classification glossaries that take into account each agency's different work features. To this end, classification glossaries were developed by extracting the work features of different departments from agency-related news articles using Word2Vec and topic modeling techniques. As a result, the automatic classification of newspaper articles for each department yielded approximately 71% accuracy. This study makes academic and practical contributions by presenting a method for extracting the work features of each department and an unsupervised learning-based method for automatically classifying the news articles relevant to each agency.
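
The glossary-based classification described in this abstract can be illustrated with a minimal sketch. It assumes each department's glossary is a weighted term list and that an article goes to the department with the highest summed term weight. The department names, terms, weights, and the `classify` function are hypothetical; the paper derives its glossaries with Word2Vec and topic modeling rather than by hand, and its actual scoring rule is not given here.

```python
from collections import Counter

# Hypothetical glossaries (term -> weight) for two made-up departments.
GLOSSARIES = {
    "transport": {"bus": 1.0, "subway": 1.0, "traffic": 0.8, "road": 0.6},
    "welfare": {"pension": 1.0, "childcare": 1.0, "elderly": 0.8, "subsidy": 0.6},
}


def tokenize(text):
    """Naive whitespace tokenizer; Korean text would need a morphological analyzer."""
    return [word.strip(".,!?\"'").lower() for word in text.split()]


def classify(article, glossaries=GLOSSARIES):
    """Return (best department or None, score per department)."""
    counts = Counter(tokenize(article))
    scores = {
        dept: sum(weight * counts[term] for term, weight in terms.items())
        for dept, terms in glossaries.items()
    }
    best = max(scores, key=scores.get)
    # An article that matches no glossary term is left unclassified.
    return (best if scores[best] > 0 else None), scores
```

Calling `classify("The city will add new bus routes to ease traffic.")` assigns the article to the hypothetical `transport` department, while an article that matches no glossary term is returned as unclassified.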

Unstructured Data Processing Using Keyword-Based Topic-Oriented Analysis (키워드 기반 주제중심 분석을 이용한 비정형데이터 처리)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.521-526 / 2017
  • Big data is diverse in format, vast in volume, and generated at high speed, so it requires new management and analysis methods rather than traditional data processing methods. Text mining techniques can be used to extract useful information from unstructured text written in human language, such as online documents on social networks. Identifying trends in the political, economic, and cultural messages left on social media helps in understanding which topics people are interested in. In this study, text mining was performed on online news related to a given keyword using a topic-oriented analysis technique. We use Latent Dirichlet Allocation (LDA) to extract information from web documents and analyze which subjects are of interest for a given keyword and which topics relate to which core values.
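
To make the LDA step concrete, here is a small sketch of LDA fitted by collapsed Gibbs sampling in plain Python. This is one standard inference method for LDA and is not necessarily the one used in the paper. The function name `lda_gibbs`, the hyperparameter defaults, and the toy documents are all assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized docs; return (doc-topic proportions, top words per topic)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                       # total tokens per topic
    z = []                                       # topic assignment of each token
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(num_topics)        # random initial assignment
            assignments.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(assignments)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic given all other tokens.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    top_words = [
        [vocab[v] for v in sorted(range(V), key=lambda v: -n_kw[t][v])[:3]]
        for t in range(num_topics)
    ]
    return theta, top_words


# Toy, made-up documents (already tokenized).
docs = [
    ["election", "candidate", "vote", "poll"],
    ["vote", "poll", "candidate"],
    ["vaccine", "virus", "quarantine"],
    ["virus", "infection", "vaccine", "quarantine"],
]
theta, top_words = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` is a document's topic mixture and sums to 1. On real news text, a tested library implementation and proper tokenization would be the more practical choice.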

Issue tracking and voting rate prediction for 19th Korean president election candidates (댓글 분석을 통한 19대 한국 대선 후보 이슈 파악 및 득표율 예측)

  • Seo, Dae-Ho;Kim, Ji-Ho;Kim, Chang-Ki
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.199-219 / 2018
  • With the everyday use of the Internet and the spread of smart devices, users can communicate in real time, and existing communication styles have changed. As the Internet changed who produces information, data grew massive, giving rise to what is called big data. Big data is seen as a new opportunity to understand social issues. In particular, text mining explores patterns in unstructured text data to find meaningful information. Because text data exists in many places, such as newspapers, books, and the web, it is diverse and large in volume, which makes it suitable for understanding social reality. In recent years, there have been increasing attempts to analyze web text from sources such as SNS and blogs, where the public can communicate freely. This is recognized as a useful way to grasp public opinion immediately, so it can be used for research on political, social, and cultural issues. Text mining has received much attention as a way to investigate the public's opinion of candidates and to predict the voting rate (vote share) instead of relying on polls, because many people question the credibility of surveys and people tend to refuse to respond or to conceal their real intentions when polled. This study collected comments from the largest Internet portal site in Korea and conducted research on the 19th Korean presidential election in 2017. We collected 226,447 comments from April 29, 2017 to May 7, 2017, a span that includes the period just before election day when public opinion polls are prohibited. We analyzed frequencies, associative emotional words, topic emotions, and candidate voting rates. Through frequency analysis, we identified the words representing the most important issues each day. In particular, following the presidential debate, the candidate who became an issue ranked at the top of the frequency analysis.
By analyzing associative emotional words, we identified the issues most relevant to each candidate. Topic emotion analysis was used to identify each candidate's topics and the emotions the public expressed about them. Finally, we estimated the voting rate by combining the volume of comments with the sentiment score. In this way, we explored the issues for each candidate and predicted the voting rate. The analysis showed that news comments are an effective tool for tracking issues around presidential candidates and for predicting the voting rate. In particular, this study presented daily issues and a quantitative index of sentiment. It also predicted the voting rate for each candidate and precisely matched the ranking of the top five candidates. Candidates can use such results to grasp public opinion objectively and reflect it in their election strategy: they can make more active use of positive issues and try to correct negative ones. Candidates should also be aware that a moral problem can severely damage their reputation. Voters can look objectively at the issues and public opinion surrounding each candidate and make more informed decisions when voting. If they refer to results like these before voting, they can see public opinion as reflected in big data and vote from a more objective perspective. If candidates campaign with reference to big data analysis, the public will become more active on the web, recognizing that their wants are being reflected. Political views can be expressed in many places on the web, which can contribute to political participation by the people.
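
The abstract does not give the formula used to combine comment volume and sentiment score. As one hedged possibility, the sketch below treats each candidate's raw support as comment count times the share of positive comments, then normalizes across candidates. The function name, the formula, and the candidate figures are all assumptions for illustration, not the authors' method or data.

```python
def estimate_vote_share(stats):
    """stats maps candidate -> (comment_count, positive_ratio in [0, 1]).

    Assumed formula: raw support = comment_count * positive_ratio,
    normalized so that the estimated shares sum to 1.
    """
    raw = {name: count * ratio for name, (count, ratio) in stats.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no positive comment volume to normalize")
    return {name: value / total for name, value in raw.items()}


# Made-up figures for three hypothetical candidates.
example = {"A": (1000, 0.6), "B": (800, 0.5), "C": (200, 0.5)}
shares = estimate_vote_share(example)
ranking = sorted(shares, key=shares.get, reverse=True)
```

With these made-up figures the ranking is A, B, C. The study evaluates its own estimate in a similar way, by comparing the predicted ranking of candidates with the actual result.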

Topic Modeling on the Adolescent Problem Using Text Mining (텍스트 마이닝을 이용한 청소년 문제 토픽 모델링)

  • Cho, Ju-Yeon;Cho, Kyoung Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.12 / pp.1589-1595 / 2018
  • The purpose of this research is to identify trends in adolescent problems as covered on internet news sites. We analyzed 8,110 articles on adolescent problems published from 1993 to 2018 on the three top-ranked domestic internet news sites: 'The Chosunilbo', 'The Dong-A Ilbo', and 'Korea Joongang Daily'. As a result, we were able to understand the topics of adolescent problems on internet news sites over the last 26 years and found that article trends have changed with the environment, policies, and culture related to adolescent problems. This study is meaningful in that it starts from a method for examining social trends in existing adolescent problems, expands the scope of adolescent problems and counseling, uses quantitative analysis methods, and provides new information that takes diversity into account.

Sentiment Analysis of Elderly and Job in the Demographic Cliff (인구절벽사회에서 노인과 일자리 감성분석)

  • Kim, Yang-Woo
    • The Journal of the Korea Contents Association / v.20 no.11 / pp.110-118 / 2020
  • Social media data serves as a proxy indicator for understanding the problems and future direction of public opinion in Korean society. This research used 109,015 news items from 2016 to 2018 to analyze sentiment toward the elderly and employment in Korean society, and explored the possibility of expanding the labor force in a society facing an aging population and a demographic cliff. Topic keywords for employment of the elderly include "elderly*employment", "elderly*employment", and "elderly*wage". The analysis shows that positive sentiment prevailed for most of the period and that it is possible to expand the working-age population. Positive feelings about expanding employment opportunities for the elderly and negative feelings about low wages brought to light the reality of elderly people who remain poor despite working. In this study, social big data was used to analyze the perceptions and sentiment of Korean society regarding the elderly and employment through hierarchical cluster analysis and related text mining analysis.

A Study on deduction of important factors for new infectious diseases through big data analysis (빅데이터 분석을 통한 신종감염병 중요 요인 도출)

  • Suh, Kyung-Do
    • Journal of Industrial Convergence / v.19 no.3 / pp.35-40 / 2021
  • This study attempted to derive important factors of emerging infectious diseases by collecting and analyzing text data on emerging infectious diseases. For this purpose, articles in the Naver News database were crawled directly, pre-processed, and used for data analysis. Additional analysis was performed using Big Kinds. In the priority analysis, importance followed the order corona, infectious disease, quarantine, vaccine, outbreak, virus, infection, and development. In the closeness (proximity) centrality analysis, importance followed the order government, death, and plan, and the Big Kinds analysis showed that Covid-19 and the Korea Centers for Disease Control and Prevention were important. Based on these results, the government's policy support is needed to raise public awareness of new infectious diseases, prevent disease, and develop vaccines and treatments.
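
The proximity centrality reported above is usually called closeness centrality: for each keyword, the number of reachable keywords divided by the sum of shortest-path distances to them. The sketch below computes it with breadth-first search on an undirected keyword graph. The node labels are borrowed from the abstract, but the edges are invented for illustration and are not the paper's network.

```python
from collections import deque


def closeness_centrality(graph):
    """graph: node -> set of neighbours (undirected, every node present as a key)."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest-path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[source] = reachable / total if total > 0 else 0.0
    return result


# Hypothetical co-occurrence graph: "government" is linked to every other keyword.
graph = {
    "government": {"death", "plan", "vaccine"},
    "death": {"government"},
    "plan": {"government"},
    "vaccine": {"government"},
}
centrality = closeness_centrality(graph)
```

In this toy star-shaped graph the hub `government` scores 1.0 and each outer keyword scores 0.6, which is the sense in which a keyword close to all others is ranked as important.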

A study on the method of deriving the cause of social issues based on causal sentences (인과관계문형 기반 사회이슈 발생원인 도출 방법 연구)

  • Lee, Namyeon;Lee, Jae Hyung
    • Journal of Digital Convergence / v.19 no.3 / pp.167-176 / 2021
  • With the development of big data analysis technology, many studies have sought to find social issues using text mining techniques. To derive social issues, previous studies collected large amounts of text data from news or SNS and then analyzed issues with text mining techniques such as topic modeling and term network analysis. Social issues are the results of various social phenomena and factors. However, because previous studies focused on deriving the issues themselves, which are the results of various causes, they are limited in revealing the causes of those issues. To respond effectively to social issues, it is necessary not only to derive the issues but also to identify their causes. To overcome these limitations, this study proposes a method of deriving the factors that cause social issues from related texts, based on part of Korean linguistic theory. To do this, we collected news data related to social issues for the three years from 2017 to 2019 and proposed a methodology that uses text mining techniques to find causes based on causal sentences.

The College Reputation System using Public Data and Sentiment Analysis (공공데이터와 감성분석을 이용한 대학평판시스템)

  • Kim, Eun-Ah;Lee, Yon-Sik
    • Convergence Security Journal / v.18 no.1 / pp.103-110 / 2018
  • Modern society increasingly demands, in many areas, big data processing technology to collect, aggregate, and analyze large amounts of data from the Internet and SNS. A typical application is evaluating the reputation of a company or college. To measure and quantify a reputation, fair and precise data and efficient data processing are very important. For this purpose, a quantitative quotient was obtained from public data, a qualitative quotient was obtained through sentiment analysis of news articles, and a complex college reputation quotient was calculated from them. This paper proposes the Complex College Reputation System (CCRS), which produces the Complex College Reputation Quotient by combining an objective quantitative quotient with a qualitative quotient reflecting sentimental reputation in order to measure college reputation.

An Analysis of High School Korean Language Instruction Regarding Universal Design for Learning: Social Big Data Analysis and Survey Analysis (보편적 학습설계 측면에서의 고등학교 국어과 교수 실태: 소셜 빅데이터 및 설문조사 분석)

  • Shin, Mikyung;Lee, Okin
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.1 / pp.326-337 / 2020
  • This study examined public interest in high school Korean language instruction and the universal design for learning (UDL) using the social big data analysis method. Observations from 10,339 search results led to the conclusion that public interest in UDL was significantly lower than public interest in high school Korean language instruction. The results of the big data association analysis showed that 17.22% of the terms were related to "curriculum." In addition, a survey of 330 high school students examined how their teachers apply UDL in the classroom. High school students perceived computers as the most frequently used technology tool in daily classes (38.79%). Teacher-led lectures (52.12%) were the most frequently observed method of instruction. Compared to the second-year and third-year students, the first-year students appreciated the usage of technology tools and various instruction mediums more frequently (ps<.05). Students were relatively more positive in their response to the query on the provision of multiple means of representation. Consequently, the lesson contents became easier for students to understand when various study methods and materials were available. The first-year students were generally more positive towards teachers' incorporation of UDL.

Social Big Data-based Co-occurrence Analysis of the Main Person's Characteristics and the Issues in the 2016 Rio Olympics Men's Soccer Games (소셜 빅데이터 기반 2016리우올림픽 축구 관련 이슈 및 인물에 대한 연관단어 분석)

  • Park, SungGeon;Lee, Soowon;Hwang, YoungChan
    • 한국체육학회지인문사회과학편 / v.56 no.2 / pp.303-320 / 2017
  • This paper seeks to better understand the focal issues and persons related to the Rio Olympic soccer games through social data science and analytics. The data were collected from online news articles and comments specific to KOR during the Olympic football games. To investigate public interest in each game and in target persons, this study performed a co-occurrence word analysis and then used the NodeXL software to visualize the results. Through this process, the study found several major issues during the Rio Olympic men's football games, including the match between KOR and FIJ, KOR player Heungmin Son, commentator Young-Pyo Lee, and sportscaster Woo-Jong Jo. The study also showed that general public opinion expressed positive words towards the South Korean national football team during the Rio Olympics, though negative words existed as well. Furthermore, the study revealed a positive attitude towards the commentators and casters. In conclusion, public interest in big sporting events can be increased by providing content that includes varied professional sports analysis, a capable and thoroughly prepared domain expert, and a commentator and/or caster who is well-spoken and has artistic sense and explanatory power. Multidisciplinary research combining sports science, social science, information technology, and media can contribute to a wide range of theoretical studies and practical developments within the sports industry.
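
A co-occurrence word analysis of the kind mentioned above can be sketched as counting how often two words appear in the same comment or article. The sketch assumes the documents are already tokenized. The function name and the toy tokens are hypothetical, and the study's own preprocessing and NodeXL visualization are not reproduced.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count unordered word pairs that appear together in the same document."""
    pairs = Counter()
    for tokens in documents:
        # Use each distinct word once per document; sort so (a, b) == (b, a).
        for first, second in combinations(sorted(set(tokens)), 2):
            pairs[(first, second)] += 1
    return pairs


# Made-up tokenized comments.
comments = [
    ["son", "goal", "korea"],
    ["son", "korea"],
    ["lee", "commentary"],
]
pairs = cooccurrence_counts(comments)
strongest_pair, strongest_count = pairs.most_common(1)[0]
```

The resulting pair counts can serve as edge weights in a word network, which is the kind of input a graph visualization tool would take.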