• Title/Summary/Keyword: news topic

Search Result 232, Processing Time 0.034 seconds

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.93-107
    • /
    • 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites by crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-news network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a quasi-network of user-issue. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with that of network analysis. The experimental dataset was from a web site ranking site, and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using Netminer. Our issue-clustering results applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.

Policy agenda proposals from text mining analysis of patents and news articles (특허 및 뉴스 기사 텍스트 마이닝을 활용한 정책의제 제안)

  • Lee, Sae-Mi;Hong, Soon-Goo
    • Journal of Digital Convergence
    • /
    • v.18 no.3
    • /
    • pp.1-12
    • /
    • 2020
  • The purpose of this study is to explore the trend of blockchain technology through analysis of patents and news articles using text mining, and to suggest the blockchain policy agenda by grasping social interests. For this purpose, 327 blockchain-related patent abstracts in Korea and 5,941 full-text online news articles were collected and preprocessed. 12 patent topics and 19 news topics were extracted with latent dirichlet allocation topic modeling. Analysis of patents showed that topics related to authentication and transaction accounted were largely predominant. Analysis of news articles showed that social interests are mainly concerned with cryptocurrency. Policy agendas were then derived for blockchain development. This study demonstrates the efficient and objective use of an automated technique for the analysis of large text documents. Additionally, specific policy agendas are proposed in this study which can inform future policy-making processes.

Issue analysis of the admission officer system using topic analysis (토픽 분석을 이용한 학생부종합전형의 쟁점 분석)

  • Hong, Younghee
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.3
    • /
    • pp.423-434
    • /
    • 2019
  • An important issues in Korea society in 2018 was the revision of the university entrance examination system. Among the discussions, in order to grasp what the issue of admission officer system is, attention was focused on the function of media such as monitoring and criticism as well as the tried topic analysis of related news articles. As a result, the reorganization of the College Scholastic Ability Test (CSAT) was derived and showed the sensitivity of Korean society towards the CSAT. Topics directly related to the admission officer system were the selection factor and fairness of the university entrance examination system in relation to the selection factor.

Text Mining Driven Content Analysis of Ebola on News Media and Scientific Publications (텍스트 마이닝을 이용한 매체별 에볼라 주제 분석 - 바이오 분야 연구논문과 뉴스 텍스트 데이터를 이용하여 -)

  • An, Juyoung;Ahn, Kyubin;Song, Min
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.50 no.2
    • /
    • pp.289-307
    • /
    • 2016
  • Infectious diseases such as Ebola virus disease become a social issue and draw public attention to be a major topic on news or research. As a result, there have been a lot of studies on infectious diseases using text-mining techniques. However, there is no research on content analysis of two media channels that have distinct characteristics. Accordingly, in this study, we conduct topic analysis between news (representing a social perspective) and academic research paper (representing perspectives of bio-professionals). As text-mining techniques, topic modeling is applied to extract various topics according to the materials, and the word co-occurrence map based on selected bio entities is used to compare the perspectives of the materials specifically. For network analysis, topic map is built by using Gephi. Aforementioned approaches uncovered the difference of topics between two materials and the characteristics of the two materials. In terms of the word co-occurrence map, however, most of entities are shared in both materials. These results indicate that there are differences and commonalties between social and academic materials.

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.183-203
    • /
    • 2018
  • News articles are the most suitable medium for examining the events occurring at home and abroad. Especially, as the development of information and communication technology has brought various kinds of online news media, the news about the events occurring in society has increased greatly. So automatically summarizing key events from massive amounts of news data will help users to look at many of the events at a glance. In addition, if we build and provide an event network based on the relevance of events, it will be able to greatly help the reader in understanding the current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, and integrated the synonyms by leaving only meaningful words through preprocessing using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the subject distribution by date and to find the peak of the subject distribution and to detect the event. A total of 32 topics were extracted from the topic modeling, and the point of occurrence of the event was deduced by looking at the point at which each subject distribution surged. As a result, a total of 85 events were detected, but the final 16 events were filtered and presented using the Gaussian smoothing technique. We also calculated the relevance score between events detected to construct the event network. Using the cosine coefficient between the co-occurred events, we calculated the relevance between the events and connected the events to construct the event network. Finally, we set up the event network by setting each event to each vertex and the relevance score between events to the vertices connecting the vertices. The event network constructed in our methods helped us to sort out major events in the political and social fields in Korea that occurred in the last one year in chronological order and at the same time identify which events are related to certain events. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to easily analyze large amounts of data and to identify the relevance of events that were difficult to detect in existing event detection. We applied various text mining techniques and Word2vec technique in the text preprocessing to improve the accuracy of the extraction of proper nouns and synthetic nouns, which have been difficult in analyzing existing Korean texts, can be found. In this study, the detection and network configuration techniques of the event have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily analyze subject and topic words and distribution from huge amount of data. Also, by using the date information of the collected news articles, it is possible to express the distribution by topic in a time series. Second, we can find out the connection of events in the form of present and summarized form by calculating relevance score and constructing event network by using simultaneous occurrence of topics that are difficult to grasp in existing event detection. It can be seen from the fact that the inter-event relevance-based event network proposed in this study was actually constructed in order of occurrence time. It is also possible to identify what happened as a starting point for a series of events through the event network. The limitation of this study is that the characteristics of LDA topic modeling have different results according to the initial parameters and the number of subjects, and the subject and event name of the analysis result should be given by the subjective judgment of the researcher. Also, since each topic is assumed to be exclusive and independent, it does not take into account the relevance between themes. Subsequent studies need to calculate the relevance between events that are not covered in this study or those that belong to the same subject.

The Study on the Meaning Change of 'Startup' and 'Entrepreneurship' using the Bigdata-based Corpus Network Analysis (빅데이터 기반 어휘연결망분석을 활용한 '창업'과 '기업가정신'의 의미변화연구)

  • Kim, Yeonjong;Park, Sanghyeok
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.16 no.4
    • /
    • pp.75-93
    • /
    • 2020
  • The purpose of this study is to extract keywords for 'startup' and 'entrepreneurship' from Naver news articles in Korea since 1990 and Google news articles in foreign countries, and to understand the changes in the meaning of entrepreneurship and entrepreneurship in each era It is aimed at doing. In summary, first, in terms of the frequency of keywords, venture sprouting is a sample of the entrepreneurial spirit of the government-led and entrepreneurs' chairman, and various technology investments and investments in corporate establishment have been made. It can be seen that training for the development of items and items was carried out, and in the case of the venture re-emergence period, it can be seen that the youth-oriented entrepreneurship and innovation through the development of various educational programs were emphasized. Second, in the result of vocabulary network analysis, the network connection and centrality of keywords in the leap period tended to be stronger than in the germination period, but the re-leap period tended to return to the level of germination. Third, in topic analysis, it can be seen that Naver keyword topics are mostly business-related content related to support, policy, and education, whereas topics through Google News consist of major keywords that are more specifically applicable to practical work.

Relation Between News Topics and Variations in Pharmaceutical Indices During COVID-19 Using a Generalized Dirichlet-Multinomial Regression (g-DMR) Model

  • Kim, Jang Hyun;Park, Min Hyung;Kim, Yerin;Nan, Dongyan;Travieso, Fernando
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1630-1648
    • /
    • 2021
  • Owing to the unprecedented COVID-19 pandemic, the pharmaceutical industry has attracted considerable attention, spurred by the widespread expectation of vaccine development. In this study, we collect relevant topics from news articles related to COVID-19 and explore their links with two South Korean pharmaceutical indices, the Drug and Medicine index of the Korea Composite Stock Price Index (KOSPI) and the Korean Securities Dealers Automated Quotations (KOSDAQ) Pharmaceutical index. We use generalized Dirichlet-multinomial regression (g-DMR) to reveal the dynamic topic distributions over metadata of index values. The results of our analysis, obtained using g-DMR, reveal that a greater focus on specific news topics has a significant relationship with fluctuations in the indices. We also provide practical and theoretical implications based on this analysis.

An Analysis of News Report Characteristics on Archives & Records Management for the Press in Korea: Based on 1999~2018 News Big Data (뉴스 빅데이터를 이용한 우리나라 언론의 기록관리 분야 보도 특성 분석: 1999~2018 뉴스를 중심으로)

  • Han, Seunghee
    • Journal of the Korean Society for information Management
    • /
    • v.35 no.3
    • /
    • pp.41-75
    • /
    • 2018
  • The purpose of this study is to analyze the characteristics of Korean media on the topic of archives & records management based on time-series analysis. In this study, from January, 1999 to June, 2018, 4,680 news articles on archives & records management topics were extracted from BigKinds. In order to examine the characteristics of the media coverage on the archives & records management topic, this study was analyzed to the difference of the press coverage by period, subject, and type of the media. In addition, this study was conducted word-frequency based content analysis and semantic network analysis to investigate the content characteristics of media on the subject. Based on these results, this study was analyzed to the differences of media coverage by period, subject, and type of media. As a result, the news in the field of records management showed that there was a difference in the amount of news coverage and news contents by period, subject, and type of media. The amount of news coverage began to increase after the Presidential Records Management Act was enacted in 2007, and the largest amount of news was reported in 2013. Daily newspapers and financial newspapers reported the largest amount of news. As a result of analyzing news reports, during the first 10 years after 1999, news topics were formed around the issues arising from the application and diffusion process of the concept of archives & records management. However, since the enactment of the Presidential Records Management Act, archives & records management has become a major factor in political and social issues, and a large amount of political and social news has been reported.

A Topic Analysis of SW Education Textdata Using R (R을 활용한 SW교육 텍스트데이터 토픽분석)

  • Park, Sunju
    • Journal of The Korean Association of Information Education
    • /
    • v.19 no.4
    • /
    • pp.517-524
    • /
    • 2015
  • In this paper, to find out the direction of interest related to the SW education, SW education news data were gathered and its contents were analyzed. The topic analysis of SW education news was performed by collecting the data of July 23, 2013 to October 19, 2015. By analyzing the relationship among the most mentioned top 20 words with the web crawling using R, the result indicated that the 20 words are the closely relevant data as the thickness of the node size of the 20 words was balancing each other in the co-occurrence matrix graph focusing on the 'SW education' word. Moreover, our analysis revealed that the data were mainly composed of the topics about SW talent, SW support Program, SW educational mandate, SW camp, SW industry and the job creation. This could be used for big data analysis to find out the thoughts and interests of such people in the SW education.

An analysis of the change in media's reports and attitudes about face masks during the COVID-19 pandemic in South Korea: a study using Big Data latent dirichlet allocation (LDA) topic modelling (빅데이터 LDA 토픽 모델링을 활용한 국내 코로나19 대유행 기간 마스크 관련 언론 보도 및 태도 변화 분석)

  • Suh, Ye-Ryoung;Koh, Keumseok Peter;Lee, Jaewoo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.5
    • /
    • pp.731-740
    • /
    • 2021
  • This study applied LDA topic modeling analysis to collect and analyze news media big data related to face masks in the three waves of the COVID-19 pandemic in Korea. The results empirically show that media reports focused on mask production and distribution policies in the first wave and the mandatory mask wearing in the second wave. In contrast, more reports on trivial, gossipy events consist of the media coverage in the second and third waves. The findings imply that Korea's governmental interventions to address the shortage of face masks and to regulate mask wearing were successful relatively in a short time. In contrast, the study also reports that there may be relative less number of science-based news reports like the ones on the effectiveness of face masks or different levels of filter types. This study exemplifies how a big data analysis can be applied to evaluate and enhance public health communication.