• Title/Summary/Keyword: Dirichlet Process

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • Significant social issues such as unemployment, economic crisis, and social welfare urgently need to be addressed in modern society. To discover them, researchers usually collect opinions from experts and scholars through online or offline surveys. However, this approach is not always effective. Because of cost, a large number of survey responses is seldom gathered, and in some cases it is hard to find experts who deal with a specific social issue. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach entirely different conclusions about the same social issue because each has a subjective point of view and a different background. It is therefore hard to determine what the current social issues are and which of them are really important. To overcome these shortcomings, in this paper we develop a prototype system that semi-automatically detects social issue keywords, which represent social issues and problems, from about 1.3 million news articles published by about 10 major domestic news outlets in Korea from June 2009 to July 2012. Our proposed system consists of (1) collecting news articles and extracting their text, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm that relies on generative models. The goal of the matching algorithm is to find the paragraphs that best match each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which consists of relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". From this label alone, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To address this, we develop a matching algorithm that computes the probability of a paragraph given a topic, relying on (i) the topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each document into paragraphs. Meanwhile, using LDA, we extract a set of topics from the text documents. Through our matching process, each paragraph is assigned to the topic it best matches, so that each topic ends up with several best-matched paragraphs. Suppose, for example, that there is a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., "Up to 300 workers lost their jobs in XXX company at Seoul"). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly.
Through this prototype system, we detected various social issues appearing in our society, and our experimental results demonstrate the effectiveness of the proposed methods. Our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
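
The paragraph-to-topic matching described in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes a simple unigram scoring in which a paragraph's score under a topic is the average log probability of its words. Topic1 is taken from the abstract. The second topic, the `FLOOR` smoothing value, and the function names are hypothetical.

```python
import math
import re

# Topic terms with their probabilities. "Unemployment Problem" reuses Topic1
# from the abstract; "Social Welfare" is a hypothetical second topic.
TOPICS = {
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    "Social Welfare": {"welfare": 0.5, "pension": 0.3, "elderly": 0.2},
}

# Assumed small probability for words that a topic does not list.
FLOOR = 1e-6


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def log_score(paragraph, topic_terms):
    """Average log probability of the paragraph's words under one topic."""
    tokens = tokenize(paragraph)
    if not tokens:
        return float("-inf")
    total = sum(math.log(topic_terms.get(tok, FLOOR)) for tok in tokens)
    return total / len(tokens)


def best_topic(paragraph, topics=TOPICS):
    """Return the label of the topic with the highest score for the paragraph."""
    return max(topics, key=lambda name: log_score(paragraph, topics[name]))


if __name__ == "__main__":
    print(best_topic("The layoff wave raised unemployment across the business sector"))
```

Averaging over the token count keeps long and short paragraphs comparable. A real system would also need Korean morphological analysis before scoring, as step (3) of the abstract indicates.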

Research Trend Analysis of Publications in the Journal of Home Economics Education Association Using Network Text Analysis (네트워크 텍스트 분석을 이용한 한국가정과교육학회지 논문의 연구 동향 분석)

  • Lee, Yoon-Jung;Kim, Eun Jeung;Kim, Ji sun
    • Journal of Korean Home Economics Education Association / v.31 no.4 / pp.1-18 / 2019
  • The purpose of this study was to analyze research trends in home economics education using network text analysis. The 586 research articles published in the Journal of Home Economics Education Association between July 2003 and December 2018 were examined using Neckinger 4, a social network analysis software package. Frequency and centrality measures (degree centrality, closeness centrality, and betweenness centrality) were calculated for the words that appeared over the whole period, and centrality analysis and LDA (Latent Dirichlet Allocation) were conducted for four sub-periods. The results are as follows. First, the most frequently appearing words were parents, culture, unit, health, career, consumption, practicality, etc. Words such as parents and management scored high in degree centrality; parents and male students in closeness centrality; and male students and units in betweenness centrality. Second, when the data were divided into four periods, words such as education, family, purpose, class, middle school, and school appeared most frequently across the periods, but some words such as 'purpose' (in periods 3 and 4) or 'process' (in period 4) were salient only in certain periods. Third, the words with high centrality were consistent regardless of the type of centrality within each period. Fourth, the topic analysis using LDA showed that curriculum, textbook, family healthiness, teaching-learning, evaluation, dietary life, appearance management, and consumption were topics that appeared consistently across all periods. The topics have become more diverse and more in-depth. New topics such as teacher training and safety appeared in later periods, possibly due to changes in the curriculum and national policy, and housing, as a less represented topic, is suggested as an area that needs further research attention.
This study is significant in that it allows researchers to identify the major research interests and trends in home economics education.
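
As a rough illustration of two of the centrality measures named in this abstract, the sketch below computes degree and closeness centrality on a small word co-occurrence network using only the Python standard library. The keyword lists are hypothetical and are not data from the study. The graph construction (linking words that occur in the same article) is an assumption about how such a network might be built, not a description of the software the authors used.

```python
from collections import deque
from itertools import combinations

# Hypothetical keyword lists, one per article (not data from the study).
DOCS = [
    ["parents", "health", "career"],
    ["parents", "culture"],
    ["parents", "consumption"],
    ["health", "career"],
]


def build_graph(docs):
    """Link every pair of words that occur together in the same article."""
    graph = {}
    for words in docs:
        for word in words:
            graph.setdefault(word, set())
        for a, b in combinations(sorted(set(words)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    if n < 2:
        return {word: 0.0 for word in graph}
    return {word: len(nbrs) / (n - 1) for word, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source word
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


if __name__ == "__main__":
    g = build_graph(DOCS)
    print(degree_centrality(g))
    print(closeness_centrality(g))
```

In this toy network "parents" co-occurs with every other word, so it scores highest on both measures. Betweenness centrality is omitted here because it needs a longer shortest-path counting routine.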