Search | Korea Science

A Novel Technique of Topic Detection for On-line Text Documents: A Topic Tree-based Approach (온라인 텍스트문서의 계층적 트리 기반 주제탐색 기법)

Xuan, Man;Kim, Han-Joon
- Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.396-399 / 2012
Topic detection is the problem of discovering the topics of documents published online. For topic detection, it is important to extract the correct topic words and to present them in a way that is easy to understand. We consider a topic tree-based approach that presents the results of topic detection for online text documents more effectively and more concisely. To achieve topic tree-based topic detection, we propose a new term weighting method, called CTF-CDF-IDF, which is simple yet effective. Moreover, we have modified a conventional clustering method into what we call the incremental k-medoids algorithm. Our experimental results with the Reuters-21578 and Google News collections show that the proposed method is useful for topic detection.
https://doi.org/10.3745/PKIPS.y2012m11a.396
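The abstract above names a term weighting method (CTF-CDF-IDF) without defining it. As background only, the following is a minimal sketch of standard TF-IDF, the family such methods typically build on. It is not the authors' CTF-CDF-IDF; the function name `tf_idf` and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Standard TF-IDF (NOT the paper's CTF-CDF-IDF, which the abstract does not define).

    docs: list of non-empty token lists. Returns one {word: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears at least once.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        })
    return weights


# Hypothetical toy corpus, not data from the paper.
docs = [
    ["oil", "price", "oil"],
    ["price", "market"],
    ["election", "vote"],
]
weights = tf_idf(docs)
```

In this toy corpus, "oil" is both frequent in the first document and absent elsewhere, so it outweighs "price", which also occurs in the second document.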

Topic Modeling of News Article about International Construction Market Using Latent Dirichlet Allocation (Latent Dirichlet Allocation 기법을 활용한 해외건설시장 뉴스기사의 토픽 모델링(Topic Modeling))

Moon, Seonghyeon;Chung, Sehwan;Chi, Seokho
- KSCE Journal of Civil and Environmental Engineering Research / v.38 no.4 / pp.595-599 / 2018
A sufficient understanding of the overseas construction market is crucial to profitability in international construction projects. Many researchers have regarded news articles as a good data source for assessing market conditions, since they include market information such as political, economic, and social issues. Because text data is unstructured and voluminous, various text-mining techniques have been studied to reduce the manpower, time, and cost needed to summarize it. However, extracting the needed information from news articles is limited by the variety of topics present in the data. This research aims to overcome these problems and contribute to summarizing market status by performing topic modeling with Latent Dirichlet Allocation. Assuming that 10 topics exist in the corpus, the topics included projects for user convenience (topic-2), private support to solve poverty problems in Africa (topic-4), and so on. Grouping the news articles by topic could improve the extraction of useful information and the summarization of market status.
https://doi.org/10.12652/Ksce.2018.38.4.0595 KSCI

An Exploratory Study of Health Inequality Discourse Using Korean Newspaper Articles: A Topic Modeling Approach

Kim, Jin-Hwan
- Journal of Preventive Medicine and Public Health / v.52 no.6 / pp.384-392 / 2019
Objectives: This study aimed to explore the health inequality discourse in the Korean press by analyzing newspaper articles using a relatively new content analysis technique. Methods: This study used the search term "health inequality" to collect articles containing that term that were published between 2000 and 2018. The collected articles went through pre-processing and topic modeling, and the contents and temporal trends of the extracted topics were analyzed. Results: A total of 1038 articles were identified, and 5 topics were extracted. As the number of studies on health inequality has increased over the past 2 decades, so too has the number of news articles regarding health inequality. The extracted topics were public health policies, social inequalities in health, inequality as a social problem, healthcare policies, and regional health gaps. The total number of occurrences of each topic increased every year, and the trend observed for each theme was influenced by events related to its contents, such as elections. Finally, the frequency of appearance of each topic differed depending on the type of news source. Conclusions: The results of this study can be used as preliminary data for future attempts to address health inequality in Korea. To make addressing health inequality part of the public agenda, the media's perspective and discourse regarding health inequality should be monitored to facilitate further strategic action.
https://doi.org/10.3961/jpmph.19.221 KSCI

Topic and Source Diversity of the Front Page in the New York Times, Chicago Tribune and the Los Angeles Times from 1950 to 2000 (20세기 하반기의 미 신문 1면 보도에 대한 다양성 분석: 뉴스 토픽과 정보원의 분포를 중심으로)

Shim, Hoon
- Korean journal of communication and information / v.30 / pp.175-201 / 2005
This study investigates the diversity of news topics and sources in the New York Times, the Chicago Tribune, and the Los Angeles Times in the second half of the twentieth century. In probing the conventional traits of the contemporary press, the researcher traced the changing patterns and trends of news values in terms of news-gathering routines in order to evaluate journalistic role conceptions in light of social responsibility theory. Findings indicated that the role of the American press as a neutral transmitter has been consistently violated by source and topic bias, without any significant change during the last five decades. The data, however, revealed an evident shift in the contemporary press from heavy reliance on official sources toward business/economic sources. In addition, news topics such as business, health, and education have replaced conventionally popular topics such as crime and accidents. By contrast, unconventional topics such as poverty, labor, and minorities still fail to receive much attention from the target papers.

Topic Modeling and Keyword Network Analysis of News Articles Related to Nurses before and after "the Thanks to You Challenge" during the COVID-19 Pandemic (COVID-19 '덕분에 챌린지' 전후 간호사 관련 뉴스 기사의 토픽 모델링 및 키워드 네트워크 분석)

Yun, Eun Kyoung;Kim, Jung Ok;Byun, Hye Min;Lee, Guk Geun
- Journal of Korean Academy of Nursing / v.51 no.4 / pp.442-453 / 2021
Purpose: This study was conducted to assess public awareness and policy challenges faced by practicing nurses. Methods: After collecting nurse-related news articles published before and after 'the Thanks to You Challenge' campaign (between December 31, 2019, and July 15, 2020), keywords were extracted via preprocessing. A three-step method (keyword analysis, latent Dirichlet allocation topic modeling, and keyword network analysis) was used to examine the text and the structure of the selected news articles. Results: The top 30 keywords with similar occurrences were collected before and after the campaign. The five dominant topics before the campaign were: pandemic, infection of medical staff, local transmission, medical resources, and return of overseas Koreans. After the campaign, the topics 'infection of medical staff' and 'return of overseas Koreans' disappeared, but 'the Thanks to You Challenge' emerged as a dominant topic. A keyword network analysis revealed that the word 'nurse' was linked with keywords like 'thanks' and 'campaign' through the word 'sacrifice'. These words formed interrelated domains of 'the Thanks to You Challenge' topic. Conclusion: The findings of this study can provide useful information for understanding various issues and social perspectives on COVID-19 nursing. The major themes of news reports lagged behind the real problems faced by nurses in the COVID-19 crisis. While the press tends to focus on heroism and society as a whole, issues and policies mutually beneficial to the public and nursing need to be further explored and enhanced by nurses.
https://doi.org/10.4040/jkan.20287 KSCI

A Study on Children's Images during the Liberation Period Using Topic Modeling: With a focus on The Children's News (토픽 모델링을 이용한 해방기 아동상 연구 - 「어린이신문」을 중심으로 -)

Jang, Seok-Eun;Lee, Hye-Eun
- Journal of the Korean BIBLIA Society for library and Information Science / v.33 no.3 / pp.157-178 / 2022
This study explores children's images in The Children's News, a children's newspaper of the Liberation period. For this purpose, frequency analysis, topic modeling, and time series analysis were performed on the issues from the first issue of December 1, 1945 to the 86th issue of December 13, 1947, except for No. 34, which has not survived. The frequency analysis showed that keywords related to country, school, and family appeared frequently. Through topic modeling, children's images were observed in the topics, including children with patriotism, children with scientific literacy, children with artistic refinement, and children as social beings. The time series analysis shows that the percentage of patriotism-related topics was high in the early days of the Liberation period when The Children's News was published, but as the share of topics such as science and art gradually increased, the image of children diversified.
https://doi.org/10.14699/kbiblia.2022.33.3.157 KSCI

Trend Analysis of Pet Plants Before and After COVID-19 Outbreak Using Topic Modeling: Focusing on Big Data of News Articles from 2018 to 2021

Park, Yumin;Shin, Yong-Wook
- Journal of People, Plants, and Environment / v.24 no.6 / pp.563-572 / 2021
Background and objective: The ongoing COVID-19 pandemic restricted daily life, forcing people to spend time indoors. With the growing interest in mental health issues and residential environments, 'pet plants' have been receiving attention during the unprecedented social distancing measures. This study aims to analyze the change in trends of pet plants before and during the COVID-19 pandemic and to provide basic data for studies related to pet plants and directions for future development. Methods: A total of 2,016 news articles containing the keyword 'pet plants' were collected from Naver News for January 1, 2018 to August 15, 2019 (609 articles) and January 1, 2020 to August 15, 2021 (1,407 articles). The texts were tokenized into words using the KoNLPy package, yielding 63,597 words. The analyses included keyword frequency and topic modeling based on Latent Dirichlet Allocation (LDA) to identify the inherent meanings of related words and each topic. Results: Topic modeling generated three topics in each period (before and during COVID-19), and the results showed that pet plants in daily life have become objects of 'emotional support' and 'healing' during social distancing. In particular, pet plants, which had been distributed as a way to prevent solitary deaths and depression among seniors living alone, are now also used to help resolve the social isolation of the general public suffering from COVID-19. The new term 'plant butler' became a trend, and people increasingly shared their hobbies and information about pet plants and communicated with others online. Conclusion: Based on these findings, the trend data on pet plants before and after the outbreak of COVID-19 can provide a basis for stimulating research on pet plants and for setting the direction of development in related industries, considering the continuing popularity of indoor gardening and green hobbies.
https://doi.org/10.11628/ksppe.2021.24.6.563

KOSPI index prediction using topic modeling and LSTM

Jin-Hyeon Joo;Geun-Duk Park
- Journal of the Korea Society of Computer and Information / v.29 no.7 / pp.73-80 / 2024
In this paper, we propose a method to improve the accuracy of predicting the Korea Composite Stock Price Index (KOSPI) by combining topic modeling and Long Short-Term Memory (LSTM) neural networks. We use the Latent Dirichlet Allocation (LDA) technique to extract ten major topics related to interest rate increases and decreases from financial news data. The extracted topics, along with historical KOSPI index data, are input into an LSTM model to predict the KOSPI index. The proposed model thus combines a time series prediction method, which feeds the historical KOSPI index into the LSTM model, with a topic modeling method, which feeds in news data. To verify the performance of the proposed model, this paper designs four models (LSTM_K model, LSTM_KNS model, LDA_K model, LDA_KNS model) based on the types of input data for the LSTM and presents the predictive performance of each model. The comparison of prediction performance shows that the LSTM model (LDA_K model), which uses financial news topic data and historical KOSPI index data as inputs, recorded the lowest RMSE (Root Mean Square Error), demonstrating the best predictive performance.
https://doi.org/10.9708/jksci.2024.29.07.073
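The abstract above ranks models by RMSE. As a small worked illustration of that metric, the sketch below computes RMSE for two hypothetical prediction series and picks the one with the lower error. The index values and the names `model_a` and `model_b` are invented for illustration; they are not results or models from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical index values and predictions, not taken from the paper.
actual = [2600.0, 2610.0, 2590.0, 2605.0]
model_a = [2598.0, 2612.0, 2594.0, 2605.0]
model_b = [2590.0, 2620.0, 2580.0, 2615.0]

scores = {"model_a": rmse(actual, model_a), "model_b": rmse(actual, model_b)}
best = min(scores, key=scores.get)  # the model with the lowest RMSE
```

Because errors are squared before averaging, a model with a few large misses is penalized more heavily than one with many small misses.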

Tweets analysis using a Dynamic Topic Modeling : Focusing on the 2019 Koreas-US DMZ Summit (트윗의 타임 시퀀스를 활용한 DTM 분석 : 2019 남북미정상회동 이벤트를 중심으로)

Ko, EunJi;Choi, SunYoung
- Journal of the Korea Institute of Information and Communication Engineering / v.25 no.2 / pp.308-313 / 2021
In this study, tweets about the 2019 Koreas-US DMZ Summit were collected along with their time sequence and analyzed with a sequential topic modeling method, Dynamic Topic Modeling (DTM). In microblogging services such as Twitter, unstructured data that mixes news and opinion about a single event is produced simultaneously and on a large scale, and information and reactions share the same message format. Therefore, to grasp a topic trend, the contextual meaning can be found only by performing a pattern analysis that reflects the characteristics of sequential data. After the topic coherence score was obtained and Latent Dirichlet Allocation (LDA) was evaluated, the DTM was computed; 30 topics related to news reports and opinions were derived, and the probability of occurrence of each topic and its keywords evolved dynamically. In conclusion, the study found that DTM is a suitable model for analyzing the trend of integrated topics in a specific event over time.
https://doi.org/10.6109/jkiice.2021.25.2.308 KSCI
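The abstract above mentions a topic coherence score but does not say which measure was used. As one possible illustration, the sketch below computes the UMass coherence of a topic's top words from document co-occurrence counts. The choice of UMass, the function name `umass_coherence`, and the toy tweets are all assumptions.

```python
import math


def umass_coherence(top_words, docs, eps=1.0):
    """UMass coherence: sum over word pairs of log((D(w_i, w_j) + eps) / D(w_j)).

    top_words: a topic's top words, ordered from most to least probable.
    docs: list of token lists. Every top word is assumed to occur in at least one doc.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_count(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            joint = doc_count(top_words[i], top_words[j])
            score += math.log((joint + eps) / doc_count(top_words[j]))
    return score


# Hypothetical tokenized tweets, not data from the study.
docs = [
    ["summit", "dmz", "trump"],
    ["summit", "dmz", "kim"],
    ["summit", "peace"],
    ["weather", "rain"],
]
coherent_score = umass_coherence(["summit", "dmz"], docs)
mixed_score = umass_coherence(["summit", "rain"], docs)
```

Words that co-occur in the same documents score closer to zero, while word pairs that never appear together score lower; comparing such scores across candidate topic counts is one common way to pick the number of topics.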

Keyword Reorganization Techniques for Improving the Identifiability of Topics (토픽 식별성 향상을 위한 키워드 재구성 기법)

Yun, Yeoil;Kim, Namgyu
- Journal of Information Technology Services / v.18 no.4 / pp.135-149 / 2019
Recently, much research has addressed extracting meaningful information from large amounts of text data. Among the various approaches to extracting information from text, topic modeling, which expresses latent topics as groups of keywords, is widely used. Topic modeling presents several topic keywords ranked by term/topic weight, and the quality of those keywords is usually evaluated through coherence, which reflects the similarity of the keywords. However, a topic quality evaluation based only on keyword similarity has its limitations, because it is difficult to describe the content of a topic accurately enough with just a set of similar words. In this research, therefore, we propose a method for reorganizing topic keywords to improve the identifiability of topics. To reorganize topic keywords, each document first needs to be labeled with one representative topic, which can be extracted by traditional topic modeling. After that, classification rules for classifying each document into its corresponding label are generated, and new topic keywords are extracted based on the classification rules. To evaluate the performance of our method, we performed an experiment on 1,000 news articles. From the experiment, we confirmed that the keywords extracted by our proposed method have better identifiability than traditional topic keywords.
https://doi.org/10.9716/KITS.2019.18.4.135 KSCI
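The procedure described in the abstract above (label each document with one representative topic, derive classification rules, then extract keywords from the rules) can be sketched roughly as follows. The abstract does not specify the rule learner, so this sketch substitutes the simplest stand-in: one-word rules of the form "if the word is present, predict the label", scored by precision. The function names, the scoring, and the toy data are all assumptions.

```python
from collections import Counter, defaultdict


def label_documents(doc_topic_probs):
    """Assign each document its highest-probability topic as its representative label."""
    return [max(range(len(p)), key=p.__getitem__) for p in doc_topic_probs]


def rule_keywords(docs, labels, top_n=2):
    """Rank words per label by the precision of the rule 'word present -> label'.

    Stand-in for the paper's classification rules, which the abstract does not detail.
    """
    word_total = Counter()             # documents containing each word
    word_label = defaultdict(Counter)  # per label: documents containing each word
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            word_total[word] += 1
            word_label[label][word] += 1
    keywords = {}
    for label, counts in word_label.items():
        # Sort by precision, then by support, then alphabetically for stable output.
        ranked = sorted(
            counts,
            key=lambda w: (-counts[w] / word_total[w], -counts[w], w),
        )
        keywords[label] = ranked[:top_n]
    return keywords


# Hypothetical documents and document-topic distributions, not data from the paper.
docs = [
    ["bank", "rate", "market"],
    ["rate", "bank", "loan"],
    ["match", "goal", "market"],
    ["goal", "team", "match"],
]
doc_topic_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]

labels = label_documents(doc_topic_probs)
keywords = rule_keywords(docs, labels)
```

In this toy data, "market" appears under both labels and therefore ranks low for each, which is the intuition behind preferring discriminative keywords over merely similar ones.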

