• Title/Summary/Keyword: topic extraction

Investigating Major Topics Through the Analysis of Depression-related Facebook Group Posts (페이스북 그룹 게시물 분석을 통한 우울증 관련 주제에 대한 고찰)

  • Zhu, Yongjun;Kim, Donghun;Lee, Changho;Lee, Yongjeong
    • Journal of the Korean Society for Library and Information Science / v.53 no.4 / pp.171-187 / 2019
  • The study analyzes posts from depression-related Facebook groups to understand the major topics discussed by group users. Specifically, it identifies the topics and keywords of the posts to understand what users discuss about depression. Depression is a mental disorder that is a somewhat sensitive subject in online communities, which are characterized by accessibility, openness, and anonymity. The researchers implemented a natural language-based data analysis framework with components ranging from Facebook data collection to automated topic extraction. Using the framework, we collected and analyzed 885 posts created over the past year in the largest Facebook depression group. To derive more complete and accurate topics, we combined automated and manual (e.g., stop word removal, topic size determination) methods. Results indicate that users discuss a variety of topics, including depression in general, human relations, mood and feeling, depression symptoms, suicide, medical references, and family.

Topic Automatic Extraction Model based on Unstructured Security Intelligence Report (비정형 보안 인텔리전스 보고서 기반 토픽 자동 추출 모델)

  • Hur, YunA;Lee, Chanhee;Kim, Gyeongmin;Lim, HeuiSeok
    • Journal of the Korea Convergence Society / v.10 no.6 / pp.33-39 / 2019
  • As cyber attack methods become more intelligent, incidents such as security breaches and international crimes are increasing. To predict and respond to these cyber attacks, the characteristics, methods, and types of attack techniques should be identified. To this end, many security companies publish security intelligence reports to quickly identify various attack patterns and prevent further damage. However, the reports that each company distributes are unstructured, and the number of published intelligence reports is ever-increasing. In this paper, we propose a method to extract structured data from unstructured security intelligence reports. We also propose an automatic intelligence report analysis system that divides a large volume of reports into sub-groups based on their topics, making the report analysis process more effective and efficient.

Text Network Analysis and Topic Modeling of News Articles on Lonely Death (고독사에 관한 언론보도기사의 텍스트네트워크 분석 및 토픽모델링)

  • Kim, Chunmi;Choi, Seungbeom;Kim, Eun Man
    • Journal of Korean Academy of Rural Health Nursing / v.18 no.2 / pp.113-124 / 2023
  • Purpose: The number of households vulnerable to isolation increases rapidly as social ties decrease, raising concerns about the associated increase in lonely deaths. This study aimed to identify issues related to lonely deaths by analyzing South Korean news articles and to provide evidence for their use in preventing and managing lonely deaths via community nursing. Methods: This exploratory study analyzed the structure and trends of meaning of lonely deaths by identifying the association between keywords in news articles and lonely deaths. We searched for all news articles on lonely deaths published from January 1, 2010, to May 31, 2023. Data preprocessing and cleaning were conducted, followed by top-keyword extraction, keyword network analysis, and topic modeling. The retrieved articles were analyzed using R and Python software. Results: Four main topics were identified: "discovering and responding to lonely death cases", "lonely deaths ending in lonely funerals", "supportive policies to prevent lonely deaths among older adults", and "local government activities to prevent lonely deaths and support vulnerable populations." Conclusion: Based on these findings, it can be concluded that lonely death is a complex social phenomenon that can be prevented if society shows concern and care. Education related to lonely deaths should be included in nursing curricula for concrete action plans and professional development.

Analysis of Major COVID-19 Issues Using Unstructured Big Data (비정형 빅데이터를 이용한 COVID-19 주요 이슈 분석)

  • Kim, Jinsol;Shin, Donghoon;Kim, Heewoong
    • Knowledge Management Research / v.22 no.2 / pp.145-165 / 2021
  • In late December 2019, COVID-19 began to spread, throwing the entire world into panic. To overcome the crisis and minimize subsequent damage, the government and its affiliated institutions must maximize the effects of existing policy support and introduce a holistic response plan that reflects the changing situation, which is why it is crucial to analyze social topics and people's interests. This study investigates people's major thoughts, attitudes, and topics surrounding the COVID-19 pandemic through social media and big data. To collect public opinion, the study segmented the time period according to government countermeasures. All data were collected from NAVER blogs from 31 December 2019 to 12 December 2020. TF-IDF keyword extraction and LDA topic modeling were applied as text-mining techniques. As a result, eight major issues related to COVID-19 were derived, and based on their keywords, policy strategies are presented. The significance of this study is that it provides baseline data that Korean government authorities can use to devise appropriate countermeasures that satisfy people's needs in the midst of the COVID-19 pandemic.
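
As a rough illustration of the TF-IDF keyword extraction step named in the abstract above, the following minimal sketch scores terms in pure Python. The toy documents, the function name `tfidf_keywords`, and the particular weighting (relative term frequency times the natural log of inverse document frequency) are assumptions made for illustration; the abstract does not state which TF-IDF variant or tooling the authors used, and the LDA step is not covered here.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score (descending), breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_n]])
    return results

# Hypothetical, already-tokenized blog posts (not data from the study).
docs = [
    "mask shortage mask supply pharmacy".split(),
    "vaccine supply vaccine schedule".split(),
    "distance rule cafe rule supply".split(),
]
for keywords in tfidf_keywords(docs, top_n=2):
    print(keywords)
```

A term such as "supply" that occurs in every document receives a score of zero under this weighting, so it never outranks terms that are specific to one document.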

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.183-203 / 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has produced many kinds of online news media, the volume of news about events in society has increased greatly. Automatically summarizing key events from massive amounts of news data will therefore help users survey many events at a glance. In addition, building and providing an event network based on the relevance of events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, then kept only meaningful words and merged synonyms through preprocessing with NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, find the peaks of the topic distribution, and detect events. A total of 32 topics were extracted by the topic modeling, and the time of occurrence of each event was inferred from the point at which the corresponding topic distribution surged. A total of 85 events were detected, which were filtered down to a final 16 events using Gaussian smoothing. We also calculated relevance scores between the detected events to construct the event network. Using the cosine coefficient between co-occurring events, we calculated the relevance between events and connected them. Finally, we built the event network by setting each event as a vertex and the relevance score between events as the weight of the edge connecting the vertices.
The event network constructed with our method helped us sort out, in chronological order, the major political and social events that occurred in Korea over the past year and at the same time identify which events are related to one another. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to easily analyze large amounts of data and to identify relevance between events that was difficult to detect with existing methods. We applied various text mining techniques and Word2Vec in text preprocessing to improve the accuracy of extracting proper nouns and compound nouns, which has been difficult in analyzing Korean text. The event detection and network construction techniques in this study have the following advantages in practical application. First, LDA topic modeling, which is unsupervised, can easily analyze topics, topic words, and their distributions from huge amounts of data. Also, by using the date information of the collected news articles, the distribution of each topic can be expressed as a time series. Second, by calculating relevance scores and constructing an event network from the co-occurrence of topics, we can present the connections between events, which are difficult to grasp with existing event detection, in a summarized form. This is supported by the fact that the relevance-based event network proposed in this study was in fact constructed in order of occurrence time. The event network also makes it possible to identify which event was the starting point of a series of events. A limitation of this study is that LDA topic modeling yields different results depending on the initial parameters and the number of topics, and that topic and event names must be assigned by the subjective judgment of the researcher.
Also, since each topic is assumed to be exclusive and independent, the method does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events that are not covered in this study or that belong to the same topic.
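
The pipeline described in the abstract above (smooth a per-date topic distribution, detect surges as events, then link events by cosine relevance) can be sketched in a few lines. Everything concrete here is an assumption for illustration: the toy series `raw`, the threshold values, the kernel width `sigma`, and the event vectors are hypothetical, and the abstract does not specify how the authors parameterized any of these steps.

```python
import math
from itertools import combinations

def gaussian_smooth(series, sigma=1.0):
    """Smooth a daily topic-share series with a truncated Gaussian kernel."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(series)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(series):  # renormalize near the edges
                num += w * series[j]
                den += w
        smoothed.append(num / den)
    return smoothed

def find_peaks(series, threshold):
    """Indices that are strict local maxima above the threshold."""
    return [
        i for i in range(1, len(series) - 1)
        if series[i] > threshold and series[i - 1] < series[i] > series[i + 1]
    ]

def cosine(u, v):
    """Cosine coefficient between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical share of one topic over ten days, with a small noisy bump on day 2.
raw = [0.1, 0.1, 0.15, 0.1, 0.6, 0.9, 0.6, 0.1, 0.1, 0.1]
print(find_peaks(raw, 0.12))                        # candidate events before smoothing
print(find_peaks(gaussian_smooth(raw, 1.0), 0.12))  # candidate events after smoothing

# Hypothetical event profiles; edges connect events whose relevance exceeds 0.5.
events = {"A": [0.8, 0.1, 0.1], "B": [0.7, 0.2, 0.1], "C": [0.0, 0.1, 0.9]}
edges = [
    (a, b, cosine(events[a], events[b]))
    for a, b in combinations(sorted(events), 2)
    if cosine(events[a], events[b]) > 0.5
]
print(edges)
```

In this toy series the smoothing removes the minor bump while keeping the main surge, which mirrors the filtering role the abstract assigns to Gaussian smoothing. The edge list is one simple way to hold a weighted event network with events as vertices.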

The Literature Study of Research Trend of Menthae Herba and Relationship Between the Herbology and KCD-code (박하(薄荷)의 국내·외 연구동향과 『본초학』, 한국표준질병사인분류의 상관관계에 대한 연구)

  • Kim, Hyun-Seok;Jeong, Jong-Kil;Lee, Soong-In
    • The Korea Journal of Herbology / v.30 no.5 / pp.29-43 / 2015
  • Objectives: This study aimed to analyze the correlation between the Herbology, contemporary research results, and KCD-codes. Methods: Papers were searched in OASIS and PubMed and then categorized. Medicine and pharmacy articles about Menthae Herba were matched with the Herbology treatment and KCD-codes. Other articles were analyzed by their abstracts. KCD-codes and terms were arranged by the Herbology treatment. The Degree of Herbology research (HDR) was calculated from the number of papers, study method, and citation rates. Results: There were 97 articles about Menthae Herba. Among these, 47 were medicine and pharmacy articles, and 15 articles were matched to the Herbology treatment. Studies about headache and wind-warmth were more active than others. Analysis of the other articles showed that studies about contraceptive and anti-oxidative effects, plant growth, protective effects against insects, and component extraction techniques were also active. In HDR, headache scored 136, wind-warmth 104, eye hemorrhage 51, discomfort in the throat 50, distention and fullness in the chest and hypochondrium 15, and rubella and measles 0. Conclusions: 97 articles about Menthae Herba were analyzed, and 15 articles were matched to the Herbology treatment. Studies about headache, wind-warmth, and eye hemorrhage were more active than those on other Herbology treatments. Studies about contraceptive and anti-oxidative effects, plant growth, protective effects against insects, and component extraction techniques could become new subjects of the Herbology.

A comparative study on the relationship between estimates of critical velocity and number of jet fans for smoke control - A 'Fire-JF' contour map in road tunnels (임계속도와 제연팬 용량의 상관관계 연구 - 도로터널의 제연팬 특성도 연구)

  • Kim, Hyo-Gyu;Kim, Eun-Soo;Kim, Nam-Young;Lee, Chang-Woo
    • Journal of Korean Tunnelling and Underground Space Association / v.6 no.4 / pp.269-278 / 2004
  • Recently, critical velocity has attracted considerable interest from researchers in the field of tunnel safety. Many equations have been proposed for this quantity, the minimum velocity needed to prevent smoke backlayering during a fire, and the following three are regarded as standard methods in Korea for calculating the capacity of smoke extraction fans. This paper compares Kennedy's equation, which is based on the Froude number; Tetzner's equation, which adds a variable $\beta$ to modify Kennedy's equation; and Wu's equation, which uses the concept of super critical velocity. A contour map describing the relationship between the critical velocity and the capacity of smoke extraction fans is proposed as a tool for calculating the number of jet fans needed for smoke control during a fire in local tunnels.

Text Extraction Algorithm using the HTML Logical Structure Analysis (HTML 논리적 구조분석을 통한 본문추출 알고리즘)

  • Jeon, Hyun-Gee;KOH, Chan
    • Journal of Digital Contents Society / v.16 no.3 / pp.445-455 / 2015
  • As internet and computer technology develops, the amount of information has increased exponentially, and a wide variety of web content is being produced very quickly with the help of web authoring tools and new web standards that make the web more convenient and accessible. However, web documents cover a variety of topics and are divided into blocks, each dealing with a topic unrelated to the others, and they also contain content that users do not need to see, such as navigation, simple decorations, advertisements, and copyright notices. To solve this problem and meet user requirements, we study a method that extracts only the exact body area of a web document to obtain the useful information. Building on this reconstruction method, we propose a web search system that can systematically manage documents in an optimized way.
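
The abstract above does not spell out the extraction algorithm, so the following is only a simple heuristic in the same spirit: walk the HTML structure, ignore elements that usually hold navigation or other boilerplate, and keep the block container with the most text. The tag sets `SKIP_TAGS` and `BLOCK_TAGS`, the class `BodyTextExtractor`, and the "longest block wins" rule are assumptions for illustration, not the authors' method.

```python
from html.parser import HTMLParser

# Assumed tag sets: elements treated as boilerplate vs. candidate content blocks.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
BLOCK_TAGS = {"div", "article", "section", "td", "main"}

class BodyTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.stack = []      # text buffers for the currently open block elements
        self.blocks = []     # text of every finished, non-empty block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS and self.stack:
            text = " ".join(self.stack.pop())
            if text:
                self.blocks.append(text)

    def handle_data(self, data):
        data = " ".join(data.split())  # collapse whitespace
        if data and not self.skip_depth and self.stack:
            self.stack[-1].append(data)  # credit text to the innermost open block

def extract_body(html):
    """Return the text of the block with the most text, or '' if none is found."""
    parser = BodyTextExtractor()
    parser.feed(html)
    return max(parser.blocks, key=len, default="")

page = """<html><body>
<nav><div>Home | News | Login</div></nav>
<div id="ad">Buy now!</div>
<div id="content"><p>Topic extraction finds the main themes of a text collection.</p>
<p>It is often used on news articles.</p></div>
<footer><div>Copyright 2015</div></footer>
</body></html>"""
print(extract_body(page))
```

Real pages nest blocks deeply and mix links into body text, so a practical system would likely need further signals such as link density or repeated templates across pages.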

An Analysis of the 2017 Korean Presidential Election Using Text Mining (텍스트 마이닝을 활용한 2017년 한국 대선 분석)

  • An, Eunhee;An, Jungkook
    • Journal of the Korea Convergence Society / v.11 no.5 / pp.199-207 / 2020
  • Recently, big data analysis has drawn attention in various fields because it can generate value from large amounts of data, and it is also used to run political campaigns or predict results. However, existing research was limited in compiling high-level information about candidates because it analyzed only specific SNS data. Therefore, this study performs news trend analysis, topic extraction, sentiment analysis, keyword analysis, and comment analysis for the 2017 presidential election of South Korea. The results show that various topics were generated, and online opinions are extracted for the trending keywords of the respective candidates. This study also shows that portal news and comments can serve as useful tools for predicting the public's opinion on social issues. This paper helps build a strategic course of action by providing a method of analyzing public opinion across various fields.

Judgment about the Usefulness of Automatically Extracted Temporal Information from News Articles for Event Detection and Tracking (사건 탐지 및 추적을 위해 신문기사에서 자동 추출된 시간정보의 유용성 판단)

  • Kim Pyung;Myaeng Sung-Hyon
    • Journal of KIISE:Software and Applications / v.33 no.6 / pp.564-573 / 2006
  • Temporal information plays an important role in natural language processing (NLP) applications such as information extraction, discourse analysis, automatic summarization, and question-answering. In the topic detection and tracking (TDT) area, the temporal information often used is the publication date of a message, which is readily available but limited in its usefulness. We developed a relatively simple NLP method of extracting temporal information from Korean news articles, with the goal of improving the performance of TDT tasks. To extract temporal information, we make use of finite state automata and a lexicon containing time-revealing vocabulary. Extracted information is converted into a canonicalized representation of a time point or a time duration. We first evaluated the extraction and canonicalization methods for their accuracy and then investigated the extent to which temporal information extracted in this way can help TDT tasks. The experimental results show that time information extracted from text indeed helps improve both precision and recall significantly.
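
To make the idea of lexicon-driven extraction and canonicalization concrete, here is a minimal sketch that uses Python regular expressions as a stand-in for finite state automata. The English toy lexicon `RELATIVE_DAYS`, the date pattern, and the function `extract_time_points` are hypothetical; the paper targets Korean news text and its actual automata and lexicon are not given in the abstract. Only time points are handled here, not durations.

```python
import re
from datetime import date, timedelta

# Hypothetical toy lexicon: time-revealing expressions mapped to day offsets.
RELATIVE_DAYS = {
    "the day before yesterday": -2,
    "yesterday": -1,
    "today": 0,
    "tomorrow": 1,
}
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Longest lexicon entries first so the longer phrase wins over "yesterday".
RELATIVE = re.compile(
    r"\b(" + "|".join(sorted(RELATIVE_DAYS, key=len, reverse=True)) + r")\b"
)
# Absolute dates such as "March 5" or "March 5, 2006".
ABSOLUTE = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2})(?:, (\d{4}))?\b")

def extract_time_points(text, published):
    """Return (expression, ISO date) pairs in order of appearance.

    Missing information (the year, or the anchor for relative words) is
    filled in from the publication date.
    """
    found = []
    for m in ABSOLUTE.finditer(text):
        year = int(m.group(3)) if m.group(3) else published.year
        month = MONTHS.index(m.group(1)) + 1
        found.append((m.start(), m.group(0), date(year, month, int(m.group(2)))))
    for m in RELATIVE.finditer(text):
        offset = timedelta(days=RELATIVE_DAYS[m.group(1)])
        found.append((m.start(), m.group(0), published + offset))
    return [(expr, day.isoformat()) for _, expr, day in sorted(found)]

article = "The ministry announced the plan yesterday and will meet on March 5."
print(extract_time_points(article, date(2006, 3, 2)))
```

Canonicalizing every expression to an ISO date gives each article a set of comparable time points in addition to its publication date, which is the kind of signal the abstract reports as helpful for TDT tasks.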