• Title/Summary/Keyword: Text Mining for Korean

KONG-DB: Korean Novel Geo-name DB & Search and Visualization System Using Dictionary from the Web (KONG-DB: 웹 상의 어휘 사전을 활용한 한국 소설 지명 DB, 검색 및 시각화 시스템)

  • Park, Sung Hee
    • Journal of the Korean Society for Information Management / v.33 no.3 / pp.321-343 / 2016
  • This study aimed to design a semi-automatic, web-based pilot system 1) to build a database of geo-names appearing in Korean novels, 2) to keep the database scalable by updating it through automatic geo-name extraction, and 3) to retrieve and visualize the usage of old geo-names on a map. Extracting these geo-names, many of which are now obsolete, is particularly difficult because obtaining a training corpus is burdensome. To build one, an admin tool (an HTML crawler and parser written in Python) crawled geo-names and their usages from a web vocabulary dictionary for the Korean New Novel, collecting enough data to train a named-entity tagger that can extract even geo-names that do not appear in the training corpus. The training corpus and the automatic extraction tool make the geo-name database scalable. In addition, the system can visualize geo-names on a map. The study also designed and implemented the prototype and empirically verified the validity of the pilot system. Lastly, items to be improved are addressed.
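The abstract mentions an HTML crawler and parser written in Python that harvests geo-names and their usages from a web dictionary. A minimal parsing sketch follows, using only the standard-library `html.parser` module. The page markup (`dt class="headword"`, `dd class="usage"`), the class name `GeoNameEntryParser`, and the sample entry are hypothetical; the real dictionary site and the authors' crawler are not described in the abstract.

```python
from html.parser import HTMLParser


class GeoNameEntryParser(HTMLParser):
    """Collects (headword, usage) pairs from a dictionary page.

    Assumed markup (hypothetical, not the real dictionary site):
      <dt class="headword">NAME</dt><dd class="usage">EXAMPLE SENTENCE</dd>
    """

    def __init__(self):
        super().__init__()
        self.entries = []      # collected (headword, usage) tuples
        self._field = None     # "headword", "usage", or None when outside both
        self._headword = None  # most recent headword seen

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "dt" and cls == "headword":
            self._field = "headword"
        elif tag == "dd" and cls == "usage":
            self._field = "usage"

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "headword":
            self._headword = text
        elif self._headword is not None:
            # Attach each usage sentence to the headword that precedes it.
            self.entries.append((self._headword, text))


def parse_entries(html: str):
    parser = GeoNameEntryParser()
    parser.feed(html)
    parser.close()
    return parser.entries


# Illustrative page fragment (hypothetical entry for the old name of Seoul).
sample = '<dl><dt class="headword">한성</dt><dd class="usage">한성으로 올라갔다.</dd></dl>'
print(parse_entries(sample))
```

Pairs like these could serve as a gazetteer or as weak labels for training a named-entity tagger; fetching the pages themselves (for example with `urllib.request`) is omitted from the sketch.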

Consumers' perceptions of dietary supplements before and after the COVID-19 pandemic based on big data

  • Eunjung Lee;Hyo Sun Jung;Jin A Jang
    • Journal of Nutrition and Health / v.56 no.3 / pp.330-347 / 2023
  • Purpose: This study identified words closely associated with the keyword "dietary supplement" (DS) in Korean social media big data and investigated consumer perceptions and trends related to DSs before (2019) and after (2021) the coronavirus disease 2019 (COVID-19) pandemic. Methods: Using blogs and cafes on Daum and Naver, 37,313 keywords were found for the 2019 period and 35,336 keywords for the 2021 period. Results were derived by text mining, semantic network analysis, network visualization, and sentiment analysis. Results: The DS-related keywords that appeared most frequently both before and after COVID-19 were "recommend", "vitamin", "health", "children", "multiple", and "lactobacillus". "Calcium", "lutein", "skin", and "immunity" also had high term frequency-inverse document frequency (TF-IDF) values. These keywords imply a keen interest in DSs among Korean consumers. The big data results also reflected social phenomena related to DSs: for example, "baby" and "pregnant woman" had lower TF-IDF values after the pandemic, suggesting lower marriage and birth rates, whereas "joint" had higher values, indicating reduced physical activity. In 2019, semantic network analysis produced a network centered on vitamins and health care. In 2021, values were highest for "deficiency" and "need", indicating that individuals searched for DSs after the pandemic because of a perceived lack of nutrients and an awareness of the need for adequate nutrient intake. Before the pandemic, DSs and vitamins were associated with health care and life cycle-related topics such as pregnancy; after the pandemic, consumer interest shifted to disease prevention and treatment. Conclusion: This study provides meaningful clues regarding consumer perceptions and trends related to DSs before and after the COVID-19 pandemic, as well as fundamental data on the effect of the pandemic on consumer interest in dietary supplements.
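The TF-IDF values cited above weight a term by how often it appears in a document, discounted by how many documents contain it. A minimal sketch follows, assuming the posts have already been tokenized (for Korean this normally requires a morphological analyzer, which is not shown). It uses one common variant (relative term frequency times `log(N / df)`); the study may have used a different weighting, and the sample tokens are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: TF-IDF} dict per tokenized document.

    TF  = count of the term in the document / number of tokens in the document
    IDF = log(N / df), where df = number of documents containing the term
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Illustrative tokenized posts (not data from the study).
posts = [["vitamin", "health", "vitamin"], ["lutein", "health"], ["immunity", "health"]]
print(tf_idf(posts)[0])
```

A term such as "health" that appears in every post receives an IDF of zero, which is why frequent but ubiquitous keywords can rank high in raw frequency yet low in TF-IDF.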

Unstructured Data Analysis using Equipment Check Ledger: A Case Study in Telecom Domain (장비점검 일지의 비정형 데이터분석을 통한 고장 대응 효율화 사례 연구)

  • Ju, Yeonjin;Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Internet Computing and Services / v.21 no.1 / pp.127-135 / 2020
  • As the use and analysis of big data grow in importance, interest is increasing in natural language processing techniques for unstructured data such as news articles and comments. In particular, as big data collection becomes feasible, data mining techniques capable of pre-processing and analyzing such data are emerging. In this case study with a telecom company, we propose a methodology for structuring unstructured data using text mining. The domain is equipment failure, and the data comprise about 2.2 million equipment check ledger records, with about 800,000 equipment-failure records accumulated in the ledger each year. The ledger contains both structured (formal) and unstructured data. Structured data can easily be used for analysis, but unstructured data is difficult to use immediately. However, unstructured data is highly likely to contain important information that is not recorded in structured form. Therefore, this study develops a digital transformation method for the unstructured data in the equipment check ledger.
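One simple way to turn a free-text ledger note into structured fields is a keyword dictionary plus pattern matching. The sketch below illustrates that idea only; the categories in `FAILURE_KEYWORDS`, the ISO date pattern, and the English sample note are hypothetical assumptions (real ledger entries would be Korean), and the abstract does not say the authors used this approach.

```python
import re

# Hypothetical keyword dictionary: failure category -> trigger keywords.
# A real system would build this from domain experts or mined term lists.
FAILURE_KEYWORDS = {
    "power": ["power", "battery", "rectifier"],
    "optical": ["fiber", "optical", "cable cut"],
    "board": ["board", "card", "module"],
}

# Assumes dates are written as YYYY-MM-DD inside the note.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def structure_note(note: str) -> dict:
    """Turn one free-text ledger note into structured fields."""
    lowered = note.lower()
    # Every category whose keywords occur as substrings of the note.
    categories = sorted(
        cat for cat, words in FAILURE_KEYWORDS.items()
        if any(w in lowered for w in words)
    )
    date_match = DATE_PATTERN.search(note)
    return {
        "categories": categories,
        "date": date_match.group(0) if date_match else None,
        "raw": note,  # keep the original text for auditing
    }


print(structure_note("2019-03-14 replaced rectifier module at site A"))
```

Keeping the raw note alongside the extracted fields lets analysts check the rules against the source text and refine the dictionary over time.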

A Study on Collecting and Structuring Language Resource for Named Entity Recognition and Relation Extraction from Biomedical Abstracts (생의학 분야 학술 논문에서의 개체명 인식 및 관계 추출을 위한 언어 자원 수집 및 통합적 구조화 방안 연구)

  • Kang, Seul-Ki;Choi, Yun-Soo;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science / v.51 no.4 / pp.227-248 / 2017
  • This paper introduces an integrated model for systematically constructing a linguistic resource database for machine learning-based biomedical information extraction systems. The proposed method defines an orderly process for collecting and constructing dictionaries and training sets for both named-entity recognition and relation extraction. The heterogeneous structures of resources collected from diverse sources are analyzed to derive the essential items and fields for the integrated database. All collected resources are then converted and refined into integrated linguistic resource storage. We constructed entity dictionaries of genes, proteins, diseases, and drugs, which are considered core named entities in the biomedical domain, and conducted verification tests to measure their acceptability.
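Integrating heterogeneous resources typically means mapping each source's own field names onto one shared schema and then deduplicating. A minimal sketch follows; the `EntityRecord` fields, the two source layouts (`term`/`category`/`accession` and `label`/`kind`/`id`), and the sample rows are hypothetical and stand in for whatever items and fields the authors actually derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityRecord:
    """Unified record for the integrated entity dictionary (hypothetical schema)."""
    name: str
    entity_type: str  # e.g. "gene", "protein", "disease", "drug"
    source: str
    source_id: str


def from_source_a(row: dict) -> EntityRecord:
    # Hypothetical source A uses the keys "term", "category", "accession".
    return EntityRecord(row["term"].strip(), row["category"].lower(), "A", row["accession"])


def from_source_b(row: dict) -> EntityRecord:
    # Hypothetical source B uses the keys "label", "kind", "id".
    return EntityRecord(row["label"].strip(), row["kind"].lower(), "B", str(row["id"]))


def integrate(records):
    """Deduplicate by (lowercased name, type), keeping the first record seen."""
    merged = {}
    for rec in records:
        merged.setdefault((rec.name.lower(), rec.entity_type), rec)
    return list(merged.values())


a = from_source_a({"term": "BRCA1 ", "category": "Gene", "accession": "X1"})
b = from_source_b({"label": "brca1", "kind": "gene", "id": 7})
c = from_source_b({"label": "aspirin", "kind": "drug", "id": 9})
print(integrate([a, b, c]))
```

Normalizing case and whitespace before deduplication is a deliberately simple choice here; a production pipeline would more likely merge synonyms via identifiers from curated databases.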

Korea's Trade Rules Analysis using Topic Modeling : from 2000 to 2022 (토픽 모델링을 이용한 한국 무역규범 연구동향 분석 : 2000년~2022년)

  • Byeong-Ho Lim;Jeong-In Chang;Tae-Han Kim;Ha-Neul Han
    • Korea Trade Review / v.48 no.1 / pp.55-81 / 2023
  • The purpose of this study is to analyze the main issues and trends in Korean trade research and to draw implications for future research on trade rules. A total of 476 journal articles, retrieved from the Korean Journal Citation Index database with the English keyword 'Trade Rules' for the period from 2000 to July 2022, are analyzed. The methodology combines co-occurrence network analysis and topic trend analysis, both of which are text mining methods. The results show four topics in which the number of published articles has increased rapidly: Topic 4 (Investment Treaty), Topic 7 (Trade Security), Topic 8 (China's Protectionism), and Topic 11 (Trade Settlement). The main background to these topics is the tension between the United States and China, which threatens the existing international trade system. Detailed studies of China's protectionism, changes in the trade security system, new investment agreements, and changes in payment methods are challenges for the near future.
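A co-occurrence network connects keywords that appear in the same document, with edge weights counting how many documents share the pair. A minimal sketch of the edge-counting step follows; the function name `cooccurrence_edges`, the `min_count` threshold, and the sample keyword lists are illustrative assumptions, and graph layout or visualization is left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs, min_count=1):
    """Count how many documents each unordered keyword pair appears in together."""
    pair_counts = Counter()
    for keywords in docs:
        # Deduplicate within a document and sort so (a, b) and (b, a) share one key.
        unique = sorted(set(keywords))
        pair_counts.update(combinations(unique, 2))
    return {pair: c for pair, c in pair_counts.items() if c >= min_count}


# Illustrative keyword lists per article (not data from the study).
articles = [
    ["trade security", "china", "protectionism"],
    ["china", "protectionism"],
    ["investment treaty", "china"],
]
print(cooccurrence_edges(articles, min_count=2))
```

Raising `min_count` prunes rare pairs, which keeps the resulting network readable when thousands of keywords are involved.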

Sentiment Analysis of Elderly and Job in the Demographic Cliff (인구절벽사회에서 노인과 일자리 감성분석)

  • Kim, Yang-Woo
    • The Journal of the Korea Contents Association / v.20 no.11 / pp.110-118 / 2020
  • Social media data serves as a proxy indicator for understanding public opinion on the problems and future of Korean society. This research used 109,015 news items from 2016 to 2018 to analyze sentiment toward the elderly and employment in Korean society, and explored the possibility of expanding the labor force in a society facing a demographic cliff. Topic keywords for employment of the elderly include "elderly*employment" and "elderly*wage". The analysis shows that positive sentiment prevails for most of the period, suggesting that the working-age population can be expanded. Positive feelings about expanding employment opportunities for the elderly, alongside negative feelings about low wages, bring to light the reality of elderly people who remain poor despite working. In this study, social big data was used to analyze the perceptions and sentiments of Korean society related to the elderly and employment through hierarchical cluster analysis and related text mining analyses.
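Lexicon-based sentiment analysis assigns each known word a polarity and sums the polarities over a text. The sketch below shows the idea with a tiny English lexicon; `LEXICON`, its weights, and the sample tokens are hypothetical, whereas an actual study of Korean news would rely on a curated Korean sentiment dictionary and morphological analysis.

```python
# Hypothetical polarity lexicon; real studies use a curated Korean sentiment dictionary.
LEXICON = {"expand": 1, "opportunity": 1, "stable": 1, "low": -1, "poor": -1}


def sentiment_score(tokens):
    """Sum lexicon polarities over the tokens; unknown tokens count as 0."""
    return sum(LEXICON.get(t, 0) for t in tokens)


def label(tokens):
    """Map the summed score to a positive/negative/neutral label."""
    score = sentiment_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label(["expand", "opportunity", "elderly"]))
print(label(["low", "wage", "poor"]))
```

Aggregating these labels per month over the news corpus would yield the kind of sentiment-over-time picture the abstract describes.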

The Detection Model of Disaster Issues based on the Risk Degree of Social Media Contents (소셜미디어 위험도기반 재난이슈 탐지모델)

  • Choi, Seon Hwa
    • Journal of the Korean Society of Safety / v.31 no.6 / pp.121-128 / 2016
  • Social media has transformed information flows that were once based on mass media, and it has become a key resource for finding value in enterprises and public institutions. In disaster management in particular, the need to develop public-participation policies using social media is emphasized. The National Disaster Management Research Institute developed the Social Big Board, a system that monitors social big data in real time to support social media-based disaster management. Social Big Board collects a daily average of 36 million Korean-language tweets in real time and automatically filters tweets related to disaster safety. The filtered tweets are then automatically categorized into 71 disaster safety types. This real-time tweet monitoring system provides information and insights derived from the tweets, such as disaster issues, tweet frequency by region, and original tweets. The purpose of the system is to take advantage of the potential benefits of social media for disaster management. It is a first step toward disaster management that communicates with the public, allowing us to hear people's voices about disaster issues and understand their emotions at the same time. This paper briefly introduces the Korean text mining-based Social Big Board and describes its key algorithm, the disaster issue detection model. Disaster issues are divided into two categories: potential issues, which refer to abnormal signs prior to disaster events, and occurrence issues, which are notifications of disaster events. Detection models for both categories are defined, and their performance is compared and evaluated.
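Detecting "abnormal signs" in a tweet stream often starts with flagging days whose volume jumps well above recent history. The sketch below uses a trailing-window z-score, a generic technique swapped in for illustration; it is not the Social Big Board's detection model, and the `window`, `threshold`, and daily counts are hypothetical.

```python
from statistics import mean, pstdev


def detect_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count exceeds the trailing-window mean
    by more than `threshold` population standard deviations."""
    alerts = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any increase as anomalous.
            if daily_counts[i] > mu:
                alerts.append(i)
        elif (daily_counts[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts


# Hypothetical daily counts of tweets in one disaster-safety category.
counts = [10, 12, 11, 9, 10, 11, 12, 10, 60, 11]
print(detect_spikes(counts))
```

Because the spike itself enters the next day's window, the day after an alert has an inflated baseline; a real system might exclude flagged days from the history.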

A Comparative Study on Deep Learning Topology for Event Extraction from Biomedical Literature (생의학 분야 학술 문헌에서의 이벤트 추출을 위한 심층 학습 모델 구조 비교 분석 연구)

  • Kim, Seon-Wu;Yu, Seok Jong;Lee, Min-Ho;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science / v.51 no.4 / pp.77-97 / 2017
  • The recent sharp increase in biomedical literature makes it difficult for researchers to grasp current research trends and to conduct creative studies based on previous results. To help them keep up with the latest scholarly trends, numerous attempts have been made to develop specialized analytic services that provide direct, intuitive, and formalized scholarly information using text mining technologies such as information extraction and event detection. This paper introduces a total of eight Convolutional Neural Network (CNN) models that extract biomedical events from academic abstracts using various feature utilization approaches, and compares their performance. The comparison confirmed that the Entity-Type-Fully-Connected model showed the most promising performance (72.09% F-score) in the event classification task, while achieving a relatively low but comparable result (21.81%) in the full event extraction process, owing to the imbalance of the training collections and the low performance of the event identification model.
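The F-scores reported above combine precision and recall into one number. For reference, a minimal implementation of the general F-beta score (F1 when `beta = 1`) follows; the true-positive, false-positive, and false-negative counts in the example are illustrative and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts: precision 0.8, recall 80/120.
print(round(f_score(80, 20, 40), 4))
```

For `beta = 1` this reduces to `2 * tp / (2 * tp + fp + fn)`, which makes it clear why errors in an upstream step (such as event identification) pull the end-to-end score down sharply.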

Predicting Functional Outcomes of Patients With Stroke Using Machine Learning: A Systematic Review (머신러닝을 활용한 뇌졸중 환자의 기능적 결과 예측: 체계적 고찰)

  • Bae, Suyeong;Lee, Mi Jung;Nam, Sanghun;Hong, Ickpyo
    • Therapeutic Science for Rehabilitation / v.11 no.4 / pp.23-39 / 2022
  • Objective : To summarize the clinical and demographic variables and machine learning methods used to predict functional outcomes of patients with stroke. Methods : We searched PubMed, CINAHL, and Web of Science for articles published from 2010 to 2021. The search terms were "machine learning OR data mining AND stroke AND function OR prediction OR/AND rehabilitation". Articles that exclusively used brain imaging techniques or deep learning methods, and articles without available full text, were excluded. Results : Nine articles were selected. Support vector machines (19.05%) and random forests (19.05%) were the two most frequently used machine learning models. Five articles (55.56%) demonstrated that patients' initial and/or discharge assessment scores, such as the modified Rankin Scale (mRS) or the Functional Independence Measure (FIM), had a greater impact on functional outcomes than their clinical characteristics. Conclusions : This study showed that patients' initial and/or discharge assessment scores, such as mRS or FIM, may influence their functional outcomes more than their clinical characteristics. Evaluating and reviewing the initial and/or discharge functional assessments of patients with stroke may be required to develop optimal therapeutic interventions that enhance their functional outcomes.

Perceptions of Residents in Relation to Smartphone Applications to Promote Understanding of Radiation Exposure after the Fukushima Accident: A Cross-Sectional Study within and outside Fukushima Prefecture

  • Kuroda, Yujiro;Goto, Jun;Yoshida, Hiroko;Takahashi, Takeshi
    • Journal of Radiation Protection and Research / v.47 no.2 / pp.67-76 / 2022
  • Background: We conducted a cross-sectional study of residents within and outside Fukushima Prefecture to clarify their perceptions of the need for smartphone applications (apps) that explain exposure doses. The results will support more effective methods for identifying target groups for future app development by researchers and municipalities, which will promote residents' understanding of radiological situations. Materials and Methods: In November 2019, 400 people in Fukushima Prefecture and 400 people outside it were surveyed via a web-based questionnaire. In addition to basic characteristics, survey items included concerns about radiation levels and intention to use a smartphone app to keep track of exposure. The analysis stratified responses by region and then cross-tabulated concerns about radiation levels and intention to use an app by demographic variables. The intention to use an app was analyzed by binomial logistic regression analysis. Text-mining analyses were conducted using KH Coder software. Results and Discussion: Outside Fukushima Prefecture, concerns about women's medical exposure to radiation exceeded 30%. Within the prefecture, women's medical exposure, purchasing food products, and consumption of home-grown food were the main concerns. Within the prefecture, having children under the age of 18, experience with radiation measurement, and experience of evacuation were significantly related to the intention to use an app. Conclusion: Regional and individual differences were evident. Because respondents' needs differ, apps should be developed and promoted in accordance with those needs and with the phases of reconstruction. We expect that a suitable app will not only collect data but also connect local service providers and residents while protecting personal information.