• Title/Summary/Keyword: Text Mining for Korean


Biomedical Ontologies and Text Mining for Biomedicine and Healthcare: A Survey

  • Yoo, Ill-Hoi;Song, Min
    • Journal of Computing Science and Engineering / v.2 no.2 / pp.109-136 / 2008
  • In this survey paper, we discuss biomedical ontologies and major text mining techniques applied to biomedicine and healthcare. Biomedical ontologies such as UMLS are being adopted in text mining approaches because they supply domain knowledge. In addition, biomedical ontologies help resolve many linguistic problems that arise when text mining approaches handle biomedical literature. As the first example of text mining, we survey document clustering. Because a document set normally covers multiple topics, text mining approaches use document clustering as a preprocessing step to group similar documents. Document clustering can also support the biomedical literature searches required for the practice of evidence-based medicine. We introduce Swanson's UnDiscovered Public Knowledge (UDPK) model, which generates biomedical hypotheses from literature such as MEDLINE by discovering novel connections among logically related biomedical concepts. Another important area of text mining is document classification, a valuable tool for biomedical tasks that involve large amounts of text; we survey well-known classification techniques in biomedicine. As the last example of text mining in biomedicine and healthcare, we survey information extraction: the process of scanning text for information relevant to some interest, including entities, relations, and events. We also address techniques and issues in evaluating text mining applications in biomedicine and healthcare.

Interplay of Text Mining and Data Mining for Classifying Web Contents (웹 컨텐츠의 분류를 위한 텍스트마이닝과 데이터마이닝의 통합 방법 연구)

  • 최윤정;박승수
    • Korean Journal of Cognitive Science / v.13 no.3 / pp.33-46 / 2002
  • Recently, unstructured data such as website logs, texts, and tables have been flooding the internet. Among these unstructured data are potentially very useful sources, such as bulletin boards and e-mails used for customer service and the output of search engines. Various text mining tools have been introduced to deal with such data, but most of them are less accurate than traditional data mining tools that handle structured data. Hence, researchers have sought ways to apply data mining techniques to text data. In this paper, we propose a text mining system that can incorporate existing data mining methods. We use text mining as a preprocessing tool to generate formatted data that serve as input to the data mining system. The output of the data mining system is fed back to the text mining stage to guide further categorization, and this feedback cycle can improve the accuracy of the text mining. We apply this method to categorize websites containing adult or illegal content. The results show improved categorization performance for previously ambiguous data.


Text-Mining of Online Discourse to Characterize the Nature of Pain in Low Back Pain

  • Ryu, Young Uk
    • Journal of the Korean Society of Physical Medicine / v.14 no.3 / pp.55-62 / 2019
  • PURPOSE: Text-mining has been shown to be useful for understanding the clinical characteristics of a specific disease and patients' concerns about it. Low back pain (LBP) is the most common disease in modern society and has a wide variety of causes and symptoms. Because of this clinical diversity, however, it is difficult to understand the clinical characteristics, needs, and demands of patients with LBP. This study examined online texts on LBP to determine whether text-mining can help clarify the general characteristics of LBP and its specific elements. METHODS: Online data from www.spine-health.com were used for text-mining. Keyword frequency analysis was first performed on the complete text of the postings (full-text analysis). Then only the sentences containing the most frequent word, pain, were selected, and the texts containing these sentences were used to re-analyze keyword frequency (pain-text analysis). RESULTS: Keyword frequency analysis showed that pain is of utmost concern. Full-text analysis was dominated by structural, pathological, and therapeutic words, whereas pain-text analysis was related mainly to the location and quality of the pain. CONCLUSION: The present study indicates that text-mining focused on a specific element (keyword) of a particular disease can enhance understanding of that aspect of the disease. It also suggests that the text source must be considered when interpreting the results. Clinically, the results suggest that clinicians should pay more attention to the pain a patient is experiencing and provide information based on medical knowledge.
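
The two-pass procedure in METHODS (full-text keyword counts, then counts restricted to sentences containing the top keyword) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the regex sentence splitter, the small `STOPWORDS` set, the helper names, and the sample post are all hypothetical, and the study may have re-analyzed whole postings rather than single sentences.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "after", "and", "by", "i", "in", "is", "it", "my", "of", "the", "to", "was"}

def tokens(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_counts(text):
    """Keyword frequencies with stopwords removed."""
    return Counter(t for t in tokens(text) if t not in STOPWORDS)

def sentences_with(text, keyword):
    """Naive split on sentence-final punctuation; keep sentences containing keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if keyword in tokens(s)]

# Hypothetical forum text, not data from the study.
post = ("My disc herniation was confirmed by MRI. The pain is sharp in my lower back. "
        "Surgery helped a little. Burning pain radiates to the leg.")

full_text = keyword_counts(post)                      # pass 1: full-text analysis
top_word = full_text.most_common(1)[0][0]             # most frequent keyword
pain_text = keyword_counts(" ".join(sentences_with(post, top_word)))  # pass 2
print(top_word, pain_text.most_common(3))
```

Comparing the two `Counter` objects shows how restricting the text to one keyword shifts the vocabulary, here from diagnosis and treatment words toward location and quality words.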

Case Study on Public Document Classification System That Utilizes Text-Mining Technique in BigData Environment (빅데이터 환경에서 텍스트마이닝 기법을 활용한 공공문서 분류체계의 적용사례 연구)

  • Shim, Jang-sup;Lee, Kang-wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.1085-1089 / 2015
  • In the past, text-mining techniques were difficult to implement as analysis algorithms because of the complexity of text and the many degrees of freedom its variables have. The algorithms required considerable effort to produce meaningful results, and mechanical text analysis took more time than human analysis. With advances in hardware and analysis algorithms, however, big data technology has emerged. Big data technology has resolved these problems, and text-mining analysis is now recognized as valuable. Applying text-mining to Korean text, however, is still at an early stage because of the linguistic characteristics of the Korean language. If text-mining can support not only data retrieval but also analysis, the resulting savings in the human and material resources required for text analysis will lead to more efficient resource use in numerous public-sector fields. Thus, in this paper, we compare and evaluate manual classification of public documents against a text-mining-based classification that uses TF-IDF term weights and the cosine similarity between documents in a big data environment.
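
The TF-IDF and cosine similarity step can be sketched in plain Python. This is a hedged illustration under stated assumptions: documents are pre-tokenized by whitespace (real Korean text would first need morphological analysis), IDF is the unsmoothed log(N/df), and a new document receives the label of its most similar labeled document. The nearest-neighbor rule, the `classify` helper, and the sample documents are hypothetical; the abstract does not describe the paper's exact classification scheme.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(labeled, new_doc):
    """Assign new_doc the label of the most cosine-similar labeled document."""
    vectors = tf_idf_vectors([text for text, _ in labeled] + [new_doc])
    query = vectors[-1]
    _, best_label = max(
        (cosine(query, vec), label) for vec, (_, label) in zip(vectors, labeled)
    )
    return best_label

# Hypothetical labeled public documents (English stand-ins for tokenized Korean).
labeled = [
    ("budget tax revenue fiscal report", "finance"),
    ("road bridge construction permit", "infrastructure"),
    ("school teacher curriculum enrollment", "education"),
]
print(classify(labeled, "annual tax revenue and fiscal plan"))
```

Under this unsmoothed IDF a term that appears in every document gets weight 0; scikit-learn's `TfidfVectorizer`, for comparison, uses a smoothed IDF by default.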


Analysis of Aviation Safety Management Issues using Text Mining (Text Mining 기법을 활용한 항공안전관리 이슈 분석)

  • Moonjin Kwon;Jang Ryong Lee
    • Journal of the Korean Society for Aviation and Aeronautics / v.31 no.4 / pp.19-27 / 2023
  • In this study, a total of 2,584 domestic research papers with the keywords "Aviation Safety" and "Aviation Accidents" were analyzed using text mining. Various text mining techniques, including keyword frequency analysis, word correlation analysis, network analysis, and topic modeling, were applied to examine research trends in the field of aviation safety. The results revealed a significant increase in research using the keyword "Aviation Safety" since 2015, with over 300 papers published annually. Keyword frequency analysis showed that "Aircraft" was the most frequently mentioned term, followed by "Drones" and "Unmanned Aircraft." Phi coefficients were calculated to identify words closely related to "Aircraft," "Aviation," "Drones," and "Safety." Finally, topic modeling identified 12 distinct topics in the field of aviation safety and aviation accidents, allowing an in-depth exploration of research trends.
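
The phi coefficient used in the word correlation analysis measures association between two binary variables, here the presence or absence of two words. A minimal sketch, assuming each paper is reduced to the set of words it contains and the 2x2 table counts papers with both words, only one, or neither; the function name and sample titles are hypothetical, and the study's exact preprocessing is not stated in the abstract.

```python
import math

def phi_coefficient(docs, word_a, word_b):
    """Phi coefficient for the co-occurrence of word_a and word_b across documents."""
    n11 = n10 = n01 = n00 = 0
    for doc in docs:
        words = set(doc.lower().split())
        has_a, has_b = word_a in words, word_b in words
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    # Product of the four marginal totals of the 2x2 contingency table.
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical paper titles, not data from the study.
titles = [
    "drone safety regulation",
    "drone safety risk assessment",
    "airport runway incursion",
    "pilot fatigue and training",
    "drone delivery routing",
]
print(round(phi_coefficient(titles, "drone", "safety"), 3))
```

Values near 1 mean the two words tend to appear in the same papers, values near -1 mean they tend to exclude each other, and 0 is returned when a word never (or always) occurs, since phi is then undefined.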

The Adaptive SPAM Mail Detection System using Clustering based on Text Mining

  • Hong, Sung-Sam;Kong, Jong-Hwan;Han, Myung-Mook
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.6 / pp.2186-2196 / 2014
  • Spam mail is one of the most common mail dysfunctions and may cause psychological harm to internet users. As internet usage increases, the amount of spam mail has also gradually increased. Indiscriminate sending occurs in particular when spam mail is sent from smartphones or tablets connected to wireless networks. Spam mail accounts for approximately 68% of mail traffic, and the true percentage is believed to be considerably higher. To analyze and detect spam mail, we introduce a technique based on spam mail characteristics and text mining; in particular, spam mail is detected using features obtained through linguistic analysis and language processing. Existing spam mail is analyzed, and hidden spam signatures are extracted using text clustering. Our proposed method uses a text mining system to improve detection and error detection rates for existing spam mail and to respond to new types of spam mail.

Research on Methods for Processing Nonstandard Korean Words on Social Network Services (소셜네트워크서비스에 활용할 비표준어 한글 처리 방법 연구)

  • Lee, Jong-Hwa;Le, Hoanh Su;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems / v.21 no.3 / pp.35-46 / 2016
  • Social network services (SNS), which help users build relationship networks and freely share particular interests or activities by posting comments, photos, videos, and so on in online communities such as blogs, have been widely adopted and have developed into a social phenomenon. Several studies have explored patterns and valuable information in social network data via text mining techniques such as opinion mining and semantic analysis. Keyword-based approaches have been applied to improve the efficiency of text mining, but most researchers point out their limitations when text does not follow the rules of Korean orthography. To address these limitations of keyword-based text mining, this research constructs a database of non-standard Korean words that are difficult to handle in data mining, such as abbreviations, slang, unusual expressions, and emoticons. Based on a study of subjective opinions about specific topics on blogs, this research extracted non-standard words that were found useful in the text mining process.

Investigation of the Possibility of Research on Medical Classics Applying Text Mining - Focusing on the Huangdi's Internal Classic - (텍스트마이닝(Text mining)을 활용한 한의학 원전 연구의 가능성 모색 -『황제내경(黃帝內經)』에 대한 적용례를 중심으로 -)

  • Bae, Hyo-jin;Kim, Chang-eop;Lee, Choong-yeol;Shin, Sang-won;Kim, Jong-hyun
    • Journal of Korean Medical classics / v.31 no.4 / pp.27-46 / 2018
  • Objectives: In this paper, we investigate the applicability of text mining to Korean Medical Classics and suggest that researchers of Medical Classics adopt this methodology. Methods: We applied text mining to Huangdi's Internal Classic, a seminal text of Korean Medicine, and visualized networks representing the connectivity of terms and documents based on vector similarity. We then compared this outcome with prior knowledge obtained through conventional qualitative analysis and examined whether our methodology accurately reflected the keywords of documents, clusters of terms, and relationships between documents. Results: In the term network, we confirmed that Qi played a key role and that the theoretical development based on the relativity between Yin and Yang was reflected. In the document network, Suwen and Lingshu were quite distinct from each other because of their differences in descriptive form and topic. In addition, Suwen showed high similarity between adjacent chapters. Conclusions: This study showed that text mining can yield significant findings that correspond to prior knowledge about Huangdi's Internal Classic. Text mining can be used in a variety of research fields covering medical classics, literature, and medical records. In addition, the visualization tools can be used for educational purposes.

Violation Pattern Analysis for Good Manufacturing Practice for Medicine using t-SNE Based on Association Rule and Text Mining (우수 의약품 제조 기준 위반 패턴 인식을 위한 연관규칙과 텍스트 마이닝 기반 t-SNE분석)

  • Jun-O, Lee;So Young, Sohn
    • Journal of Korean Society for Quality Management / v.50 no.4 / pp.717-734 / 2022
  • Purpose: The purpose of this study is to effectively detect simultaneous violations of Good Manufacturing Practice (GMP) that drug manufacturers have concealed. Methods: We present a framework for analyzing regulatory violation patterns using Association Rule Mining (ARM), text mining, and t-distributed Stochastic Neighbor Embedding (t-SNE) to increase the effectiveness of on-site inspections. Results: Applying Association Rule Mining to FDA inspection data collected from October 2008 to February 2022 revealed a number of simultaneous violation patterns. Among them were 'concurrent violation patterns' derived from the similar regulatory scope of two or more regulations. Because these patterns do not help predict violations that appear simultaneously but belong to different regulations, they were excluded by applying t-SNE based on text mining. Conclusion: The proposed approach enables simultaneous violation patterns to be recognized during on-site inspections. It is expected to shorten detection time by increasing the likelihood of finding intentionally concealed violations.
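
The association rule mining step can be illustrated with pairwise rules scored by support and confidence. This sketch is a deliberate simplification under stated assumptions: it enumerates only two-item rules rather than running full Apriori, the thresholds are arbitrary, the `pair_rules` helper is hypothetical, and the violation codes `V1` to `V4` are placeholders rather than FDA data. The t-SNE filtering step is not shown.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for two-item rules."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of inspections citing both violations
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y cited | x cited)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Hypothetical inspections, each listing the violation codes it cited.
inspections = [
    {"V1", "V2"},
    {"V1", "V2", "V3"},
    {"V1", "V2"},
    {"V3"},
    {"V4"},
]
for rule in pair_rules(inspections):
    print(rule)
```

Rules that hold in both directions with high confidence, such as V1 and V2 here, are the kind of co-occurring violations the study then screened for overlapping regulatory scope.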

Examining the Intellectual Structure of Housing Studies in Korea with Text Mining and Factor Analysis (저자 프로파일링과 요인분석을 이용한 국내 주거학 분야의 지적 구조 분석)

  • Lee, Jae-Yun;Kim, Hee-Jeon;Ryoo, Jong-Duk
    • Journal of the Korean Society for Library and Information Science / v.44 no.2 / pp.285-308 / 2010
  • This study analyzes the intellectual structure of domestic research in the housing field using text mining. Whereas existing research mainly uses text clustering in statistical analyses to identify subject specialties, core authors, and relationships between research areas, this study applied author profiling and factor analysis. To supplement the analysis of the intellectual structure generated by text mining and to evaluate that structure itself, two housing-field experts were interviewed. The evaluation showed that the intellectual structure generated through text mining divides the field into valid research areas that differ slightly from the traditional intellectual structure of the housing field.