• Title/Summary/Keyword: text-mining

Search Result 1,538, Processing Time 0.03 seconds

Biomedical Ontologies and Text Mining for Biomedicine and Healthcare: A Survey

  • Yoo, Ill-Hoi;Song, Min
    • Journal of Computing Science and Engineering
    • /
    • v.2 no.2
    • /
    • pp.109-136
    • /
    • 2008
  • In this survey paper, we discuss biomedical ontologies and major text mining techniques applied to biomedicine and healthcare. Biomedical ontologies such as UMLS are currently being adopted in text mining approaches because they provide domain knowledge for text mining approaches. In addition, biomedical ontologies enable us to resolve many linguistic problems when text mining approaches handle biomedical literature. As the first example of text mining, document clustering is surveyed. Because a document set is normally multiple topic, text mining approaches use document clustering as a preprocessing step to group similar documents. Additionally, document clustering is able to inform the biomedical literature searches required for the practice of evidence-based medicine. We introduce Swanson's UnDiscovered Public Knowledge (UDPK) model to generate biomedical hypotheses from biomedical literature such as MEDLINE by discovering novel connections among logically-related biomedical concepts. Another important area of text mining is document classification. Document classification is a valuable tool for biomedical tasks that involve large amounts of text. We survey well-known classification techniques in biomedicine. As the last example of text mining in biomedicine and healthcare, we survey information extraction. Information extraction is the process of scanning text for information relevant to some interest, including extracting entities, relations, and events. We also address techniques and issues of evaluating text mining applications in biomedicine and healthcare.

Using Ontologies for Semantic Text Mining (시맨틱 텍스트 마이닝을 위한 온톨로지 활용 방안)

  • Yu, Eun-Ji;Kim, Jung-Chul;Lee, Choon-Youl;Kim, Nam-Gyu
    • The Journal of Information Systems
    • /
    • v.21 no.3
    • /
    • pp.137-161
    • /
    • 2012
  • The increasing interest in big data analysis using various data mining techniques indicates that many commercial data mining tools now need to be equipped with fundamental text analysis modules. The most essential prerequisite for accurate analysis of text documents is an understanding of the exact semantics of each term in a document. The main difficulties in understanding the exact semantics of terms are mainly attributable to homonym and synonym problems, which is a traditional problem in the natural language processing field. Some major text mining tools provide a thesaurus to solve these problems, but a thesaurus cannot be used to resolve complex synonym problems. Furthermore, the use of a thesaurus is irrelevant to the issue of homonym problems and hence cannot solve them. In this paper, we propose a semantic text mining methodology that uses ontologies to improve the quality of text mining results by resolving the semantic ambiguity caused by homonym and synonym problems. We evaluate the practical applicability of the proposed methodology by performing a classification analysis to predict customer churn using real transactional data and Q&A articles from the "S" online shopping mall in Korea. The experiments revealed that the prediction model produced by our proposed semantic text mining method outperformed the model produced by traditional text mining in terms of prediction accuracy such as the response, captured response, and lift.

Applications of the Text Mining Approach to Online Financial Information

  • Hansol Lee;Juyoung Kang;Sangun Park
    • Asia pacific journal of information systems
    • /
    • v.32 no.4
    • /
    • pp.770-802
    • /
    • 2022
  • With the development of deep learning techniques, text mining is producing breakthrough performance improvements, promising future applications, and practical use cases across many fields. Likewise, even though several attempts have been made in the field of financial information, few cases apply the current technological trends. Recently, companies and government agencies have attempted to conduct research and apply text mining in the field of financial information. First, in this study, we investigate various works using text mining to show what studies have been conducted in the financial sector. Second, to broaden the view of financial application, we provide a description of several text mining techniques that can be used in the field of financial information and summarize various paradigms in which these technologies can be applied. Third, we also provide practical cases for applying the latest text mining techniques in the field of financial information to provide more tangible guidance for those who will use text mining techniques in finance. Lastly, we propose potential future research topics in the field of financial information and present the research methods and utilization plans. This study can motivate researchers studying financial issues to use text mining techniques to gain new insights and improve their work from the rich information hidden in text data.

Interplay of Text Mining and Data Mining for Classifying Web Contents (웹 컨텐츠의 분류를 위한 텍스트마이닝과 데이터마이닝의 통합 방법 연구)

  • 최윤정;박승수
    • Korean Journal of Cognitive Science
    • /
    • v.13 no.3
    • /
    • pp.33-46
    • /
    • 2002
  • Recently, unstructured random data such as website logs, texts and tables etc, have been flooding in the internet. Among these unstructured data there are potentially very useful data such as bulletin boards and e-mails that are used for customer services and the output from search engines. Various text mining tools have been introduced to deal with those data. But most of them lack accuracy compared to traditional data mining tools that deal with structured data. Hence, it has been sought to find a way to apply data mining techniques to these text data. In this paper, we propose a text mining system which can incooperate existing data mining methods. We use text mining as a preprocessing tool to generate formatted data to be used as input to the data mining system. The output of the data mining system is used as feedback data to the text mining to guide further categorization. This feedback cycle can enhance the performance of the text mining in terms of accuracy. We apply this method to categorize web sites containing adult contents as well as illegal contents. The result shows improvements in categorization performance for previously ambiguous data.

  • PDF

A Study on Research Trends of Graph-Based Text Representations for Text Mining (텍스트 마이닝을 위한 그래프 기반 텍스트 표현 모델의 연구 동향)

  • Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.5
    • /
    • pp.37-47
    • /
    • 2013
  • Text Mining is a research area of retrieving high quality hidden information such as patterns, trends, or distributions through analyzing unformatted text. Basically, since text mining assumes an unstructured text, it needs to be represented as a simple text model for analyzing it. So far, most frequently used model is VSM(Vector Space Model), in which a text is represented as a bag of words. However, recently much researches tried to apply a graph-based text model for representing semantic relationships between words. In this paper, we survey research trends of graph-based text representation models for text mining. Additionally, we also discuss about future models of graph-based text mining.

Text-Mining of Online Discourse to Characterize the Nature of Pain in Low Back Pain

  • Ryu, Young Uk
    • Journal of the Korean Society of Physical Medicine
    • /
    • v.14 no.3
    • /
    • pp.55-62
    • /
    • 2019
  • PURPOSE: Text-mining has been shown to be useful for understanding the clinical characteristics and patients' concerns regarding a specific disease. Low back pain (LBP) is the most common disease in modern society and has a wide variety of causes and symptoms. On the other hand, it is difficult to understand the clinical characteristics and the needs as well as demands of patients with LBP because of the various clinical characteristics. This study examined online texts on LBP to determine of text-mining can help better understand general characteristics of LBP and its specific elements. METHODS: Online data from www.spine-health.com were used for text-mining. Keyword frequency analysis was performed first on the complete text of postings (full-text analysis). Only the sentences containing the highest frequency word, pain, were selected. Next, texts including the sentences were used to re-analyze the keyword frequency (pain-text analysis). RESULTS: Keyword frequency analysis showed that pain is of utmost concern. Full-text analysis was dominated by structural, pathological, and therapeutic words, whereas pain-text analysis was related mainly to the location and quality of the pain. CONCLUSION: The present study indicated that text-mining for a specific element (keyword) of a particular disease could enhance the understanding of the specific aspect of the disease. This suggests that a consideration of the text source is required when interpreting the results. Clinically, the present results suggest that clinicians pay more attention to the pain a patient is experiencing, and provide information based on medical knowledge.

Discovering Meaningful Trends in the Inaugural Addresses of North Korean Leader Via Text Mining (텍스트마이닝을 활용한 북한 지도자의 신년사 및 연설문 트렌드 연구)

  • Park, Chul-Soo
    • Journal of Information Technology Applications and Management
    • /
    • v.26 no.3
    • /
    • pp.43-59
    • /
    • 2019
  • The goal of this paper is to investigate changes in North Korea's domestic and foreign policies through automated text analysis over North Korean new year addresses, one of most important and authoritative document publicly announced by North Korean government. Based on that data, we then analyze the status of text mining research, using a text mining technique to find the topics, methods, and trends of text mining research. We also investigate the characteristics and method of analysis of the text mining techniques, confirmed by analysis of the data. We propose a procedure to find meaningful tendencies based on a combination of text mining, cluster analysis, and co-occurrence networks. To demonstrate applicability and effectiveness of the proposed procedure, we analyzed the inaugural addresses of Kim Jung Un of the North Korea from 2017 to 2019. The main results of this study show that trends in the North Korean national policy agenda can be discovered based on clustering and visualization algorithms. We found that uncovered semantic structures of North Korean new year addresses closely follow major changes in North Korean government's positions toward their own people as well as outside audience such as USA and South Korea.

Is Text Mining on Trade Claim Studies Applicable? Focused on Chinese Cases of Arbitration and Litigation Applying the CISG

  • Yu, Cheon;Choi, DongOh;Hwang, Yun-Seop
    • Journal of Korea Trade
    • /
    • v.24 no.8
    • /
    • pp.171-188
    • /
    • 2020
  • Purpose - This is an exploratory study that aims to apply text mining techniques, which computationally extracts words from the large-scale text data, to legal documents to quantify trade claim contents and enables statistical analysis. Design/methodology - This is designed to verify the validity of the application of text mining techniques as a quantitative methodology for trade claim studies, that have relied mainly on a qualitative approach. The subjects are 81 cases of arbitration and court judgments from China published on the website of the UNCITRAL where the CISG was applied. Validation is performed by comparing the manually analyzed result with the automatically analyzed result. The manual analysis result is the cluster analysis wherein the researcher reads and codes the case. The automatic analysis result is an analysis applying text mining techniques to the result of the cluster analysis. Topic modeling and semantic network analysis are applied for the statistical approach. Findings - Results show that the results of cluster analysis and text mining results are consistent with each other and the internal validity is confirmed. And the degree centrality of words that play a key role in the topic is high as the between centrality of words that are useful for grasping the topic and the eigenvector centrality of the important words in the topic is high. This indicates that text mining techniques can be applied to research on content analysis of trade claims for statistical analysis. Originality/value - Firstly, the validity of the text mining technique in the study of trade claim cases is confirmed. Prior studies on trade claims have relied on traditional approach. Secondly, this study has an originality in that it is an attempt to quantitatively study the trade claim cases, whereas prior trade claim cases were mainly studied via qualitative methods. Lastly, this study shows that the use of the text mining can lower the barrier for acquiring information from a large amount of digitalized text.

Applying Academic Theory with Text Mining to Offer Business Insight: Illustration of Evaluating Hotel Service Quality

  • Choong C. Lee;Kun Kim;Haejung Yun
    • Asia pacific journal of information systems
    • /
    • v.29 no.4
    • /
    • pp.615-643
    • /
    • 2019
  • Now is the time for IS scholars to demonstrate the added value of academic theory through its integration with text mining, clearly outline how to implement this for text mining experts outside of the academic field, and move towards establishing this integration as a standard practice. Therefore, in this study we develop a systematic theory-based text-mining framework (TTMF), and illustrate the use and benefits of TTMF by conducting a text-mining project in an actual business case evaluating and improving hotel service quality using a large volume of actual user-generated reviews. A total of 61,304 sentences extracted from actual customer reviews were successfully allocated to SERVQUAL dimensions, and the pragmatic validity of our model was tested by the OLS regression analysis results between the sentiment scores of each SERVQUAL dimension and customer satisfaction (star rates), and showed significant relationships. As a post-hoc analysis, the results of the co-occurrence analysis to define the root causes of positive and negative service quality perceptions and provide action plans to implement improvements were reported.

Case Study on Public Document Classification System That Utilizes Text-Mining Technique in BigData Environment (빅데이터 환경에서 텍스트마이닝 기법을 활용한 공공문서 분류체계의 적용사례 연구)

  • Shim, Jang-sup;Lee, Kang-wook
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2015.10a
    • /
    • pp.1085-1089
    • /
    • 2015
  • Text-mining technique in the past had difficulty in realizing the analysis algorithm due to text complexity and degree of freedom that variables in the text have. Although the algorithm demanded lots of effort to get meaningful result, mechanical text analysis took more time than human text analysis. However, along with the development of hardware and analysis algorithm, big data technology has appeared. Thanks to big data technology, all the previously mentioned problems have been solved while analysis through text-mining is recognized to be valuable as well. However, applying text-mining to Korean text is still at the initial stage due to the linguistic domain characteristics that the Korean language has. If not only the data searching but also the analysis through text-mining is possible, saving the cost of human and material resources required for text analysis will lead efficient resource utilization in numerous public work fields. Thus, in this paper, we compare and evaluate the public document classification by handwork to public document classification where word frequency(TF-IDF) in a text-mining-based text and Cosine similarity between each document have been utilized in big data environment.

  • PDF