• Title/Summary/Keyword: Text frequency analysis

Search Result 453, Processing Time 0.025 seconds

Text-Mining of Online Discourse to Characterize the Nature of Pain in Low Back Pain

  • Ryu, Young Uk
    • Journal of the Korean Society of Physical Medicine
    • /
    • v.14 no.3
    • /
    • pp.55-62
    • /
    • 2019
  • PURPOSE: Text-mining has been shown to be useful for understanding the clinical characteristics and patients' concerns regarding a specific disease. Low back pain (LBP) is the most common disease in modern society and has a wide variety of causes and symptoms. On the other hand, it is difficult to understand the clinical characteristics and the needs as well as demands of patients with LBP because of the various clinical characteristics. This study examined online texts on LBP to determine of text-mining can help better understand general characteristics of LBP and its specific elements. METHODS: Online data from www.spine-health.com were used for text-mining. Keyword frequency analysis was performed first on the complete text of postings (full-text analysis). Only the sentences containing the highest frequency word, pain, were selected. Next, texts including the sentences were used to re-analyze the keyword frequency (pain-text analysis). RESULTS: Keyword frequency analysis showed that pain is of utmost concern. Full-text analysis was dominated by structural, pathological, and therapeutic words, whereas pain-text analysis was related mainly to the location and quality of the pain. CONCLUSION: The present study indicated that text-mining for a specific element (keyword) of a particular disease could enhance the understanding of the specific aspect of the disease. This suggests that a consideration of the text source is required when interpreting the results. Clinically, the present results suggest that clinicians pay more attention to the pain a patient is experiencing, and provide information based on medical knowledge.

HTML Text Extraction Using Frequency Analysis (빈도 분석을 이용한 HTML 텍스트 추출)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.9
    • /
    • pp.1135-1143
    • /
    • 2021
  • Recently, text collection using a web crawler for big data analysis has been frequently performed. However, in order to collect only the necessary text from a web page that is complexly composed of numerous tags and texts, there is a cumbersome requirement to specify HTML tags and style attributes that contain the text required for big data analysis in the web crawler. In this paper, we proposed a method of extracting text using the frequency of text appearing in web pages without specifying HTML tags and style attributes. In the proposed method, the text was extracted from the DOM tree of all collected web pages, the frequency of appearance of the text was analyzed, and the main text was extracted by excluding the text with high frequency of appearance. Through this study, the superiority of the proposed method was verified.

Text Mining of Wood Science Research Published in Korean and Japanese Journals

  • Eun-Suk JANG
    • Journal of the Korean Wood Science and Technology
    • /
    • v.51 no.6
    • /
    • pp.458-469
    • /
    • 2023
  • Text mining techniques provide valuable insights into research information across various fields. In this study, text mining was used to identify research trends in wood science from 2012 to 2022, with a focus on representative journals published in Korea and Japan. Abstracts from Journal of the Korean Wood Science and Technology (JKWST, 785 articles) and Journal of Wood Science (JWS, 812 articles) obtained from the SCOPUS database were analyzed in terms of the word frequency (specifically, term frequency-inverse document frequency) and co-occurrence network analysis. Both journals showed a significant occurrence of words related to the physical and mechanical properties of wood. Furthermore, words related to wood species native to each country and their respective timber industries frequently appeared in both journals. CLT was a common keyword in engineering wood materials in Korea and Japan. In addition, the keywords "MDF," "MUF," and "GFRP" were ranked in the top 50 in Korea. Research on wood anatomy was inferred to be more active in Japan than in Korea. Co-occurrence network analysis showed that words related to the physical and structural characteristics of wood were organically related to wood materials.

Analysis of Aviation Safety Management Issues using Text Mining (Text Mining 기법을 활용한 항공안전관리 이슈 분석)

  • Moonjin Kwon;Jang Ryong Lee
    • Journal of the Korean Society for Aviation and Aeronautics
    • /
    • v.31 no.4
    • /
    • pp.19-27
    • /
    • 2023
  • In this study, a total of 2,584 domestic research papers with the keywords "Aviation Safety" and "Aviation Accidents" were subjected to Text Mining analysis. Various text mining techniques, including keyword frequency analysis, word correlation analysis, network analysis, and topic modeling, were applied to examine the research trends in the field of aviation safety. The results revealed a significant increase in research using the keyword "Aviation Safety" since 2015, with over 300 papers published annually. Through keyword frequency analysis, it was observed that "Aircraft" was the most frequently mentioned term, followed by "Drones" and "Unmanned Aircraft." Phi coefficients were calculated for words closely related to "Aircraft," "Aviation," "Drones," and "Safety." Furthermore, topic modeling was employed to identify 12 distinct topics in the field of aviation safety and aviation accidents, allowing for an in-depth exploration of research trends.

Caption Extraction in News Video Sequence using Frequency Characteristic

  • Youglae Bae;Chun, Byung-Tae;Seyoon Jeong
    • Proceedings of the IEEK Conference
    • /
    • 2000.07b
    • /
    • pp.835-838
    • /
    • 2000
  • Popular methods for extracting a text region in video images are in general based on analysis of a whole image such as merge and split method, and comparison of two frames. Thus, they take long computing time due to the use of a whole image. Therefore, this paper suggests the faster method of extracting a text region without processing a whole image. The proposed method uses line sampling methods, FFT and neural networks in order to extract texts in real time. In general, text areas are found in the higher frequency domain, thus, can be characterized using FFT The candidate text areas can be thus found by applying the higher frequency characteristics to neural network. Therefore, the final text area is extracted by verifying the candidate areas. Experimental results show a perfect candidate extraction rate and about 92% text extraction rate. The strength of the proposed algorithm is its simplicity, real-time processing by not processing the entire image, and fast skipping of the images that do not contain a text.

  • PDF

A Big Data Analysis on Research Keywords, Centrality, and Topics of International Trade using the Text Mining and Social Network (텍스트 마이닝과 소셜 네트워크 기법을 활용한 국제무역 키워드, 중심성과 토픽에 대한 빅데이터 분석)

  • Chae-Deug Yi
    • Korea Trade Review
    • /
    • v.47 no.4
    • /
    • pp.137-159
    • /
    • 2022
  • This study aims to analyze international trade papers published in Korea during the past 2002-2022 years. Through this study, it is possible to understand the main subject and direction of research in Korea's international trade field. As the research mythologies, this study uses the big data analysis such as the text mining and Social Network Analysis such as frequency analysis, several centrality analysis, and topic analysis. After analyzing the empirical results, the frequency of key word is very high in trade, export, tariff, market, industry, and the performance of firm. However, there has been a tendency to include logistics, e-business, value and chain, and innovation over the time. The degree and closeness centrality analyses also show that the higher frequency key words also have been higher in the degree and closeness centrality. In contrast, the order of eigenvector centrality seems to be different from those of the degree and closeness centrality. The ego network shows the density of business, sale, exchange, and integration appears to be high in order unlike the frequency analysis. The topic analysis shows that the export, trade, tariff, logstics, innovation, industry, value, and chain seem to have high the probabilities of included in several topics.

A Text Mining Analysis for Research Trend about the Mathematics Education (텍스트 마이닝 분석을 통한 수학교육 연구 동향 분석)

  • Jin, Mireu;Ko, Ho Kyoung
    • East Asian mathematical journal
    • /
    • v.35 no.4
    • /
    • pp.489-508
    • /
    • 2019
  • In this paper we used text mining method to analyze journals of mathematics education posterior to the year of 2016. To figure out trends of mathematics education research. we analyzed the key words largely mentioned in the recent mathematics education journals by Term Frequency and Term Frequency-Inverse Document Frequency method. We also looked at how these keywords match up with the key words that appear of education to prepare for future society. This result can infer the characteristics of mathematics education research in the aspect upcoming research topics.

WCTT: Web Crawling System based on HTML Document Formalization (WCTT: HTML 문서 정형화 기반 웹 크롤링 시스템)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.4
    • /
    • pp.495-502
    • /
    • 2022
  • Web crawler, which is mainly used to collect text on the web today, is difficult to maintain and expand because researchers must implement different collection logic by collection channel after analyzing tags and styles of HTML documents. To solve this problem, the web crawler should be able to collect text by formalizing HTML documents to the same structure. In this paper, we designed and implemented WCTT(Web Crawling system based on Tag path and Text appearance frequency), a web crawling system that collects text with a single collection logic by formalizing HTML documents based on tag path and text appearance frequency. Because WCTT collects texts with the same logic for all collection channels, it is easy to maintain and expand the collection channel. In addition, it provides the preprocessing function that removes stopwords and extracts only nouns for keyword network analysis and so on.

HTML Text Extraction Using Tag Path and Text Appearance Frequency (태그 경로 및 텍스트 출현 빈도를 이용한 HTML 본문 추출)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.12
    • /
    • pp.1709-1715
    • /
    • 2021
  • In order to accurately extract the necessary text from the web page, the method of specifying the tag and style attributes where the main contents exist to the web crawler has a problem in that the logic for extracting the main contents. This method needs to be modified whenever the web page configuration is changed. In order to solve this problem, the method of extracting the text by analyzing the frequency of appearance of the text proposed in the previous study had a limitation in that the performance deviation was large depending on the collection channel of the web page. Therefore, in this paper, we proposed a method of extracting texts with high accuracy from various collection channels by analyzing not only the frequency of appearance of text but also parent tag paths of text nodes extracted from the DOM tree of web pages.

A study on unstructured text mining algorithm through R programming based on data dictionary (Data Dictionary 기반의 R Programming을 통한 비정형 Text Mining Algorithm 연구)

  • Lee, Jong Hwa;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.20 no.2
    • /
    • pp.113-124
    • /
    • 2015
  • Unlike structured data which are gathered and saved in a predefined structure, unstructured text data which are mostly written in natural language have larger applications recently due to the emergence of web 2.0. Text mining is one of the most important big data analysis techniques that extracts meaningful information in the text because it has not only increased in the amount of text data but also human being's emotion is expressed directly. In this study, we used R program, an open source software for statistical analysis, and studied algorithm implementation to conduct analyses (such as Frequency Analysis, Cluster Analysis, Word Cloud, Social Network Analysis). Especially, to focus on our research scope, we used keyword extract method based on a Data Dictionary. By applying in real cases, we could find that R is very useful as a statistical analysis software working on variety of OS and with other languages interface.