• Title/Summary/Keyword: frequency-based text analysis (빈도 기반 텍스트 분석)

Search Result 105, Processing Time 0.033 seconds

Analysis of the Yearbook from the Korea Meteorological Administration using a text-mining algorithm (텍스트 마이닝 알고리즘을 이용한 기상청 기상연감 자료 분석)

  • Sun, Hyunseok;Lim, Changwon;Lee, YungSeop
    • The Korean Journal of Applied Statistics / v.30 no.4 / pp.603-613 / 2017
  • Many people now post about their personal interests on social media. Advances in the Internet and computer technology have made it possible to store documents digitally, which has led to an explosion in the amount of text data generated and, in turn, to growing demand for technology that extracts valuable information from large document collections. Text mining is often used because text data is mostly unstructured and therefore unsuitable for direct application of statistical analysis or data mining techniques. This study analyzed the Meteorological Yearbook of the Korea Meteorological Administration (KMA) with a text mining technique. First, a term dictionary was constructed through preprocessing and a term-document matrix was generated. The term dictionary was then used to calculate the annual frequency of each term and to observe changes in the relative frequency of frequently appearing words. Regression analysis was also used to identify terms with increasing and decreasing trends. The analysis covered trends in the yearbook's coverage of weather-related news, weather conditions, and the work areas on which the KMA focused. The study aims to provide information that can help analyze and improve meteorological services and be reflected in meteorological policy.
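
The pipeline in this abstract (per-year term counts, relative frequency, then a regression to flag rising or falling terms) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the token lists are made-up toy data, the function names are hypothetical, and a plain least-squares slope over the year stands in for whatever regression model the paper actually fitted.

```python
from collections import Counter

def term_document_matrix(docs_by_year):
    """Count term occurrences per year; returns {year: Counter}."""
    return {year: Counter(tokens) for year, tokens in docs_by_year.items()}

def relative_frequency(tdm, term):
    """Share of each year's tokens that are `term`, ordered by year."""
    years = sorted(tdm)
    return years, [tdm[y][term] / sum(tdm[y].values()) for y in years]

def ols_slope(xs, ys):
    """Slope b of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy, made-up token lists standing in for preprocessed yearbook text.
docs_by_year = {
    2010: ["typhoon", "rain", "forecast", "rain"],
    2011: ["typhoon", "forecast", "forecast", "rain"],
    2012: ["forecast", "forecast", "forecast", "rain"],
}

tdm = term_document_matrix(docs_by_year)
years, freqs = relative_frequency(tdm, "forecast")
slope = ols_slope(years, freqs)
trend = "increasing" if slope > 0 else "decreasing" if slope < 0 else "flat"
print(years, freqs, slope, trend)
```

A positive slope marks a term whose share of the text grows over the years. A real analysis would also check the significance of the slope before labelling a trend.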

Sentiment lexicon modeling for consumer analysis (소비자 분석을 위한 감성사전 모델링)

  • Lee, Jae-Woong;Yun, Hyun-Noh;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.850-853 / 2017
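  • This paper aims to build a sentiment lexicon through text mining, after morphologically analyzing unstructured data obtained by crawling with the 'KoNLPy' library for 'Python'. Words are selected using morpheme frequencies as weights and grouped into positive and negative categories. The polarity of the words in each selected category is then judged to model the sentiment lexicon. For the experiment, unstructured data were collected by crawling online shopping mall reviews, and the collected data were analyzed and processed to extract normalized words. Categories were then constructed from the words frequently used in reviews. Analyzing consumer tendencies by judging word polarity within each category produced a sentiment lexicon that is more fine-grained than a general-purpose lexicon that simply expresses positive and negative.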
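
A rough sketch of a frequency-weighted lexicon of this kind is shown below. It assumes the reviews have already been split into tokens by a morphological analyzer, so no KoNLPy call appears here. The seed polarity labels, the `min_count` threshold, the function names and the English toy tokens are all hypothetical, and the per-category step described in the abstract is omitted for brevity.

```python
from collections import Counter

def build_lexicon(tokenized_reviews, seed_polarity, min_count=2):
    """Keep frequent words with a known polarity, weighted by relative frequency."""
    counts = Counter(tok for review in tokenized_reviews for tok in review)
    kept = {w: c for w, c in counts.items()
            if c >= min_count and w in seed_polarity}
    total = sum(kept.values())
    return {w: (seed_polarity[w], c / total) for w, c in kept.items()}

def score(review, lexicon):
    """Sum signed weights: +weight for positive words, -weight for negative ones."""
    return sum(
        weight if polarity == "positive" else -weight
        for polarity, weight in (lexicon[t] for t in review if t in lexicon)
    )

# Toy, made-up reviews (already tokenized) and hand-labelled seed words.
reviews = [
    ["fast", "delivery", "good"],
    ["good", "quality"],
    ["slow", "delivery", "bad"],
    ["bad", "slow", "good"],
]
seed = {"good": "positive", "fast": "positive",
        "bad": "negative", "slow": "negative"}

lexicon = build_lexicon(reviews, seed)
print(lexicon)
print(score(["good", "quality"], lexicon))
print(score(["slow", "delivery", "bad"], lexicon))
```

Words that are rare ("fast") or have no polarity label ("delivery") are left out of the lexicon, so only frequent, labelled words contribute to a review's score.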

Analysis of Transportation Big Data in Busan on Media (미디어에 나타난 부산 교통 관련 빅데이터의 분석)

  • Ban, ChaeHoon;Kim, YongSu;Lee, YeChan;Jung, YoonSeung;Jeong, DongMin;Cho, HaeChan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.378-381 / 2016
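  • In the information age, in which the spread of information technology and the digital economy generates data on a massive scale, the importance of big data is being emphasized and big data is being applied in a variety of fields. R, a big data analysis tool, is a language and environment that enables statistics-based information analysis. This paper uses R to analyze big data related to Busan transportation as it appears in the media. Busan transportation data are collected from various media, and a frequency survey is performed to determine what text they contain.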

Content Analysis of Webzine for Gist-based Health Message Design (핵심정보 중심 건강 메시지 디자인을 위한 웹진 내용분석)

  • Cho, Young Hoan;Choi, Hyoseon;You, Myoung Soon
    • The Journal of the Korea Contents Association / v.14 no.2 / pp.192-204 / 2014
  • Gist-based message design is essential on the Internet, where large numbers of health messages are constantly created and shared. This study aimed to identify the characteristics of health messages in a webzine and to explore ways to design gist-based health messages. A total of 72 webzine articles published by the Korean Ministry of Food and Drug Safety were selected, and the text and visual messages of the articles were analyzed in terms of content types, the frequency and position of gists, and intuitive expression. The articles were also categorized by the characteristics of their health messages through cluster analyses. The study found that most text in the health articles consisted of facts and methods, while most visual messages represented concepts. In addition, both text and visual messages had limitations in presenting a gist effectively. Effective ways to improve intuitive understanding of jargon and quantitative information in health messages also need to be explored. Based on these findings, the study offers suggestions for designing gist-based health messages on the Internet.

An Analysis of Keywords on 'School Space Innovation' Policies using Text Mining - Focused on News Articles - (텍스트 마이닝을 활용한 '학교 공간 혁신' 정책 키워드 분석 - 뉴스 기사를 중심으로 -)

  • Lee, Dongkuk
    • The Journal of Sustainable Design and Educational Environment Research / v.19 no.2 / pp.11-20 / 2020
  • The goal of this study was to investigate the implementation of school space innovation and related issues as reported by major Korean mass media, using text mining. To that end, the study collected 519 news articles on school space innovation published by 54 Korean mass media companies. Frequency analysis and network analysis were then performed on the keywords in this data. Based on the findings, the characteristics of school space innovation are summarized as follows: First, school space innovation has progressed in response to future education. Second, users are actively participating in school space innovation. Third, experts are supporting the innovation of school space by establishing a cooperative system. Fourth, the community is actively considering the innovation of school space. Fifth, the main projects of the Ministry of Education and the Provincial Offices of Education are actively conducted in a mix of top-down and bottom-up approaches. The findings of this study will help provide a clear direction for contemporary school space innovation and have implications for future research and implementation.
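
Keyword frequency analysis and a simple co-occurrence network of the kind mentioned in this abstract might look like the sketch below. The keyword lists are invented toy data, the function names are hypothetical, and weighted degree is used here as one possible centrality measure. The abstract does not say which network statistics the study computed.

```python
from collections import Counter
from itertools import combinations

def keyword_frequency(articles):
    """Count how many articles each keyword appears in."""
    return Counter(kw for article in articles for kw in set(article))

def cooccurrence_edges(articles):
    """Weight each keyword pair by the number of articles containing both."""
    edges = Counter()
    for article in articles:
        for a, b in combinations(sorted(set(article)), 2):
            edges[(a, b)] += 1
    return edges

# Toy, made-up keyword lists standing in for preprocessed news articles.
articles = [
    ["school", "space", "innovation", "student"],
    ["school", "space", "community"],
    ["school", "innovation", "student"],
]

freq = keyword_frequency(articles)
edges = cooccurrence_edges(articles)

# Weighted degree: total co-occurrence weight attached to each keyword.
degree = Counter()
for (a, b), weight in edges.items():
    degree[a] += weight
    degree[b] += weight

print(freq.most_common(3))
print(edges.most_common(3))
print(degree.most_common(3))
```

Sorting each article's keywords before pairing them keeps every undirected edge under a single key, so `("school", "space")` and `("space", "school")` are never counted separately.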

Research to establish a road map for the standardization in military and commercial terminology (민·군규격용어 표준화를 위한 로드맵 구축 연구)

  • Park, jeong-ho;Choi, young-ho;Im, ik-soon;Jang, hyo-jun
    • Proceedings of the Korea Contents Association Conference / 2015.05a / pp.251-252 / 2015
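  • This study presents a mapping structure that extracts technical terms, misused terms, and vocabulary that does not comply with orthographic rules or refined-language recommendations from defense specification documents, and refines them into definitions or refined terms. It builds an information-task roadmap for standardizing military and commercial specification terminology and examines a support system that can maintain compatibility and consistency with civilian terminology. Target specification terms were selected using a reliability assessment grounded in the KS terminology standard principles together with text mining frequency analysis, and a thesaurus structure was incorporated to show how the approach can be extended to concept-based services. The resulting specification terminology DB can be used for search and term explanation in terminology standard management information systems in civilian and defense-related fields.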

A Suggestion and an analysis on Changes on trend of the 'Virtual Tourism' before and after the Covid 19 Crisis using Textmining Method (텍스트 마이닝을 활용한 '가상관광'의 코로나19 전후 트렌드 분석 및 방향성 제언)

  • Sung, Yun-A
    • Journal of the Korea Convergence Society / v.13 no.4 / pp.155-161 / 2022
  • The outbreak of Covid 19 increased interest in "Virtual Tourism". In this research, keywords related to "Virtual Tourism" were collected through a search engine and analyzed with data mining methods such as the log-odds ratio, frequency analysis, and network analysis. The results indicate that dependency on information and communication in the field of "Virtual Tourism" increased after Covid 19, and that the trend shifted from "securing content diversity" to "projects related to economic recovery". Since demand for "Virtual Reality" such as the metaverse is increasing, an economic, circular structure is needed in which the government establishes related policies and funding plans based on research; local governments and private companies plan and produce differentiated content focusing on AISAS (Attention, Interest, Search, Action, Share); and research institutions and universities develop, apply, assess, and commercialize the technology.
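
As an illustration of the log-odds ratio mentioned in this abstract, the sketch below compares word use in an "after" corpus against a "before" corpus. The add-one smoothing, the function name and the toy tokens are assumptions made for this example. The abstract does not state which variant of the log-odds ratio the study used.

```python
import math
from collections import Counter

def log_odds_ratio(after_tokens, before_tokens):
    """log(((n_after + 1) / (N_after + 1)) / ((n_before + 1) / (N_before + 1))) per word."""
    after, before = Counter(after_tokens), Counter(before_tokens)
    n_after, n_before = sum(after.values()), sum(before.values())
    vocab = set(after) | set(before)
    return {
        w: math.log(((after[w] + 1) / (n_after + 1))
                    / ((before[w] + 1) / (n_before + 1)))
        for w in vocab
    }

# Toy, made-up keyword lists for the two periods.
before = ["content", "diversity", "content", "vr"]
after = ["economy", "recovery", "economy", "vr"]

lor = log_odds_ratio(after, before)
for word, value in sorted(lor.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {value:+.3f}")
```

Positive values mark words that are relatively more common in the "after" corpus, negative values mark words more common in the "before" corpus, and words used at the same rate in both sit near zero.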

Creation and clustering of proximity data for text data analysis (텍스트 데이터 분석을 위한 근접성 데이터의 생성과 군집화)

  • Jung, Min-Ji;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.451-462 / 2019
  • A document-term frequency matrix is a type of data used in text mining. This matrix is often based on various documents provided by the objects to be analyzed. When analyzing objects using this matrix, researchers generally select as keywords only the terms that are common across the documents belonging to one object, and then use those keywords to analyze the object. However, this approach loses the unique information in individual documents and removes potential keywords that occur frequently in a specific document. In this study, we define data that can overcome this problem as proximity data. We introduce twelve methods for generating proximity data and cluster the objects with two clustering methods, multidimensional scaling and k-means cluster analysis. Finally, we select the method best suited to clustering the objects.
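
To make the idea of proximity data concrete, the sketch below turns the rows of a small document-term frequency matrix into a pairwise distance matrix. Cosine distance is used purely as an example measure and is not claimed to be one of the twelve generation methods in the paper. The matrix values and function names are made up for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def proximity_matrix(rows):
    """Pairwise distances between all rows (one row per document or object)."""
    return [[cosine_distance(u, v) for v in rows] for u in rows]

# Toy document-term frequency matrix: 3 documents x 4 terms.
doc_term = [
    [2, 1, 0, 0],
    [4, 2, 0, 0],
    [0, 0, 3, 1],
]

prox = proximity_matrix(doc_term)
for row in prox:
    print(["%.3f" % d for d in row])
```

A matrix like `prox` is the kind of input that multidimensional scaling takes. The first two documents use the same terms in the same proportions, so their distance is near zero, while the third shares no terms with them.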

A Topic Related Word Extraction Method Using Deep Learning Based News Analysis (딥러닝 기반의 뉴스 분석을 활용한 주제별 최신 연관단어 추출 기법)

  • Kim, Sung-Jin;Kim, Gun-Woo;Lee, Dong-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.873-876 / 2017
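  • Recently, to improve the efficiency of information retrieval, there has been active research on analyzing data to extract and recommend the related words that best represent it. Existing studies propose generating related words by analyzing data with occurrence frequencies or with machine learning techniques such as LDA. Machine learning techniques require experts to design by hand the features used to find the result, and finding suitable features that yield good results takes a long time. They also have the drawback that parameters must be set manually, which requires considerable time and effort. To overcome these drawbacks, deep learning, which analyzes data by arranging artificial neural networks in multiple layers, has recently attracted attention. This paper uses deep learning to overcome the limitations of related-word extraction research based on conventional machine learning. First, Word2Vec, a neural-network-based word vector generator, is used to learn from a variety of text data and produce a lookup table. Then, based on that lookup table, a convolutional neural network, a type of artificial neural network, is used to analyze recent news data related to a topic word entered by the user, and a system that extracts the latest related words for each topic is proposed. The performance of the proposed system is evaluated by measuring the precision of the related words it generates.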

A Study on Environmental research Trends by Information and Communications Technologies using Text-mining Technology (텍스트 마이닝 기법을 이용한 환경 분야의 ICT 활용 연구 동향 분석)

  • Park, Boyoung;Oh, Kwan-Young;Lee, Jung-Ho;Yoon, Jung-Ho;Lee, Seung Kuk;Lee, Moung-Jin
    • Korean Journal of Remote Sensing / v.33 no.2 / pp.189-199 / 2017
  • This study quantitatively analyzed research trends in the use of ICT in the environmental field using text mining. To that end, the study collected 359 papers published over two decades (1996-2015) from the National Digital Science Library (NDSL) using 38 environment-related keywords and 16 ICT-related keywords. It processed the natural language of the environment and ICT fields in the papers and reorganized the classification system into corpus units. It then applied frequency analysis, keyword analysis, and association rule analysis of keywords, based on the keywords of the classification system. As a result, the keywords 'general environment' and 'climate' accounted for 77% of the total frequency, and the keywords 'public convergence service' and 'industrial convergence service' in the ICT field accounted for approximately 30%. According to the time series analysis, research using ICT in the environmental field increased rapidly over the most recent five years (2011-2015), and the number of such studies more than doubled compared with the earlier period (1996-2010). From the association rules generated among the keywords, it was identified that the keyword 'general environment' was associated with 16 ICT-based technologies and 'climate' with 14 ICT-based technologies.
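
Association rule analysis over keywords, as mentioned in this abstract, can be sketched with support and confidence for single-keyword rules. The keyword sets, the `min_support` threshold and the function name are hypothetical. The abstract does not report which thresholds or rule-mining algorithm were used.

```python
from itertools import permutations

def association_rules(transactions, min_support=0.5):
    """Support and confidence for every rule a -> b between single keywords."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))
    rules = {}
    for a, b in permutations(items, 2):
        both = sum(1 for s in sets if a in s and b in s)
        has_a = sum(1 for s in sets if a in s)
        support = both / n
        if support >= min_support:
            rules[(a, b)] = {"support": support, "confidence": both / has_a}
    return rules

# Toy, made-up keyword sets, one per paper.
papers = [
    {"climate", "sensor"},
    {"climate", "sensor", "gis"},
    {"climate", "gis"},
    {"water", "sensor"},
]

rules = association_rules(papers)
for (a, b), stats in sorted(rules.items()):
    print(f"{a} -> {b}: support={stats['support']:.2f}, "
          f"confidence={stats['confidence']:.2f}")
```

Support is the share of papers containing both keywords, and confidence is the share of papers containing the left-hand keyword that also contain the right-hand one, so a rule and its reverse can have different confidence values.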