• Title/Summary/Keyword: text frequency analysis (텍스트 빈도 분석)

Analysis of the National Police Agency business trends using text mining (텍스트 마이닝 기법을 이용한 경찰청 업무 트렌드 분석)

  • Sun, Hyunseok;Lim, Changwon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.2
    • /
    • pp.301-317
    • /
    • 2019
  • Substantial research has examined how statistical techniques can extract insights from text data. In this study we analyzed text data produced by the Korean National Police Agency to identify yearly trends in its work and to compare work characteristics among local authorities by identifying distinctive keywords in the documents each authority produced. Preprocessing was tailored to the characteristics of each data set, and word frequencies were calculated for each document in order to draw meaningful conclusions. Because simple term frequency within a document does not adequately characterize keywords, the frequency of each term was recalculated using term frequency-inverse document frequency (TF-IDF) weights. L2 norm normalization was applied so that word frequencies could be compared. The analysis can serve as basic data for future policies to improve police work and as a method for improving the efficiency of the police service; it also helps identify demand for improvements in indoor work.
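
The TF-IDF weighting and L2 normalization mentioned in this abstract can be illustrated with a minimal sketch. The exact formula used in the paper is not stated in the abstract; the unsmoothed `log(N / df)` variant, the function name `tfidf_l2`, and the toy documents below are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_l2(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Unsmoothed idf = log(N / df); this is one common variant (assumed here).
        weights = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalize so that documents of different lengths are comparable.
        vectors.append({t: (w / norm if norm else 0.0) for t, w in weights.items()})
    return vectors

# Hypothetical toy corpus, not data from the study.
docs = [
    ["patrol", "traffic", "report"],
    ["traffic", "accident", "report", "report"],
    ["cyber", "fraud", "report"],
]
vectors = tfidf_l2(docs)
```

In this toy corpus "report" occurs in every document, so its weight is zero everywhere even though it is the most frequent raw term, which is the effect that motivates replacing simple term frequency.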

A Content Analysis of Journal Articles Using the Language Network Analysis Methods (언어 네트워크 분석 방법을 활용한 학술논문의 내용분석)

  • Lee, Soo-Sang
    • Journal of the Korean Society for Information Management
    • /
    • v.31 no.4
    • /
    • pp.49-68
    • /
    • 2014
  • The purpose of this study is to perform a content analysis of research articles in Korea that use the language network analysis method and to grasp the essentials of that method. Six analytical categories are used for the content analysis: types of language text, methods of keyword selection, methods of forming co-occurrence relations, methods of constructing networks, network analysis tools, and indexes. The content analysis revealed the following features. The major types of language text are research articles and interview texts. Keywords were selected from words extracted from the text content. Co-occurrence counts are used to form co-occurrence relations between keywords. The constructed networks are multiple-type networks rather than single-type ones. Network analysis tools such as NetMiner, UCINET/NetDraw, NodeXL, and Pajek are used. The major analytic indexes include density, centralities, and sub-networks. These features can form the basis of the language network analysis method.
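
As a rough illustration of forming co-occurrence relations from counts and computing a density index, the following sketch counts keyword pairs that appear in the same text unit. The function names, the `min_count` threshold, and the sample keyword lists are hypothetical; the surveyed articles rely on tools such as NetMiner rather than code like this.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(keyword_sets):
    """Count how often each unordered keyword pair appears in the same text unit."""
    counts = Counter()
    for keywords in keyword_sets:
        # Sorting gives every pair a canonical (a, b) order.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts

def density(counts, min_count=1):
    """Share of possible undirected edges whose co-occurrence count >= min_count."""
    edges = [pair for pair, c in counts.items() if c >= min_count]
    nodes = {node for pair in counts for node in pair}
    possible = len(nodes) * (len(nodes) - 1) / 2
    return len(edges) / possible if possible else 0.0

# Hypothetical text units, each reduced to its keywords.
units = [
    ["library", "network", "keyword"],
    ["network", "keyword"],
    ["library", "index"],
]
counts = cooccurrence_counts(units)
network_density = density(counts)
```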

Text Understanding System for Summarization (텍스트 이해 모델에 기반한 정보 검색 시스템)

  • Song, In-Seok;Park, Hyuk-Ro
    • Annual Conference on Human and Language Technology
    • /
    • 1997.10a
    • /
    • pp.1-6
    • /
    • 1997
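  • This paper presents a cognitive text understanding model and implements an automatic summarization system based on it. A document is not a mere collection of information but a formalized mode of linguistic expression; it conveys information through the semantic content of its words together with its mode of expression, sentence structure, and document organization. To support text understanding and analysis for summarization, we analyzed manual summaries of 1,000 economics news articles and established the understanding model. Based on test results for 1,000 economics news articles, we analyzed which information in inter-sentence relations and document structure is used to extract summary information. Compared with statistical models that depend on word frequency, the proposed text understanding model finds relations between words and generates information by making effective use of topic-sentence extraction based on document-structure information and of inter-sentence relations. In addition, by systematically linking the summarization knowledge used in text understanding with structural analysis information, it mitigates the content-satisfaction problems that arise in automatic information extraction.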

A Text Network Analysis of North Korean Library Journal, 『Reference Materials for Librarian』 (북한 도서관잡지 『도서관일군 참고자료』의 텍스트 네트워크 분석)

  • Lee, Seongsin;Kim, Hyunsook;Baek, Sumin;Yoon, Subin;Choi, Jae-Hwang
    • Journal of Korean Library and Information Science Society
    • /
    • v.53 no.3
    • /
    • pp.169-191
    • /
    • 2022
  • The purpose of this study is to attempt a text network analysis of two years (2016-2017) of 『Reference Materials for Librarian』, published by the Library Operation Methodology Research Institute in North Korea. Beyond simple word frequency analysis, text network analysis can measure how important a particular word is by capturing the connectivity and relationships between words, and it also makes it possible to interpret specific social phenomena and derive implications. The frequency, degree centrality, betweenness centrality, and community structure of the collected words were calculated using NetMiner. As a result, the terms 'users', 'information services', 'information needs', 'information technology', 'social learning', 'computers', 'databases', 'information acquisition', 'information retrieval', and 'librarian' emerged as important for understanding North Korean libraries.
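
To show what a measure such as degree centrality captures beyond raw frequency, here is a minimal sketch of normalized degree centrality on an undirected word network. The study itself used NetMiner; the function below and the edge list (which merely reuses a few terms from the abstract) are hypothetical and do not reproduce the paper's network.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (u, v) pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Divide by the maximum possible number of neighbors, n - 1.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Hypothetical co-occurrence edges, not the network analyzed in the study.
edges = [
    ("users", "information services"),
    ("users", "librarian"),
    ("users", "databases"),
    ("databases", "information retrieval"),
]
centrality = degree_centrality(edges)
```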

Analysis of User Reviews of Running Applications Using Text Mining: Focusing on Nike Run Club and Runkeeper (텍스트마이닝을 활용한 러닝 어플리케이션 사용자 리뷰 분석: Nike Run Club과 Runkeeper를 중심으로)

  • Gimun Ryu;Ilgwang Kim
    • Journal of Industrial Convergence
    • /
    • v.22 no.4
    • /
    • pp.11-19
    • /
    • 2024
  • The purpose of this study was to analyze user reviews of running applications using text mining. User reviews of Nike Run Club and Runkeeper were collected from the Google Play Store with the selenium package for Python 3 and used as the analysis data; morphemes were separated with the OKT analyzer, retaining only Korean nouns. After morpheme separation, we created a rankNL dictionary to remove stopwords. To analyze the data, we used TF, TF-IDF, and LDA topic modeling. The results of this study are as follows. First, 'record', 'app', and 'workout' were identified as the top keywords in the user reviews of both Nike Run Club and Runkeeper, and their rankings differed between TF and TF-IDF. Second, LDA topic modeling of Nike Run Club identified the topics 'basic items', 'additional features', 'errors', and 'location-based data', while that of Runkeeper identified the topics 'errors', 'voice function', 'running data', 'benefits', and 'motivation'. Based on these results, it is recommended that errors be fixed and improvements be made in order to strengthen the competitiveness of the applications.
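
The stopword-removal and TF steps of this pipeline can be sketched as follows, assuming the reviews have already been reduced to lists of nouns. The function `top_terms`, the stopword set, and the sample reviews are hypothetical; crawling with selenium, OKT morpheme analysis, and LDA are outside the scope of this sketch.

```python
from collections import Counter

def top_terms(reviews, stopwords, k=3):
    """Rank nouns by raw term frequency (TF) after dropping stopwords."""
    tf = Counter(
        token
        for review in reviews
        for token in review
        if token not in stopwords
    )
    return tf.most_common(k)

# Hypothetical, already tokenized reviews (nouns only).
reviews = [
    ["app", "record", "error"],
    ["record", "workout", "app"],
    ["record", "thing"],
]
stopwords = {"thing"}  # stand-in for a stopword dictionary
top = top_terms(reviews, stopwords)
```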

Data Analysis Web Application Based on Text Mining (텍스트 마이닝 기반의 데이터 분석 웹 애플리케이션)

  • Gil, Wan-Je;Kim, Jae-Woong;Park, Koo-Rack;Lee, Yun-Yeol
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.103-104
    • /
    • 2021
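  • This paper proposes a model for a topic-modeling web application based on text mining. The goal is to design and implement a web application that, given a keyword, uses web crawling to save summarized paper information to a file and lets users easily examine research trends through keyword frequency analysis, topic modeling, and similar techniques. With the proposed web application, even users with little knowledge of programming languages or data analysis techniques can try collecting and storing papers and analyzing text. Moreover, because the web system was developed with Python libraries rather than relying on conventional languages such as HTML, CSS, and JavaScript, it is expected to be adoptable as a project-based course curriculum when data analysis and machine learning are taught in Python.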

A Study on the Archival Information Services of Economic Policy Using Text Mining Methods: Focusing on Economic Policy Directions (텍스트 마이닝을 활용한 경제정책기록서비스 연구: 경제정책방향을 중심으로)

  • Yeon, Jihyun;Kim, Sungwon
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.22 no.2
    • /
    • pp.117-133
    • /
    • 2022
  • Archival content that is listed arbitrarily makes it difficult for users to access the records of major economic policies efficiently, especially when they use it without understanding the relevant period and context. Applying text mining techniques to 30 years of economic policy directions, from 1991 to 2021, this paper derives the economy-related keywords that the government mainly dealt with and how they changed. It collects and preprocesses the background, main content, and body text of major economic policies and conducts text frequency, term frequency-inverse document frequency (TF-IDF), network, and time series analyses. Based on these analyses, the most frequent words are, in order, "job(일자리)," "competitive(경쟁력)," and "restructuring(구조조정)." In addition, the relative yearly shares of "job (일자리)," "real estate(부동산)," and "corporation(기업)" were analyzed chronologically, and the major keywords mentioned by each government are presented. Based on the results, this study presents implications for developing and broadening the area of archival information services related to economic policies.

Trend Analysis of FinTech and Digital Financial Services using Text Mining (텍스트마이닝을 활용한 핀테크 및 디지털 금융 서비스 트렌드 분석)

  • Kim, Do-Hee;Kim, Min-Jeong
    • Journal of Digital Convergence
    • /
    • v.20 no.3
    • /
    • pp.131-143
    • /
    • 2022
  • Focusing on FinTech keywords, this study analyzes newspaper articles and Twitter data using text mining in order to understand trends in the domestic digital financial services industry. Frequency analysis was performed around four key points in the growth of the FinTech lifecycle: mobile payment services, Internet primary banks, the Data 3 Act, and MyData businesses. Frequency analysis combining the keywords 'China', 'USA', and 'Future' with 'FinTech' was used to assess the current and future position of the FinTech industry. Next, sentiment analysis was conducted on Twitter data to quantify consumers' expectations and concerns about FinTech services. By combining frequency analysis and sentiment analysis, this study offers a meaningful perspective in that it presents strategic directions that the government and companies can use to understand the future FinTech market.

Authorship Attribution in Korean Using Frequency Profiles (빈도 정보를 이용한 한국어 저자 판별)

  • Han, Na-Rae
    • Korean Journal of Cognitive Science
    • /
    • v.20 no.2
    • /
    • pp.225-241
    • /
    • 2009
  • This paper presents an authorship attribution study in Korean conducted on a corpus of newspaper column texts. Based on the data set consisting of a total of 160 columns written by four columnists of Chosun Daily, the approach utilizes relative frequencies of various lexical units in Korean such as fully inflected words, morphemes, syllables and their bigrams in an attempt to establish authorship of a blind text selected from the set. Among these various lexical units, "the morpheme" is found to be most effective in predicting who among the four potential candidates authored a text, reporting accuracies of over 93%. The results indicate that quantitative and statistical techniques in authorship attribution and computational stylistics can be successfully applied to Korean texts.
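
The idea of attributing a blind text by comparing relative-frequency profiles can be sketched as below. The abstract does not state which classifier or distance measure was used, so the Manhattan distance, the nearest-profile rule, and all names and sample strings here are assumptions. Character bigrams stand in for syllable bigrams, since each precomposed Hangul syllable is a single character in a Python string.

```python
from collections import Counter

def profile(text, n=2):
    """Relative frequencies of character n-grams in a text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()} if total else {}

def distance(p, q):
    """Manhattan distance between two relative-frequency profiles (assumed measure)."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))

def attribute(blind_text, known_texts):
    """Return the candidate whose profile is closest to the blind text's profile."""
    blind = profile(blind_text)
    return min(known_texts, key=lambda author: distance(blind, profile(known_texts[author])))

# Hypothetical toy strings; a real study would use full column texts per author.
known = {"author_a": "abababab", "author_b": "cdcdcdcd"}
guess = attribute("ababab", known)
```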

Keyword Analysis of Research on Consumption of Children and Adolescents Using Text Mining (텍스트마이닝을 활용한 아동, 청소년 대상 소비관련 연구 키워드 분석)

  • Jin, Hyun-Jeong
    • Journal of Korean Home Economics Education Association
    • /
    • v.33 no.4
    • /
    • pp.1-13
    • /
    • 2021
  • The purpose of this study is to identify trends and potential themes in 20 years of research on the consumption of children and adolescents by analyzing keywords. The keywords of 869 studies on the consumption of children and adolescents, published in journals listed in the Korean Citation Index, were analyzed using text mining techniques. The most frequent keywords were, in order, youth, youth consumers, consumer education, conspicuous consumption, consumption behavior, and character. Analysis of keyword frequency by five-year period confirmed that the frequency of consumer education was significantly higher between 2006 and 2010. Research on ethical consumption has been active since 2011, and during the most recent five-year period research has covered various topics without a single prominent keyword. Based on TF-IDF, keywords related to the environment and the Internet were the main keywords between 2001 and 2005. From 2006 to 2010, the TF-IDF values of media use, advertisement education, and Internet items were high. From 2011 to 2015, fair trade, green growth, green consumption, North Korean defector youths, and social media appeared as important themes, as did text mining, sustainable development education, maker education, and the 2015 revised curriculum from 2016 to 2020. Topic modeling yielded eight topics: consumer education, mass media/peer culture, rational consumption, Hallyu/cultural industry, consumer competency, economic education, teaching and learning method, and eco-friendly/ethical consumption. Network analysis showed that conspicuous consumption and consumer education are important topics in research on the consumption of children and adolescents.
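
Counting keyword frequency by five-year period, as described in this abstract, can be sketched as follows. The period boundaries (2001-2005, 2006-2010, and so on) follow the abstract; the function name, its parameters, and the sample records are hypothetical and are not the study's data.

```python
from collections import Counter, defaultdict

def keyword_counts_by_period(records, start=2001, width=5):
    """records: iterable of (year, keywords). Returns {(first_year, last_year): Counter}."""
    buckets = defaultdict(Counter)
    for year, keywords in records:
        # Snap each year to the first year of its period, e.g. 2007 -> 2006.
        first = start + ((year - start) // width) * width
        buckets[(first, first + width - 1)].update(keywords)
    return dict(buckets)

# Hypothetical (publication year, author keywords) records.
records = [
    (2003, ["environment", "internet"]),
    (2007, ["consumer education"]),
    (2009, ["consumer education", "media use"]),
    (2012, ["fair trade"]),
]
by_period = keyword_counts_by_period(records)
```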