• Title/Summary/Keyword: R 텍스트 마이닝 (R text mining)

Quantitative Text Mining for Social Science: Analysis of Immigrant in the Articles (사회과학을 위한 양적 텍스트 마이닝: 이주, 이민 키워드 논문 및 언론기사 분석)

  • Yi, Soo-Jeong;Choi, Doo-Young
    • The Journal of the Korea Contents Association / v.20 no.5 / pp.118-127 / 2020
  • The paper introduces trends and methodological challenges of quantitative Korean text analysis, using case studies of academic and news media articles on "migration" and "immigration" published from 2017 to 2019. Quantitative text analysis is based on natural language processing (NLP) technology and has become an essential tool for social science. It is part of data science: it converts documents into structured data, then uses those data for hypothesis discovery and verification and for visualization. Furthermore, we examined the social-scientific statistical models commonly applied in quantitative text analysis, using NLP with R programming and Quanteda.
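Converting documents into structured data usually means building a document-feature matrix (quanteda calls it a `dfm`). The authors worked in R; the following is only a minimal Python sketch of the idea, assuming plain whitespace tokenization and two made-up English sentences. Korean text would first need morpheme-level tokenization.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()

def document_feature_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

# Hypothetical mini-corpus.
docs = [
    "migration policy and migration law",
    "immigration policy debate",
]
vocab, dfm = document_feature_matrix(docs)
print(vocab)
for row in dfm:
    print(row)
```

Each row is one document and each column one feature, which is the shape that downstream statistical models consume.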

A Study on the Method for Extracting the Purpose-Specific Customized Information from Online Product Reviews based on Text Mining (텍스트 마이닝 기반의 온라인 상품 리뷰 추출을 통한 목적별 맞춤화 정보 도출 방법론 연구)

  • Kim, Joo Young;Kim, Dong soo
    • The Journal of Society for e-Business Studies / v.21 no.2 / pp.151-161 / 2016
  • In the era of Web 2.0, characterized by openness, sharing, and participation, it is easy for internet users to produce and share data. The amount of unstructured data, which makes up most of the digital world's data, has increased exponentially. Personal online product reviews, one kind of unstructured data, are valuable both to the companies that make those products and to potential customers interested in them. Extracting useful information from large volumes of scattered review data requires a process of collecting, storing, preprocessing, and analyzing the data and then drawing conclusions. We therefore introduce a text-mining methodology that applies natural language processing technology to text data such as product reviews in order to extract structured data using R programming. We also introduce data mining to derive purpose-specific customized information from the structured review information produced by text mining.
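As a rough illustration of the preprocessing step that turns unstructured reviews into structured data, the sketch below converts each review into (review_id, term, count) rows and then aggregates them. It is written in Python rather than the authors' R, and the stopword list, regex tokenizer, and sample reviews are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a curated one.
STOPWORDS = {"the", "a", "is", "and", "it", "but", "very"}

def preprocess(review):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in STOPWORDS]

def to_records(reviews):
    """Turn raw reviews into structured (review_id, term, count) rows."""
    rows = []
    for review_id, text in enumerate(reviews):
        for term, n in sorted(Counter(preprocess(text)).items()):
            rows.append((review_id, term, n))
    return rows

def top_terms(rows, k=3):
    """Aggregate term counts over all reviews and return the k most common."""
    totals = Counter()
    for _, term, n in rows:
        totals[term] += n
    return totals.most_common(k)

reviews = [
    "The battery is great and the screen is great",
    "Battery life is short but the price is fair",
]
rows = to_records(reviews)
print(rows)
print(top_terms(rows))
```

Keeping one row per (review, term) pair makes later filtering and grouping, for example by product or by purpose, a simple table operation.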

A Study on the Analysis of ICT R&D using Text Mining Method: Focused on ICT Field and Smart City (텍스트 마이닝을 활용한 국가 R&D과제 동향 분석: ICT 분야와 스마트시티 중심으로)

  • Kim, Seong-soon;Yang, Myung-seok
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.462-465 / 2021
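  • This study analyzed national R&D project information provided by NTIS using text mining techniques to identify recent R&D trends in the ICT field. Keywords were extracted from project information from 2017 to 2020, and keyword networks were visualized through link (association) mining. The results are as follows. First, we observed that the core research topics in each information and communications field change as technology advances. Second, keywords acting as hubs in the keyword network revealed the technologies that mediate convergence between fields. Finally, by comparing keyword networks year by year, we showed that emerging keywords, which newly appear or show changes in their connections, can signal promising future technologies and the latest research directions.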
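The abstract does not specify the exact link-mining method, so the following Python sketch assumes the simplest version: two keywords are connected when they appear in the same project, and the keyword with the highest degree (most distinct neighbors) is treated as a hub. The project keyword lists are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_network(keyword_sets):
    """Undirected weighted graph: edge weight = number of projects in
    which both keywords appear together."""
    weights = defaultdict(int)
    for keywords in keyword_sets:
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return dict(weights)

def degree(weights):
    """Number of distinct neighbors per keyword (unweighted degree)."""
    deg = defaultdict(int)
    for a, b in weights:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)

# Hypothetical keyword lists, one per R&D project.
projects = [
    ["AI", "IoT", "smart city"],
    ["AI", "big data"],
    ["AI", "5G"],
    ["IoT", "5G"],
]
edges = cooccurrence_network(projects)
deg = degree(edges)
hub = max(deg, key=deg.get)
print(edges)
print(deg, "hub:", hub)
```

Running the same code on each year's projects and comparing the resulting edge sets is one way to spot keywords whose connections change over time; other centrality measures (for example betweenness) could replace degree.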

A Study on the Analysis of Agricultural R&D Keywords Using Textmining Method (텍스트마이닝을 활용한 농업 R&D 키워드 분석)

  • Kim, Ji-Hoon;Kim, Seong-Sup
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.2 / pp.721-732 / 2021
  • This study analyzed agricultural R&D keywords using text mining to examine trends in agricultural R&D. The analysis used R&D project information provided by NTIS, with projects from 2003 to 2018 classified by year and by R&D stage. TF-IDF was used as the analysis method, and keyword rankings were derived from the scores. We also grouped similar keywords for analysis. The main results are as follows. First, agricultural R&D trends change with the introduction of new technologies and with changes in the external environment. Second, keyword changes appeared with a time lag across R&D stages: the main keywords shift in the order of basic research, then applied research, then development research. Third, the main keyword of agricultural R&D was 'rice'; however, the direction and purpose of the research changed in response to changes in the domestic and foreign agricultural environments.
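TF-IDF weights a term by its frequency in a document and down-weights terms that occur in many documents. Below is a minimal Python sketch, assuming raw counts for tf, idf = log(N / df), and ranking by the score summed over documents; the paper may use a different weighting variant or aggregation, and the keyword lists are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """One {term: tf * idf} dict per tokenized document, with tf = raw count
    and idf = log(N / df). Many other weighting variants exist."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    df = Counter(term for c in counts for term in c)  # documents containing each term
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()} for c in counts]

def rank_keywords(docs, k=3):
    """Rank terms by their TF-IDF score summed over all documents."""
    totals = Counter()
    for scores in tfidf(docs):
        totals.update(scores)
    return totals.most_common(k)

# Hypothetical keyword lists extracted from three project descriptions.
docs = [
    ["rice", "breeding", "rice"],
    ["rice", "smart farm"],
    ["smart farm", "drone"],
]
for term, score in rank_keywords(docs, k=4):
    print(f"{term}: {score:.3f}")
```

Computing the ranking separately for each year's documents gives a per-year keyword list that can then be compared across periods.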

Time Series Analysis of Patent Keywords for Forecasting Emerging Technology (특허 키워드 시계열 분석을 통한 부상 기술 예측)

  • Kim, Jong-Chan;Lee, Joon-Hyuck;Kim, Gab-Jo;Park, Sang-Sung;Jang, Dong-Sick
    • KIPS Transactions on Software and Data Engineering / v.3 no.9 / pp.355-360 / 2014
  • Forecasting emerging technologies plays an important role in business strategy and R&D investment. There are various approaches to technology forecasting, including patent analysis. Technology forecasting with patents has mainly relied on qualitative analysis through expert evaluations and opinions. However, qualitative methods do not guarantee objective results and require high cost and a long time. To make up for these weaknesses, patent data can be analyzed quantitatively and statistically using text mining techniques. In this paper, we propose a new technology forecasting method that combines text mining with ARIMA analysis.
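The paper does not state which ARIMA order it uses. As a small, dependency-free sketch, the Python code below fits an ARIMA(1,1,0) model without a constant to a keyword's yearly frequency series by conditional least squares (regressing each first difference on the previous one) and produces forecasts. The counts are hypothetical; a real analysis would more likely use a library such as statsmodels' `ARIMA` class with proper order selection.

```python
def fit_arima_110(series):
    """Estimate phi of an ARIMA(1,1,0) model without a constant by
    conditional least squares: regress d[t] on d[t-1], where d is the
    first-differenced series."""
    d = [b - a for a, b in zip(series, series[1:])]
    num = sum(prev * cur for prev, cur in zip(d[:-1], d[1:]))
    den = sum(prev * prev for prev in d[:-1])
    if den == 0:
        raise ValueError("need at least two differences with a non-zero lag")
    return num / den

def forecast_arima_110(series, phi, steps):
    """Forecast by iterating d[t+1] = phi * d[t] and summing back to levels."""
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff
        level += diff
        forecasts.append(level)
    return forecasts

# Hypothetical yearly counts of patents mentioning one keyword.
counts = [10, 20, 28, 34, 38, 41]
phi = fit_arima_110(counts)
print(round(phi, 3), forecast_arima_110(counts, phi, steps=3))
```

Differencing once removes the upward trend so that the AR(1) part models how growth itself evolves; a keyword whose forecast keeps rising would be a candidate emerging technology under this simplified setup.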

Intelligent Wordcloud Using Text Mining (텍스트 마이닝을 이용한 지능적 워드클라우드)

  • Kim, Yeongchang;Ji, Sangsu;Park, Dongseo;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.325-326 / 2019
  • This paper proposes an intelligent word cloud that improves on the existing approach of building word clouds from noun frequencies obtained with text mining. We propose a method that effectively adds newly coined words and similar terms to the dictionary used for noun extraction in text mining, so that word clouds can also visually emphasize other parts of speech, such as verbs. In the experiment, the KoNLP package was used to extract the frequencies of existing nouns, and 80 unsupported new words, identified by examining frequencies, were added manually.
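The paper relies on KoNLP's dictionary in R. The Python sketch below only mimics the idea: entries of a hypothetical user dictionary (tuples of tokens) are merged greedily, longest match first, so that a newly coined term that a tokenizer would split apart is counted as one word before the frequencies feed a word cloud. Rendering the cloud is left out, and the tokens are made up.

```python
from collections import Counter

def merge_user_words(tokens, user_dict):
    """Greedily merge runs of tokens that form an entry of the user
    dictionary (entries are tuples of tokens), trying the longest match first."""
    max_len = max((len(entry) for entry in user_dict), default=1)
    merged, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            # A single token is always accepted, so the loop always advances.
            if n == 1 or candidate in user_dict:
                merged.append(" ".join(candidate))
                i += n
                break
    return merged

# Hypothetical tokenizer output in which a coined term was split in two.
tokens = ["deep", "fake", "videos", "spread", "deep", "fake", "news"]
user_dict = {("deep", "fake")}

print(Counter(tokens).most_common(2))                               # without the dictionary
print(Counter(merge_user_words(tokens, user_dict)).most_common(1))  # with it
```

Without the dictionary the coined term's frequency is split across its parts, so it would appear as two smaller words in the cloud instead of one prominent one.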

R&D Redundancy and Similarity Check System (클라우드 기반 R&D 연구 보고서 문서표절 및 유사도 검출 시스템)

  • Shin, Hyojoung;Park, Kiheung;Haing, Huhduck
    • Proceedings of the Korean Society of Computer Information Conference / 2016.01a / pp.31-32 / 2016
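  • Recently, as government support for R&D has grown, technology research has been carried out actively across the country, but there have been cases in which time and budget were wasted on duplicated R&D projects during budget execution. Solving this problem requires fundamental innovation, such as preventing duplicated research topics during the selection of government R&D projects. This paper proposes a cloud-based system that detects plagiarism and similarity in R&D research reports by incorporating data analysis technologies such as text mining and big data analysis (Hadoop, Amazon Web Services). The system is delivered as SaaS-style "on-demand software" and can be used simply by accessing it over the web.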
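The abstract does not say how similarity is measured. One common choice is cosine similarity between term-frequency vectors; the Python sketch below assumes that measure, whitespace tokenization, and an arbitrary 0.8 threshold for flagging a pair as a possible duplicate. The Hadoop/AWS distribution layer is not modeled, and the report texts are invented.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def flag_similar(reports, threshold=0.8):
    """Return (i, j, score) for every report pair whose similarity meets the threshold."""
    vectors = [Counter(r.lower().split()) for r in reports]
    flagged = []
    for i, j in combinations(range(len(vectors)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            flagged.append((i, j, round(score, 3)))
    return flagged

reports = [
    "cloud based plagiarism detection for research reports",
    "cloud based plagiarism detection for research reports using hadoop",
    "crop yield forecasting with satellite images",
]
print(flag_similar(reports))
```

The pairwise loop grows quadratically with the number of reports, which is one reason a system over a national report archive would distribute the computation.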

Topic Analysis of Papers of JKIICE Using Text Mining (텍스트 마이닝을 이용한 한국정보통신학회 논문지의 주제 분석)

  • Woo, Young Woon;Cho, Kyoung Won;Lee, KwangEui
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.74-75 / 2017
  • In this paper, we analyzed 3,668 papers published in JKIICE from 2007 to 2016 using text mining methods to understand its research fields. We used web scraping programs written in Python for data collection and applied topic modeling based on the LDA algorithm, implemented in R. The results showed that the representative research areas of JKIICE could be consolidated into only 9 areas, although there were 19 submission areas as of 2016.
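LDA can be fitted in several ways; R's topicmodels package, for example, offers both variational EM and Gibbs sampling. For illustration only, here is a compact collapsed Gibbs sampler in Python. The hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are assumptions, not the settings used in the paper.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on pre-tokenized documents.

    Returns (vocab, nkw, ndk): the vocabulary, topic-word counts, and
    document-topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # tokens of topic k in doc d
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # occurrences of word w in topic k
    nk = [0] * n_topics                             # total tokens in topic k
    z = []                                          # current topic of every token

    # Random initial assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], index[w]
                # Take the token out of the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + n_vocab * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Put it back under the newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    return vocab, nkw, ndk

def top_words(vocab, nkw, n=3):
    """The n highest-count words per topic."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in nkw]

# Hypothetical tokenized abstracts.
docs = [
    ["network", "protocol", "wireless", "network"],
    ["image", "recognition", "neural", "image"],
    ["wireless", "protocol", "sensor", "network"],
    ["neural", "image", "recognition", "cnn"],
]
vocab, nkw, ndk = lda_gibbs(docs, n_topics=2)
print(top_words(vocab, nkw))
```

Reading the top words of each topic, and how many papers lean on each topic via `ndk`, is the usual way to label topics and judge how many distinct research areas a corpus really covers.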

A Study on the Research Trends in the Area of Geospatial-Information Using Text-mining Technique Focused on National R&D Reports and Theses (텍스트마이닝 기술을 이용한 공간정보 분야의 연구 동향에 관한 고찰 -국가연구개발사업 보고서 및 논문을 중심으로-)

  • Lim, Si Yeong;Yi, Mi Sook;Jin, Gi Ho;Shin, Dong Bin
    • Spatial Information Research / v.22 no.4 / pp.11-20 / 2014
  • This study aims to provide information about research trends in the area of geospatial information using text mining methods. We obtained national R&D reports and papers from the NDSL (National Discovery for Science Leaders) site, preprocessed their keywords, and classified them into separate sectors. We then investigated the appearance rates of the keywords in the R&D reports and papers and how those rates changed. As a result, we confirmed that research on applications is increasing while research on systems is decreasing. In particular, within the keyword framework, '3D-GIS', 'sensor', and 'service' (but not ITS) are emerging. These findings could help in identifying future research items.
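The appearance rate of a keyword in a period can be read as the share of documents from that period that contain it, and its change as the difference between two periods' shares. Below is a minimal Python sketch under that assumed definition, with hypothetical periods and keyword sets.

```python
def appearance_rates(docs_by_period, keywords):
    """Share of documents in each period whose keyword set contains each keyword."""
    rates = {}
    for period, docs in docs_by_period.items():
        n = len(docs)
        rates[period] = {k: sum(k in doc for doc in docs) / n for k in keywords}
    return rates

def rate_change(rates, earlier, later):
    """Later-minus-earlier appearance rate per keyword (positive = rising)."""
    return {k: rates[later][k] - rates[earlier][k] for k in rates[earlier]}

# Hypothetical keyword sets of documents from two periods.
docs_by_period = {
    "period A": [{"GIS", "system"}, {"system", "database"}, {"GIS"}, {"system"}],
    "period B": [{"3D-GIS", "service"}, {"sensor", "service"}, {"system", "3D-GIS"}, {"GIS"}],
}
keywords = ["3D-GIS", "sensor", "service", "system"]
rates = appearance_rates(docs_by_period, keywords)
change = rate_change(rates, "period A", "period B")
rising = sorted(k for k, v in change.items() if v > 0)
print(change, rising)
```

Using shares rather than raw counts keeps periods with different numbers of documents comparable; the sketch assumes every period contains at least one document.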