• Title/Summary/Keyword: research topic analysis

Big Data Analysis of Busan Civil Affairs Using the LDA Topic Modeling Technique (LDA 토픽모델링 기법을 활용한 부산시 민원 빅데이터 분석)

  • Park, Ju-Seop;Lee, Sae-Mi
    • Informatization Policy / v.27 no.2 / pp.66-83 / 2020
  • Local issues that occur in cities typically garner great attention from the public. While local governments strive to resolve these issues, it is often difficult to eliminate them all, which leads to complaints. In tackling these issues, it is imperative for local governments to use big data to identify the nature of complaints and proactively provide solutions. This study applies the LDA topic modeling technique to analyze trends and patterns in complaints filed online. To this end, 9,625 online complaints submitted to the city of Busan from 2015 to 2017 were analyzed, and 20 topics were identified. From these topics, key topics were singled out, and analysis of their quarterly weighting trends highlighted four "hot" topics (Bus stops, Taxi drivers, Praises, and Administrative handling) and four "cold" topics (CCTV installation, Bus routes, Park facilities including parking, and Festivities issues). The study applied big data analysis to identify trends and patterns in civil affairs and contributes academically by encouraging follow-up research. Moreover, the text mining technique used for complaint analysis can be applied to other projects requiring big data processing.
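
The abstract does not say exactly how "hot" and "cold" topics were chosen from the quarterly weighting trends. One common approach in LDA trend studies, assumed here, fits a least-squares line to each topic's mean weight per quarter and labels rising topics hot and falling topics cold. The sketch below uses only the slope sign with a hypothetical `threshold`; the topic names and weights in the usage lines are invented for illustration.

```python
def linear_slope(values):
    """Ordinary least-squares slope of values against period index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def classify_trends(topic_weights, threshold=0.0):
    """Label each topic 'hot', 'cold', or 'flat' from its per-quarter weights.

    topic_weights maps a topic label to its mean document weight in each
    quarter (hypothetical input format); threshold is an assumed cutoff.
    """
    labels = {}
    for topic, weights in topic_weights.items():
        slope = linear_slope(weights)
        if slope > threshold:
            labels[topic] = "hot"
        elif slope < -threshold:
            labels[topic] = "cold"
        else:
            labels[topic] = "flat"
    return labels


# Invented example weights for two topics over four quarters.
weights = {
    "topic_a": [0.02, 0.03, 0.05, 0.06],
    "topic_b": [0.08, 0.06, 0.05, 0.03],
}
print(classify_trends(weights, threshold=0.001))  # {'topic_a': 'hot', 'topic_b': 'cold'}
```

A real analysis would usually also test whether each slope differs significantly from zero before calling a topic hot or cold.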

Topic Model Analysis of Research Themes and Trends in the Journal of Economic and Environmental Geology (기계학습 기반 토픽모델링을 이용한 학술지 "자원환경지질"의 연구주제 분류 및 연구동향 분석)

  • Kim, Taeyong;Park, Hyemin;Heo, Junyong;Yang, Minjune
    • Economic and Environmental Geology / v.54 no.3 / pp.353-364 / 2021
  • Since the mid-twentieth century, geology in South Korea has gradually evolved into an interdisciplinary field. The journal Economic and Environmental Geology (EEG) has a history of over 52 years and has published interdisciplinary articles grounded in geology. In this study, we performed a literature review using topic modeling based on Latent Dirichlet Allocation (LDA), an unsupervised machine learning model, to identify geological topics, historical trends (classic and emerging topics), and associations among topics by analyzing the titles, keywords, and abstracts of 2,571 publications in EEG from 1968 to 2020. Eight topics were identified: 'petrology and geochemistry', 'hydrology and hydrogeology', 'economic geology', 'volcanology', 'soil contaminant and remediation', 'general and structural geology', 'geophysics and geophysical exploration', and 'clay mineral'. Before 1994, classic topics ('economic geology', 'volcanology', and 'general and structural geology') dominated. After 1994, emerging topics ('hydrology and hydrogeology', 'soil contaminant and remediation', and 'clay mineral') arose, and their share has gradually increased. The association analysis showed that EEG has tended to become more comprehensive, building on 'economic geology'. Our results show how geological research topics branch out and merge with other fields, and demonstrate a useful literature review tool for geological research in South Korea.
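
The abstract names LDA but not the inference algorithm or software used. As a hedged illustration of how LDA assigns topics to words, here is a minimal collapsed Gibbs sampler, one standard way to fit LDA. The hyperparameters `alpha` and `beta`, the iteration count, the seed, and the toy corpus are all assumptions; the study may well have used a different implementation.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (theta, phi, vocab): theta[d][k] is the topic mixture of
    document d and phi[k][v] is the word distribution of topic k.
    alpha, beta, and iterations are assumed defaults, not the paper's.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Posterior mean estimates from the final counts.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


# Invented toy corpus of tokenized abstracts.
docs = [
    ["groundwater", "aquifer", "flow"],
    ["ore", "deposit", "mineral"],
    ["aquifer", "groundwater", "contamination"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2, iterations=100)
```

Trend analyses such as the classic-versus-emerging comparison then aggregate `theta` by publication year.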

Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.1-17 / 2015
  • In this paper, we report our findings on a prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work supports the military's efforts to supervise enlisted men. Despite their efforts, many commanders have failed to prevent accidents involving their subordinates. One of officers' important duties is to take care of their subordinates and prevent unexpected accidents. However, accidents are hard to prevent, so a proper method must be found. Our motivation for presenting this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings. This requires considerable time and effort, and the results differ significantly depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data that can be easily obtained. Recently, enlisted men's records, as well as SNS communication between commanders and soldiers, have made it possible to predict and prevent accidents. This paper applies data mining to internal and external (SNS) data to identify enlisted men's interests and predict accidents. We propose a combination of topic analysis and the decision tree method. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the occurrence of any accident.
In order to analyze their SNS data, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach to finding their interests consists of three main phases: collection, topic analysis, and conversion of topic analysis results into scores to be used as independent variables. In the first phase, we collect enlisted men's SNS data by commander's ID. After the unstructured SNS data are gathered, the topic analysis phase extracts issues from them. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, we quantify these 5 topics as personal scores. We then add these scores to the independent variables, which are composed of 15 internal data sets. Next, we build two decision trees. The first tree uses internal data only. The second tree uses external data (SNS) as well as internal data. We then compare the misclassification rates reported by SAS E-miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%; that is, the second model predicts accidents with an accuracy of approximately 92%. The gap between the two models is 4.3 percentage points. Finally, we use the McNemar test to check whether the difference between them is meaningful, and the test finds it statistically significant (p-value: 0.0003). This study has two limitations. First, the results cannot be generalized, mainly because the experiment is limited to data on a small number of enlisted men. Second, various independent variables in the decision tree model are treated as categorical rather than continuous variables, which entails a loss of information. Despite extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates.
Our proposed methodology can provide support to decision-making in the military. This study is expected to contribute to the prevention of accidents in the military based on scientific analysis of enlisted men and proper management of them.
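
The abstract reports a McNemar test (p = 0.0003) comparing the two trees but not the underlying counts. The test uses only the discordant cases: b, where model A is right and model B wrong, and c, the reverse. Below is a minimal sketch with Edwards' continuity correction, statistic (|b - c| - 1)^2 / (b + c), referred to a chi-square distribution with one degree of freedom. Whether the study used the corrected or uncorrected form is not stated, and the labels in the usage lines are invented.

```python
import math


def mcnemar(y_true, pred_a, pred_b, correction=True):
    """McNemar test on paired predictions from two classifiers.

    Returns (statistic, p_value) using the chi-square approximation
    with one degree of freedom.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("inputs must have equal length")
    # b: A correct and B wrong; c: A wrong and B correct.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)  # Edwards' continuity correction, floored at 0
    stat = diff ** 2 / (b + c)
    # Survival function of chi-square(1): P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Invented example: model B fixes 10 cases that model A gets wrong.
y_true = [1] * 10
internal_only = [0] * 10
with_sns = [1] * 10
print(mcnemar(y_true, internal_only, with_sns))
```

Because only disagreements enter the statistic, two models with similar overall error rates can still differ significantly if they err on different cases.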

Cross-national Analysis of Robot Research Using Non-Structured Text Analytics for R&D Policy

  • Kim, Jeong Hun;Seo, Han Sol;Lee, Jae Woong;Lee, Jung Won;Kwon, Oh Byung
    • Asia Pacific Journal of Business Review / v.1 no.2 / pp.63-88 / 2017
  • With the advent of new frontiers in robotics, the spectrum of robot research has widened across many fields and applications. Beyond conventional robot research, technologies such as smart devices, drones, healthcare robots, and soft robots are emerging as promising applications. Because of its complexity, robot research requires international collaboration and should be fostered by R&D policies. This paper proposes a method for cross-national analysis of robot research using unstructured data such as papers in the proceedings of an international conference. Text analytics are applied to extract research issues and applications automatically.
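
The abstract does not specify its text analytics pipeline. As one hedged illustration of a cross-national comparison, the sketch groups paper texts by a country label (assumed to be available, for example from author affiliations) and counts the most frequent terms per country after removing a small, hypothetical stop-word list. The country codes and texts in the usage lines are invented.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_terms_by_country(papers, n=5):
    """papers: iterable of (country, text) pairs.

    Returns {country: [(term, count), ...]} with the n most frequent terms.
    """
    counts = defaultdict(Counter)
    for country, text in papers:
        counts[country].update(tokenize(text))
    return {country: c.most_common(n) for country, c in counts.items()}


papers = [
    ("KR", "Soft robot gripper for soft tissue"),
    ("US", "Drone control with learning"),
    ("KR", "Healthcare robot and soft actuator"),
]
print(top_terms_by_country(papers, n=2)["KR"])  # [('soft', 3), ('robot', 2)]
```

Raw counts favor countries with more papers; normalizing by each country's paper count, or using TF-IDF, makes the comparison fairer.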

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the respiratory disease caused by the coronavirus SARS-CoV-2) were published. The rapid growth in the number of COVID-19 papers puts time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. In this study, we therefore propose a method of extracting useful information from the text of an extensive literature using the LDA and Word2vec algorithms. Papers related to search keywords were extracted from the COVID-19 papers, and their detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free, weekly updated academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic. The research method has two main parts. First, 41,062 articles were obtained by filtering and pre-processing the abstracts of 47,110 academic papers, including full text. For this purpose, the number of COVID-19 publications per year was analyzed through exploratory data analysis in Python, and the top 10 journals in active research were identified. LDA and Word2vec were used to derive research topics related to COVID-19, and similarity was measured after analyzing related words. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected set, detailed topics were analyzed using LDA and Word2vec, and a clustering method based on PCA dimension reduction was applied, with the t-SNE algorithm used to visualize groups of papers with similar themes. A noteworthy result of this study is that topics not derived from topic modeling of all COVID-19 papers were derived from topic modeling of each research topic. For example, topic modeling of the 'vaccine' papers extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody protects cells from infection when a virus enters the body and is said to play an important role in developing therapeutic agents and vaccines. Likewise, topic extraction from the 'treatment' papers revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against infection. Hidden topics that could not be found in the entire corpus were thus classified by keyword, and topic modeling was performed to find detailed topics. In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding context words from a central word. Combining LDA and Word2vec aimed to improve performance by capturing both the relationship between documents and LDA topics and the relationships among words learned by Word2vec. In addition, as a clustering method based on PCA dimension reduction, we presented a way to classify documents intuitively by using t-SNE to group documents with similar themes into a structured organization. At a time when researchers' efforts to overcome COVID-19 cannot keep pace with the rapid publication of COVID-19 papers, we hope this method will save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
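
What distinguishes Skip-gram among the Word2vec models is its training objective: each center word is used to predict the words within a fixed window around it. The sketch below only generates those (center, context) training pairs; the window size is an assumption, and the neural training step that actually learns the embeddings is omitted. The example sentence is invented.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs as used by Skip-gram.

    Each token is paired with every other token at most `window`
    positions away; window=2 is an assumed default.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["vaccine", "induces", "neutralizing", "antibodies"], window=1))
# [('vaccine', 'induces'), ('induces', 'vaccine'), ('induces', 'neutralizing'),
#  ('neutralizing', 'induces'), ('neutralizing', 'antibodies'), ('antibodies', 'neutralizing')]
```

Words that repeatedly share contexts in these pairs end up with nearby vectors, which is what makes similar-word extraction possible.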

Timeline-Based Topic Trend Analysis of Archives Management in Korea (시계열 기반 국내 기록관리학 토픽 트렌드 분석)

  • Park, JunHyeong;Ryu, Pum-Mo;Oh, Hyo-Jung
    • Journal of Korean Society of Archives and Records Management / v.18 no.1 / pp.29-47 / 2018
  • The volume and speed of archival and records information have recently increased. The purpose of this study is to find the direction of archives management research in Korea through an in-depth analysis of research trends in records management in the country. We collected articles related to archives management published from 1997 to 2016 in two archives management journals and four library and information science journals in Korea. The collected articles were analyzed in five-year periods, splitting each decade into its first and second halves. The results show that research on electronic records, records information services, and archiving methods for various types of records has gradually developed.
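
A minimal sketch of the five-year binning, assuming periods start at 1997 (1997-2001, 2002-2006, 2007-2011, 2012-2016) and that each article record carries a publication year and a topic label. The record format and the topic labels in the usage lines are hypothetical.

```python
from collections import Counter


def period_label(year, start=1997, span=5):
    """Map a year to its five-year period label, e.g. 1999 -> '1997-2001'."""
    if year < start:
        raise ValueError(f"year {year} precedes start {start}")
    first = start + (year - start) // span * span
    return f"{first}-{first + span - 1}"


def counts_by_period(articles, start=1997, span=5):
    """articles: iterable of (year, topic) pairs (hypothetical format).

    Returns a Counter keyed by (period, topic).
    """
    return Counter((period_label(y, start, span), topic) for y, topic in articles)


print(period_label(1999))  # 1997-2001
print(counts_by_period([(1998, "e-records"), (2000, "e-records"), (2015, "services")]))
```

Comparing each topic's share across the four periods, rather than raw counts, corrects for growth in the total number of articles.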

A study on the Trend of Researches in Food and Culture - Focusing on published papers from 1986 to 2020 in the Journal of the Korean Society of Food Culture - (식문화 연구동향 분석 - 1986년부터 2020년까지 한국식생활문화학회지에 발표된 논문을 중심으로 -)

  • Lee, Kyou-Jin;Jang, Se-Eun;Oh, Yoon Sin
    • Journal of the Korean Society of Food Culture / v.37 no.3 / pp.196-212 / 2022
  • This study examines research trends in food and culture in papers published in the Journal of the Korean Society of Food Culture from 1986 to 2020. The journal published a total of 329 papers, which we classified into 5 main categories and 13 middle categories. Of these, 204 articles were on "Korean traditional food culture," and the most studied topic over the entire period was "Perception of Koreans towards traditional food, preference, satisfaction, and usage." A total of 76 studies concerned "Korean contemporary food culture"; the most frequently studied topic was "Recognition and attitude," which was researched consistently throughout the period. The main category "World food culture" encompassed 32 studies, focused mainly on "World's Modern Food Culture," with "Comparison of Food Cultures of Foreign and Korean Food Cultures" the most frequently studied topic. Studies in all categories were carried out steadily throughout the study period. These studies provide integrated knowledge in the field of food and culture and can serve as basic material for related research in the future.

The Study on the Asymmetry of Inertia and Variety-Seeking State - Using Section-Aggregated Multinomial Logit Analysis (관성 및 다양성추구 상태의 비대칭성에 관한 연구 - 구간통합 다항로짓분석을 활용하여)

  • Lee, Seung-yon
    • Knowledge Management Research / v.14 no.1 / pp.73-94 / 2013
  • A customer's purchase state is either purchase inertia or variety-seeking. As growing brand familiarity increases brand attractiveness, the customer's purchase state tends toward inertia. However, excessively growing brand familiarity decreases brand attractiveness, and the purchase state then tends to shift into variety-seeking. The main aim of this study is to validate the asymmetric formation of customers' purchase states between inertia and variety-seeking. To this end, this article introduces a model that flexibly describes the rate of value change depending on the purchase state. This model helps overcome the limitation of past studies, which assumed symmetric value changes. Based on this approach, marketers will be able to decide the timing of sales promotions. This research used a local telecommunications carrier's database of smartphone application purchase/download records. The data were collected over two years (2009 and 2010), when smartphones were starting to become commodities in Korea, whereas most past studies used purchase data from mature-stage products. By using data from the introduction stage of the product life cycle, this approach can probe the probability of brand choice depending on the purchase state at an early stage. Moreover, this study seeks to broaden marketing research methodology by drawing on methods from other research areas. Specifically, it introduces the section-aggregated multinomial logit methodology to simultaneously estimate the parameters of multiple interconnected multinomial logit functions. Adopting section-aggregated multinomial logit procedures from computational statistics is expected to enrich marketing research with more precise analysis and estimation of the effects of marketing activities.
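
The section-aggregated estimation procedure is not described here in enough detail to reproduce. What can be sketched is the standard multinomial logit choice probability that such models build on, P(i) = exp(V_i) / sum_j exp(V_j), extended with a hypothetical state-dependence term: the brand bought last time receives extra utility `gamma`, positive under inertia and negative under variety-seeking. Brand names and parameter values are invented, and this is not the paper's model.

```python
import math


def mnl_probabilities(base_utility, last_choice=None, gamma=0.0):
    """Multinomial logit choice probabilities with a state-dependence term.

    base_utility maps brand -> deterministic utility. The brand bought last
    time gets an extra `gamma` (hypothetical parameter: gamma > 0 models
    inertia, gamma < 0 models variety-seeking).
    """
    utility = {
        brand: u + (gamma if brand == last_choice else 0.0)
        for brand, u in base_utility.items()
    }
    # Subtract the max utility before exponentiating for numerical stability.
    m = max(utility.values())
    expu = {brand: math.exp(u - m) for brand, u in utility.items()}
    total = sum(expu.values())
    return {brand: e / total for brand, e in expu.items()}


base = {"A": 0.0, "B": 0.0}
print(mnl_probabilities(base, last_choice="A", gamma=1.0))   # inertia favors A
print(mnl_probabilities(base, last_choice="A", gamma=-1.0))  # variety-seeking favors B
```

The asymmetry the study tests would correspond to letting the size of this state effect change at different rates as familiarity grows or decays, rather than using a single fixed `gamma`.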

Research Trends of Studies Related to the Nature of Science in Korea Using Semantic Network Analysis (언어 네트워크 분석을 이용한 과학의 본성에 관한 국내연구 동향)

  • Lee, Sang-Gyun
    • Journal of the Korean Society of Earth Science Education / v.9 no.1 / pp.65-87 / 2016
  • The purpose of this study is to examine Korean science education journals in order to analyze research trends on the nature of science in Korea. The study covers articles in journals at the Korean Citation Index level (KCI-listed journals and KCI listing candidates) that can be retrieved through the RISS service with the key phrase "Nature of science" in Korean. Descriptive statistical analysis is used to count the research articles and classify them by year and by journal. Semantic network analysis was also conducted, including word cloud analysis of keyword frequency, centrality analysis, co-occurrence analysis, and cluster dendrogram analysis across the research articles. The results show that 91 research papers were published in 25 journals from 1991 to 2015; 2 major journals published more than 50% of them. Key phrases such as 'Analysis', 'recognition', 'lessons', 'science textbook', 'History of Science', and 'influence' are the most frequently used across the studies. Finally, small word networks appear together, such as [Nature of science - high school student - recognize], [Explicit - lesson - effect], and [elementary school - science textbook - analysis]. Research topics have gradually diversified. However, many studies still focus on analysis and research aspects, and there has been little research on teaching and learning methods.
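
The abstract lists co-occurrence and centrality analysis among its semantic network methods without naming the tools. A minimal sketch, assuming each article is reduced to a set of keywords: keywords appearing in the same article are linked, edge weights count co-occurrences, and normalized degree centrality is a node's number of distinct neighbors divided by (n - 1). The keyword lists in the usage lines are invented, loosely echoing the networks reported above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(keyword_lists):
    """Count how often each unordered keyword pair appears in the same article."""
    edges = Counter()
    for keywords in keyword_lists:
        # Sorting makes each unordered pair map to one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalized degree centrality: distinct neighbors / (number of nodes - 1)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


articles = [
    ["nature of science", "high school student", "recognition"],
    ["explicit", "lesson", "effect"],
    ["nature of science", "lesson"],
]
edges = cooccurrence_edges(articles)
centrality = degree_centrality(edges)
print(centrality["nature of science"])  # 0.6
```

Degree centrality ignores edge weights; a weighted variant would sum the co-occurrence counts instead of counting neighbors.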

Weighted Subject - Method Network Analysis of Library and Information Science Studies (문헌정보학 분야 핵심 학술지들의 가중 주제-방법 네트워크 분석)

  • Lee, Keehoen;Jung, Hyojung;Song, Min
    • Journal of the Korean Society for Library and Information Science / v.49 no.3 / pp.457-488 / 2015
  • In this study, we analyzed the current state of Library and Information Science research in the top 20 journals from 1990 to 2015 from subject and method perspectives. We developed a weighted subject-method network to investigate the centralities of subjects and methods as well as their relations. The network is composed of subject nodes and method nodes, and each node is weighted by topic occurrence. Over the 25 years, management information systems, information need analysis, bibliometrics, and information policy were the top topics, while modeling, literature review, scientific research impact analysis, and web data analysis were the top methods. A recent rise of text mining is highlighted. We also analyzed the communities formed over the past 25 years and over the most recent 5 years. Bibliometrics is extending its field by applying various network analysis algorithms, and text mining is specialized in medical information systems and user interfaces. These results identify the interests of leading studies in Library and Information Science and can serve as a fundamental resource for the field's development.
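
The abstract says each node is weighted by topic occurrence but does not give the exact scheme. A minimal sketch, assuming each paper has been tagged with subject and method labels: a node's weight is the number of papers carrying that label, and a subject-method edge's weight counts papers tagged with both. Ranking nodes by weight then surfaces the top subjects and methods. The paper records in the usage lines are invented.

```python
from collections import Counter


def subject_method_network(papers):
    """papers: iterable of (subjects, methods) label collections per paper.

    Returns (node_weight, edge_weight): node_weight[('subject', s)] and
    node_weight[('method', m)] count papers; edge_weight[(s, m)] counts
    papers tagged with both s and m.
    """
    node_weight = Counter()
    edge_weight = Counter()
    for subjects, methods in papers:
        subjects, methods = set(subjects), set(methods)
        node_weight.update(("subject", s) for s in subjects)
        node_weight.update(("method", m) for m in methods)
        edge_weight.update((s, m) for s in subjects for m in methods)
    return node_weight, edge_weight


def top_nodes(node_weight, kind, n=3):
    """Highest-weighted nodes of one kind ('subject' or 'method'), ties by name."""
    ranked = [(label, w) for (k, label), w in node_weight.items() if k == kind]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))[:n]


papers = [
    (["bibliometrics"], ["network analysis"]),
    (["bibliometrics", "information policy"], ["literature review"]),
    (["information need analysis"], ["modeling", "literature review"]),
]
node_weight, edge_weight = subject_method_network(papers)
print(top_nodes(node_weight, "subject", n=1))  # [('bibliometrics', 2)]
```

Splitting `papers` by publication year before building the network gives the 25-year versus recent-5-year comparison described above.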


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.