• Title/Summary/Keyword: algorithm-based (알고리즘 기반)


A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon; Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the disease caused by SARS-CoV-2, a coronavirus that causes a severe and potentially fatal respiratory syndrome) were published. This rapid growth leaves healthcare professionals and policy makers, who face time and technical constraints, struggling to find important research quickly. This study therefore proposes a method for extracting useful information from the text of a large body of literature using the LDA and Word2vec algorithms: papers matching search keywords are extracted from the COVID-19 literature, and their detailed topics are identified. The data came from the CORD-19 dataset on Kaggle, a free, weekly updated academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic. The research has two main parts. First, filtering and pre-processing the abstracts of 47,110 full-text academic papers yielded 41,062 articles. Exploratory data analysis in Python was used to count COVID-19 publications by year and to identify the 10 most active journals. LDA and Word2vec were then used to derive COVID-19 research topics, analyze related words, and measure their similarity. Second, papers containing 'vaccine' and papers containing 'treatment' were extracted from the topics derived for all papers, yielding 4,555 'vaccine' papers and 5,971 'treatment' papers. For each subset, detailed topics were analyzed with LDA and Word2vec, and after PCA dimension reduction, the t-SNE algorithm was used to visualize clusters of papers with similar themes. A noteworthy finding is that topic modeling on each keyword subset surfaced topics that did not appear among the topics derived for the full set of COVID-19 papers.
For example, topic modeling of the 'vaccine' papers yielded a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody protects cells from infection when a virus enters the body and is reported to play an important role in developing therapeutics and vaccines. Likewise, topic modeling of the 'treatment' papers revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells, instead of defending against the infection, attack normal cells. In other words, hidden topics that could not be found in the corpus as a whole emerged once papers were grouped by keyword and modeled separately. The study extracts topics from a large body of literature with LDA and finds similar words with Skip-gram, the Word2vec variant that predicts the surrounding context words from a center word. Combining LDA and Word2vec aims to improve performance by capturing both the relationships between documents and LDA topics and the semantic relationships between words learned by Word2vec. In addition, PCA dimension reduction followed by t-SNE groups documents with similar themes into clusters, offering an intuitive, structured way to organize the literature. Because the rapid publication of COVID-19 papers has outpaced researchers' ability to keep up, the authors expect the method to save healthcare professionals and policy makers time and effort, help them gain new insights quickly, and serve as basic data for researchers exploring new research directions.
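The abstract names LDA but does not say how it was fitted. As a hedged illustration only, the sketch below fits a tiny LDA model by collapsed Gibbs sampling, a standard inference method that is not necessarily the one the authors used, and first selects the documents that mention a keyword such as 'vaccine', mirroring the keyword-subset step. The toy corpus, the hyperparameters `alpha` and `beta`, and all function names are hypothetical.

```python
import random

def filter_by_keyword(docs, keyword):
    """Keep tokenized documents that contain the keyword (the subset step)."""
    return [d for d in docs if keyword in d]

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Point estimates of document-topic (theta) and topic-word (phi) distributions.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab

def top_words(phi, vocab, n=3):
    """Most probable words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

corpus = [
    "vaccine antibody neutralizing trial dose".split(),
    "vaccine dose trial efficacy antibody".split(),
    "treatment cytokine storm patient icu".split(),
    "treatment patient drug cytokine trial".split(),
]
vaccine_docs = filter_by_keyword(corpus, "vaccine")
print(len(vaccine_docs))  # 2
theta_all, phi_all, vocab_all = lda_gibbs(corpus, num_topics=2)   # "all papers"
theta_vac, phi_vac, vocab_vac = lda_gibbs(vaccine_docs, num_topics=2)  # keyword subset
print(top_words(phi_all, vocab_all))
```

Fitting the subset separately is what lets topics that are drowned out in the full corpus surface on their own, which is the effect the abstract reports.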

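In Word2vec's Skip-gram variant, every center word is turned into (center, context) training pairs, one for each word inside a fixed window, and the embeddings are trained so that the center word predicts its context. Similar words are then usually found by cosine similarity between the learned vectors. The minimal sketch below shows only the pair generation and the cosine measure; the window size and the toy vectors are illustrative assumptions, and the training step itself is omitted.

```python
import math

def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: Skip-gram predicts context words from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

pairs = skipgram_pairs(["neutralizing", "antibody", "blocks", "infection"], window=1)
print(pairs)
# [('neutralizing', 'antibody'), ('antibody', 'neutralizing'), ('antibody', 'blocks'),
#  ('blocks', 'antibody'), ('blocks', 'infection'), ('infection', 'blocks')]

# Hypothetical 3-d vectors standing in for trained embeddings.
print(round(cosine([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]), 3))  # 1.0
```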
  • Analysis of promising countries for export using parametric and non-parametric methods based on ERGM: Focusing on the case of information communication and home appliance industries (ERGM 기반의 모수적 및 비모수적 방법을 활용한 수출 유망국가 분석: 정보통신 및 가전 산업 사례를 중심으로)

    • Jun, Seung-pyo; Seo, Jinny; Yoo, Jae-Young
      • Journal of Intelligence and Information Systems / v.28 no.1 / pp.175-196 / 2022
    • The information and communication and home appliance industries, long among South Korea's main industries, are gradually losing export share as their export competitiveness weakens. To help these industries improve exports, this study objectively analyzed their export competitiveness and suggested promising export countries. Export competitiveness was evaluated through network analysis covering network properties, centrality, and structural holes. To select promising export countries, the study proposed a new variable that, in addition to existing economic factors, accounts for the characteristics of the established International Trade Network (ITN), that is, the Global Value Chain (GVC). The conditional log-odds of individual links, derived from an Exponential Random Graph Model (ERGM) of the cross-border trade network, were used as a proxy for export potential. Building on these ERGM link probabilities, a parametric and a non-parametric approach were each used to recommend promising export countries. In the parametric approach, a regression model was developed to predict the export value of South Korea's information and communication and home appliance industries by adding the link-specific network characteristics derived from the ERGM to the existing economic factors. In the non-parametric approach, a clustering-based anomaly detection algorithm was used, and promising export countries were identified as outliers that deviate from their peers. According to the results, the industry's export network is highly transitive.
Also, the centrality analysis showed that South Korea's influence on exports was weak relative to its size, and the structural hole analysis showed that its export efficiency was low. Using the proposed recommendation model, the parametric analysis identified Iran, Ireland, North Macedonia, Angola, and Pakistan as promising export countries, while the non-parametric analysis identified Qatar, Luxembourg, Ireland, North Macedonia, and Pakistan; the two models differed for some countries. The results indicate that the export competitiveness of South Korea's information and communication and home appliance industries within the GVC is low relative to their export volume, suggesting that exports could decline further. The study is also meaningful in proposing a way to find promising export countries by considering GVC networks with other countries as a means of strengthening export competitiveness. From a policy perspective, the study showed that the international trade network of these industries features important reciprocal relationships and, although transitivity is high, may not easily extend to three-party relationships. It also confirmed that South Korea's export competitiveness, or status, ranks lower than its export volume. To raise South Korea's low out-degree centrality, the paper suggests increasing exports to Italy or Poland, which have markedly higher in-degrees; to raise its out-closeness centrality, it argues for increasing exports to countries with particularly high in-closeness. In particular, Morocco, the UAE, Argentina, Russia, and Canada were identified as export destinations that deserve attention.
The study also offers practical implications for companies seeking to expand exports: they should focus on countries whose potential for export expansion is high relative to current export volume. In particular, countries selected on the basis of population are presented for companies that export daily necessities, and countries with high GDP (purchasing power) but relatively low current exports are presented for companies that export high-end or durable products. Because the process and results of this study can be readily extended to other industries, they are also expected to support the development of public-sector services.
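In an ERGM, the conditional log-odds that a directed tie i→j exists, given the rest of the network, equal θ·δ_ij(y): the fitted coefficients times the change statistics, i.e. how much each model term would change if the tie were switched on. The abstract does not list the study's ERGM terms, so the sketch below assumes three common ones (edges, mutual for reciprocity, and a receiver covariate standing in for an economic factor) with hypothetical coefficients; in practice θ would be estimated, for example with the R `ergm` package.

```python
import math

def change_stats(adj, attr, i, j):
    """Change statistics for switching on the directed tie i->j (assumed terms)."""
    return [
        1.0,                          # edges: the tie adds one edge
        1.0 if adj[j][i] else 0.0,    # mutual: completes a reciprocated pair if j->i exists
        attr[j],                      # receiver covariate contributed by the new tie
    ]

def conditional_log_odds(adj, attr, theta, i, j):
    """logit P(Y_ij = 1 | rest of the network) = theta . delta_ij(y)."""
    return sum(t * d for t, d in zip(theta, change_stats(adj, attr, i, j)))

def tie_probability(log_odds):
    """Convert log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical 3-country network: adj[a][b] = 1 means a exports to b.
adj = [
    [0, 0, 0],   # country 0, the exporter of interest, has no outgoing ties yet
    [1, 0, 1],   # country 1 exports to 0 and 2
    [0, 1, 0],   # country 2 exports to 1
]
attr = [1.0, 2.0, 1.0]       # hypothetical receiver covariate (e.g. scaled log GDP)
theta = [-2.0, 1.5, 0.5]     # hypothetical coefficients: edges, mutual, receiver
for j in (1, 2):
    log_odds_ij = conditional_log_odds(adj, attr, theta, 0, j)
    print(j, log_odds_ij, round(tie_probability(log_odds_ij), 3))
# 1 0.5 0.622
# 2 -1.5 0.182
```

Under the parametric approach described above, a value such as `log_odds_ij` would enter the export-value regression as one extra regressor next to the economic factors.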

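Out-degree and in-degree count a country's export and import partners, while out-closeness and in-closeness measure how quickly a country reaches, or is reached by, the others along directed ties. A minimal stdlib sketch, assuming unweighted ties (the abstract does not state a weighting) and computing closeness as the number of reachable nodes divided by their total distance, one of several common variants for networks that are not strongly connected. The country labels are placeholders.

```python
from collections import deque

def bfs_distances(succ, source):
    """Shortest hop distances from source, following edge direction."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def centralities(edges):
    """Degree and closeness centralities for a directed edge list of (exporter, importer)."""
    nodes = sorted({u for e in edges for u in e})
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        pred.setdefault(v, []).append(u)

    def closeness(adj, n):
        d = bfs_distances(adj, n)
        total = sum(d.values())
        return (len(d) - 1) / total if total else 0.0

    return {
        n: {
            "out_degree": len(succ.get(n, [])),
            "in_degree": len(pred.get(n, [])),
            "out_closeness": closeness(succ, n),  # distances along outgoing ties
            "in_closeness": closeness(pred, n),   # distances along incoming ties
        }
        for n in nodes
    }

# Placeholder countries A-D; each pair (u, v) means u exports to v.
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "C"), ("D", "A")]
c = centralities(edges)
print(max(c, key=lambda n: c[n]["in_degree"]))  # B
print(round(c["A"]["out_closeness"], 3))  # 0.667
```

A country like B, with the highest in-degree, plays the role that Italy and Poland play in the paper's recommendation: exporting to it raises the exporter's out-degree and shortens its paths to the rest of the network.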
    Basic Research on the Possibility of Developing a Landscape Perceptual Response Prediction Model Using Artificial Intelligence - Focusing on Machine Learning Techniques - (인공지능을 활용한 경관 지각반응 예측모델 개발 가능성 기초연구 - 머신러닝 기법을 중심으로 -)

    • Kim, Jin-Pyo; Suh, Joo-Hwan
      • Journal of the Korean Institute of Landscape Architecture / v.51 no.3 / pp.70-82 / 2023
    • Recent advances in IT and data acquisition are shifting the paradigm in all aspects of life, including academic fields, where research topics and methods are evolving through academic exchange and connections. In particular, data-based research methods are now used in many fields, including landscape architecture, where continued research is needed. Reflecting this situation, this study investigates the possibility of developing a landscape preference evaluation and prediction model using machine learning, a branch of artificial intelligence. To this end, machine learning techniques were applied to the landscape field to build a landscape preference evaluation and prediction model and to verify its simulation accuracy. Images of wind power facility landscapes were chosen as the research subject, since wind power has recently attracted attention as a renewable energy source. These images were collected by web crawling to build the analysis dataset. Orange version 3.33, a program developed at the University of Ljubljana, was used for the machine learning analysis to derive a high-performing prediction model. Both a model that integrates the evaluation criteria and a structure with a separate model for each criterion were built using kNN, SVM, Random Forest, Logistic Regression, and Neural Network, algorithms suited to classification. The generated models were then evaluated to select the most suitable prediction model. The resulting model evaluates three criteria separately (classification by landscape type, by distance between landscape and target, and by preference) and then synthesizes the results into a prediction.
As a result, the study developed a prediction model with high accuracy: 0.986 for the landscape-type criterion, 0.973 for the distance criterion, and 0.952 for the preference criterion; verification of the prediction results showed that the model exceeded its required performance level. As an experimental attempt to assess whether machine learning can produce prediction models for landscape research, this study confirmed that a high-performance prediction model can be built by collecting and refining image data into a dataset and applying it in landscape-related research. Based on the results, implications, and limitations of this study, it should be possible to develop various types of landscape prediction models, covering wind power facility, natural, and cultural landscapes. Machine learning can become more useful and valuable in landscape architecture when research methods suited to each topic are explored and applied: for example, models that classify images by landscape type can reduce data classification time, and machine learning analysis of landscape prediction factors can reveal the importance of landscape planning factors.
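The study built its models in Orange's visual workflow rather than in code. As a hedged, code-level illustration of the separate-model structure, the sketch trains one k-nearest-neighbor (kNN) classifier per evaluation criterion (landscape type, distance, preference), combines their outputs for a new image, and scores a model by accuracy, the metric reported above. The two-number feature vectors, the labels, and k=3 are hypothetical stand-ins; the study's image features are not described in the abstract.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def predict_all(models, x, k=3):
    """Run one separately trained model per evaluation criterion and combine the outputs."""
    return {criterion: knn_predict(X, y, x, k) for criterion, (X, y) in models.items()}

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 2-number feature vectors standing in for image features.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
models = {
    "type": (train_X, ["onshore"] * 3 + ["offshore"] * 3),
    "distance": (train_X, ["near"] * 3 + ["far"] * 3),
    "preference": (train_X, ["low"] * 3 + ["high"] * 3),
}
print(predict_all(models, (0.12, 0.18)))
# {'type': 'onshore', 'distance': 'near', 'preference': 'low'}

# Accuracy of the landscape-type model on a toy held-out set; the last label
# is deliberately inconsistent with its features to show a miss.
test_X = [(0.12, 0.18), (0.88, 0.82), (0.3, 0.3)]
test_y = ["onshore", "offshore", "offshore"]
X, y = models["type"]
pred = [knn_predict(X, y, x) for x in test_X]
print(round(accuracy(test_y, pred), 3))  # 0.667
```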

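The landscape study trained several algorithms and kept the best-performing one. A hedged stdlib sketch of that selection step, assuming a single held-out validation split (the evaluation protocol used in Orange is not described in the abstract): it fits a logistic regression by batch gradient descent, a 1-nearest-neighbor classifier, and a majority-class baseline, then keeps the model with the highest accuracy. SVM, Random Forest, and neural networks are omitted for brevity, and the data, learning rate, and epoch count are illustrative.

```python
import math

def train_logistic(X, y, lr=1.0, epochs=2000):
    """Binary logistic regression fitted by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xij for wj, xij in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi   # d(log-loss)/dz
            gw = [g + err * xij for g, xij in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)

def train_1nn(X, y):
    """1-nearest-neighbor classifier (Euclidean distance)."""
    return lambda x: y[min(range(len(X)), key=lambda i: math.dist(X[i], x))]

def train_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    label = max(set(y), key=y.count)
    return lambda x: label

def holdout_accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def select_best(trainers, X_tr, y_tr, X_val, y_val):
    """Fit every candidate, score it on held-out data, and return (best name, scores)."""
    scores = {name: holdout_accuracy(fit(X_tr, y_tr), X_val, y_val)
              for name, fit in trainers.items()}
    return max(scores, key=scores.get), scores

X_tr = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
y_tr = [0, 0, 0, 1, 1, 1]            # hypothetical: 1 = preferred landscape
X_val, y_val = [(0.12, 0.18), (0.88, 0.82)], [0, 1]
trainers = {"majority": train_majority, "logistic regression": train_logistic, "1-NN": train_1nn}
best, scores = select_best(trainers, X_tr, y_tr, X_val, y_val)
print(best, scores)
# logistic regression {'majority': 0.5, 'logistic regression': 1.0, '1-NN': 1.0}
```

When candidates tie, `max` keeps the first one listed, so the order of `trainers` acts as a tie-breaker; a real comparison would use cross-validation and more data.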
